1. 19 Jun, 2017 1 commit
    • Hugh Dickins's avatar
      mm: larger stack guard gap, between vmas · 1be7107f
      Hugh Dickins authored
      Stack guard page is a useful feature to reduce a risk of stack smashing
      into a different mapping. We have been using a single page gap which
      is sufficient to prevent having stack adjacent to a different mapping.
      But this seems to be insufficient in the light of the stack usage in
      userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
      used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
      which is 256kB or stack strings with MAX_ARG_STRLEN.
      This will become especially dangerous for suid binaries and the default
      no limit for the stack size limit because those applications can be
      tricked to consume a large portion of the stack and a single glibc call
      could jump over the guard page. These attacks are not theoretical,
      Make those attacks less probable by increasing the stack guard gap
      to 1MB (on systems with 4k pages; but make it depend on the page size
      because systems with larger base pages might cap stack allocations in
      the PAGE_SIZE units) which should cover larger alloca() and VLA stack
      allocations. It is obviously not a full fix because the problem is
      somehow inherent, but it should reduce attack space a lot.
      One could argue that the gap size should be configurable from userspace,
      but that can be done later when somebody finds that the new 1MB is wrong
      for some special case applications.  For now, add a kernel command line
      option (stack_guard_gap) to specify the stack gap size (in page units).
      Implementation wise, first delete all the old code for stack guard page:
      because although we could get away with accounting one extra page in a
      stack vma, accounting a larger gap can break userspace - case in point,
      a program run with "ulimit -S -v 20000" failed when the 1MB gap was
      counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
      and strict non-overcommit mode.
      Instead of keeping gap inside the stack vma, maintain the stack guard
      gap as a gap between vmas: using vm_start_gap() in place of vm_start
      (or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
      places which need to respect the gap - mainly arch_get_unmapped_area(),
      and and the vma tree's subtree_gap support for that.
      Original-patch-by: default avatarOleg Nesterov <oleg@redhat.com>
      Original-patch-by: default avatarMichal Hocko <mhocko@suse.com>
      Signed-off-by: default avatarHugh Dickins <hughd@google.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Tested-by: Helge Deller <deller@gmx.de> # parisc
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  2. 01 Jun, 2017 1 commit
  3. 09 May, 2017 1 commit
  4. 01 May, 2017 1 commit
  5. 26 Apr, 2017 2 commits
    • Paul E. McKenney's avatar
      srcu: Specify auto-expedite holdoff time · 22607d66
      Paul E. McKenney authored
      On small systems, in the absence of readers, expedited SRCU grace
      periods can complete in less than a microsecond.  This means that an
      eight-CPU system can have all CPUs doing synchronize_srcu() in a tight
      loop and almost always expedite.  This might actually be desirable in
      some situations, but in general it is a good way to needlessly burn
      CPU cycles.  And in those situations where it is desirable, your friend
      is the function synchronize_srcu_expedited().
      For other situations, this commit adds a kernel parameter that specifies
      a holdoff between completing the last SRCU grace period and auto-expediting
      the next.  If the next grace period starts before the holdoff expires,
      auto-expediting is disabled.  The holdoff is 50 microseconds by default,
      and can be tuned to the desired number of nanoseconds.  A value of zero
      disables auto-expediting.
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: default avatarMike Galbraith <efault@gmx.de>
    • Shaohua Li's avatar
      x86, iommu/vt-d: Add an option to disable Intel IOMMU force on · bfd20f1c
      Shaohua Li authored
      IOMMU harms performance signficantly when we run very fast networking
      workloads. It's 40GB networking doing XDP test. Software overhead is
      almost unaware, but it's the IOTLB miss (based on our analysis) which
      kills the performance. We observed the same performance issue even with
      software passthrough (identity mapping), only the hardware passthrough
      survives. The pps with iommu (with software passthrough) is only about
      ~30% of that without it. This is a limitation in hardware based on our
      observation, so we'd like to disable the IOMMU force on, but we do want
      to use TBOOT and we can sacrifice the DMA security bought by IOMMU. I
      must admit I know nothing about TBOOT, but TBOOT guys (cc-ed) think not
      eabling IOMMU is totally ok.
      So introduce a new boot option to disable the force on. It's kind of
      silly we need to run into intel_iommu_init even without force on, but we
      need to disable TBOOT PMR registers. For system without the boot option,
      nothing is changed.
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
  6. 20 Apr, 2017 1 commit
  7. 06 Apr, 2017 1 commit
    • Will Deacon's avatar
      iommu: Allow default domain type to be set on the kernel command line · fccb4e3b
      Will Deacon authored
      The IOMMU core currently initialises the default domain for each group
      to IOMMU_DOMAIN_DMA, under the assumption that devices will use
      IOMMU-backed DMA ops by default. However, in some cases it is desirable
      for the DMA ops to bypass the IOMMU for performance reasons, reserving
      use of translation for subsystems such as VFIO that require it for
      enforcing device isolation.
      Rather than modify each IOMMU driver to provide different semantics for
      DMA domains, instead we introduce a command line parameter that can be
      used to change the type of the default domain. Passthrough can then be
      specified using "iommu.passthrough=1" on the kernel command line.
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
  8. 01 Apr, 2017 1 commit
  9. 28 Mar, 2017 1 commit
    • Borislav Petkov's avatar
      RAS: Add a Corrected Errors Collector · 011d8261
      Borislav Petkov authored
      Introduce a simple data structure for collecting correctable errors
      along with accessors. More detailed description in the code itself.
      The error decoding is done with the decoding chain now and
      mce_first_notifier() gets to see the error first and the CEC decides
      whether to log it and then the rest of the chain doesn't hear about it -
      basically the main reason for the CE collector - or to continue running
      the notifiers.
      When the CEC hits the action threshold, it will try to soft-offine the
      page containing the ECC and then the whole decoding chain gets to see
      the error.
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170327093304.10683-5-bp@alien8.de
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  10. 21 Mar, 2017 1 commit
  11. 06 Mar, 2017 1 commit
  12. 05 Mar, 2017 1 commit
    • Len Brown's avatar
      cpufreq: Add the "cpufreq.off=1" cmdline option · d82f2692
      Len Brown authored
      Add the "cpufreq.off=1" cmdline option.
      At boot-time, this allows a user to request CONFIG_CPU_FREQ=n
      behavior from a kernel built with CONFIG_CPU_FREQ=y.
      This is analogous to the existing "cpuidle.off=1" option
      and CONFIG_CPU_IDLE=y
      This capability is valuable when we need to debug end-user
      issues in the BIOS or in Linux.  It is also convenient
      for enabling comparisons, which may otherwise require a new kernel,
      or help from BIOS SETUP, which may be buggy or unavailable.
      Signed-off-by: default avatarLen Brown <len.brown@intel.com>
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
  13. 03 Mar, 2017 1 commit
  14. 23 Feb, 2017 1 commit
    • Tejun Heo's avatar
      slub: make sysfs directories for memcg sub-caches optional · 1663f26d
      Tejun Heo authored
      SLUB creates a per-cache directory under /sys/kernel/slab which hosts a
      bunch of debug files.  Usually, there aren't that many caches on a
      system and this doesn't really matter; however, if memcg is in use, each
      cache can have per-cgroup sub-caches.  SLUB creates the same directories
      for these sub-caches under /sys/kernel/slab/$CACHE/cgroup.
      Unfortunately, because there can be a lot of cgroups, active or
      draining, the product of the numbers of caches, cgroups and files in
      each directory can reach a very high number - hundreds of thousands is
      commonplace.  Millions and beyond aren't difficult to reach either.
      What's under /sys/kernel/slab is primarily for debugging and the
      information and control on the a root cache already cover its
      sub-caches.  While having a separate directory for each sub-cache can be
      helpful for development, it doesn't make much sense to pay this amount
      of overhead by default.
      This patch introduces a boot parameter slub_memcg_sysfs which determines
      whether to create sysfs directories for per-memcg sub-caches.  It also
      adds CONFIG_SLUB_MEMCG_SYSFS_ON which determines the boot parameter's
      default value and defaults to 0.
      [akpm@linux-foundation.org: kset_unregister(NULL) is legal]
      Link: http://lkml.kernel.org/r/20170204145203.GB26958@mtj.duckdns.org
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  15. 15 Feb, 2017 1 commit
    • Thomas Gleixner's avatar
      x86/platform/goldfish: Prevent unconditional loading · 47512cfd
      Thomas Gleixner authored
      The goldfish platform code registers the platform device unconditionally
      which causes havoc in several ways if the goldfish_pdev_bus driver is
       - Access to the hardcoded physical memory region, which is either not
         available or contains stuff which is completely unrelated.
       - Prevents that the interrupt of the serial port can be requested
       - In case of a spurious interrupt it goes into a infinite loop in the
         interrupt handler of the pdev_bus driver (which needs to be fixed
      Add a 'goldfish' command line option to make the registration opt-in when
      the platform is compiled in.
      I'm seriously grumpy about this engineering trainwreck, which has seven
      SOBs from Intel developers for 50 lines of code. And none of them figured
      out that this is broken. Impressive fail!
      Fixes: ddd70cf9
       ("goldfish: platform device for x86")
      Reported-by: default avatarGabriel C <nix.or.die@gmail.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Acked-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
  16. 07 Feb, 2017 1 commit
  17. 04 Feb, 2017 1 commit
  18. 16 Jan, 2017 1 commit
  19. 15 Jan, 2017 1 commit
  20. 12 Jan, 2017 1 commit
  21. 26 Dec, 2016 1 commit
  22. 20 Dec, 2016 1 commit
    • Mimi Zohar's avatar
      ima: define a canonical binary_runtime_measurements list format · d68a6fe9
      Mimi Zohar authored
      The IMA binary_runtime_measurements list is currently in platform native
      To allow restoring a measurement list carried across kexec with a
      different endianness than the targeted kernel, this patch defines
      little-endian as the canonical format.  For big endian systems wanting
      to save/restore the measurement list from a system with a different
      endianness, a new boot command line parameter named "ima_canonical_fmt"
      is defined.
      Considerations: use of the "ima_canonical_fmt" boot command line option
      will break existing userspace applications on big endian systems
      expecting the binary_runtime_measurements list to be in platform native
      Link: http://lkml.kernel.org/r/1480554346-29071-10-git-send-email-zohar@linux.vnet.ibm.com
      Signed-off-by: default avatarMimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: default avatarDmitry Kasatkin <dmitry.kasatkin@gmail.com>
      Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andreas Steffen <andreas.steffen@strongswan.org>
      Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  23. 19 Dec, 2016 1 commit
    • Geert Uytterhoeven's avatar
      swiotlb: Add swiotlb=noforce debug option · fff5d992
      Geert Uytterhoeven authored
      On architectures like arm64, swiotlb is tied intimately to the core
      architecture DMA support. In addition, ZONE_DMA cannot be disabled.
      To aid debugging and catch devices not supporting DMA to memory outside
      the 32-bit address space, add a kernel command line option
      "swiotlb=noforce", which disables the use of bounce buffers.
      If specified, trying to map memory that cannot be used with DMA will
      fail, and a rate-limited warning will be printed.
      Note that io_tlb_nslabs is set to 1, which is the minimal supported
      Signed-off-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  24. 01 Dec, 2016 1 commit
  25. 03 Nov, 2016 1 commit
  26. 27 Oct, 2016 1 commit
    • Jonathan Corbet's avatar
      docs: Clean up and organize the admin guide a bit · 7358bb2f
      Jonathan Corbet authored
      The admin guide is a good start, but it's time to turn it into something
      better than an unordered blob of files.  This is a first step in that
      direction.  The TOC has been split up and annotated, the guides have been
      reordered, and minor tweaks have been applied to a few of them.
      One consequence of splitting up the TOC is that we don't really want to use
      :numbered: anymore, since the count resets every time and there doesn't
      seem to be a way to change that.  Eventually we probably want to group the
      documents into sub-books, at which point we can go back to a single TOC,
      but it's probably early to do that.
      Reviewed-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarJonathan Corbet <corbet@lwn.net>
  27. 24 Oct, 2016 3 commits
  28. 18 Oct, 2016 1 commit
  29. 11 Oct, 2016 2 commits
    • Marcos Paulo de Souza's avatar
      Input: i8042 - skip selftest on ASUS laptops · 930e1924
      Marcos Paulo de Souza authored
      On suspend/resume cycle, selftest is executed to reset i8042 controller.
      But when this is done in Asus devices, subsequent calls to detect/init
      functions to elantech driver fails. Skipping selftest fixes this problem.
      An easier step to reproduce this problem is adding i8042.reset=1 as a
      kernel parameter. On Asus laptops, it'll make the system to start with the
      touchpad already stuck, since psmouse_probe forcibly calls the selftest
      This patch was inspired by John Hiesey's change[1], but, since this problem
      affects a lot of models of Asus, let's avoid running selftests on them.
      All models affected by this problem:
      [1]: https://marc.info/?l=linux-input&m=144312209020616&w=2
      Fixes: "ETPS/2 Elantech Touchpad dies after resume from suspend"
      Signed-off-by: default avatarMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarDmitry Torokhov <dmitry.torokhov@gmail.com>
    • Noam Camus's avatar
      lib/bitmap.c: enhance bitmap syntax · 2d13e6ca
      Noam Camus authored
      Today there are platforms with many CPUs (up to 4K).  Trying to boot only
      part of the CPUs may result in too long string.
      For example lets take NPS platform that is part of arch/arc.  This
      platform have SMP system with 256 cores each with 16 HW threads (SMT
      machine) where HW thread appears as CPU to the kernel.  In this example
      there is total of 4K CPUs.  When one tries to boot only part of the HW
      threads from each core the string representing the map may be long...  For
      example if for sake of performance we decided to boot only first half of
      HW threads of each core the map will look like:
      This patch introduce new syntax to accommodate with such use case.  I
      added an optional postfix to a range of CPUs which will choose according
      to given modulo the desired range of reminders i.e.:
          <cpus range>:sed_size/group_size
      For example, above map can be described in new syntax like this:
      Note that this patch is backward compatible with current syntax.
      [akpm@linux-foundation.org: rework documentation]
      Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.com
      Signed-off-by: default avatarNoam Camus <noamca@mellanox.com>
      Cc: David Decotigny <decot@googlers.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  30. 27 Sep, 2016 1 commit
  31. 26 Sep, 2016 1 commit
  32. 23 Sep, 2016 1 commit
    • Scott Wood's avatar
      arm64: arch_timer: Work around QorIQ Erratum A-008585 · f6dc1576
      Scott Wood authored
      Erratum A-008585 says that the ARM generic timer counter "has the
      potential to contain an erroneous value for a small number of core
      clock cycles every time the timer value changes".  Accesses to TVAL
      (both read and write) are also affected due to the implicit counter
      read.  Accesses to CVAL are not affected.
      The workaround is to reread TVAL and count registers until successive
      reads return the same value.  Writes to TVAL are replaced with an
      equivalent write to CVAL.
      The workaround is to reread TVAL and count registers until successive reads
      return the same value, and when writing TVAL to retry until counter
      reads before and after the write return the same value.
      The workaround is enabled if the fsl,erratum-a008585 property is found in
      the timer node in the device tree.  This can be overridden with the
      clocksource.arm_arch_timer.fsl-a008585 boot parameter, which allows KVM
      users to enable the workaround until a mechanism is implemented to
      automatically communicate this information.
      This erratum can be found on LS1043A and LS2080A.
      Acked-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarScott Wood <oss@buserror.net>
      [will: renamed read macro to reflect that it's not usually unstable]
      Signed-off-by: default avatarWill Deacon <will.deacon@arm.com>
  33. 19 Sep, 2016 1 commit
  34. 13 Sep, 2016 1 commit
  35. 09 Sep, 2016 1 commit
    • Dave Hansen's avatar
      x86/pkeys: Default to a restrictive init PKRU · acd547b2
      Dave Hansen authored
      PKRU is the register that lets you disallow writes or all access to a given
      protection key.
      The XSAVE hardware defines an "init state" of 0 for PKRU: its most
      permissive state, allowing access/writes to everything.  Since we start off
      all new processes with the init state, we start all processes off with the
      most permissive possible PKRU.
      This is unfortunate.  If a thread is clone()'d [1] before a program has
      time to set PKRU to a restrictive value, that thread will be able to write
      to all data, no matter what pkey is set on it.  This weakens any integrity
      guarantees that we want pkeys to provide.
      To fix this, we define a very restrictive PKRU to override the
      XSAVE-provided value when we create a new FPU context.  We choose a value
      that only allows access to pkey 0, which is as restrictive as we can
      practically make it.
      This does not cause any practical problems with applications using
      protection keys because we require them to specify initial permissions for
      each key when it is allocated, which override the restrictive default.
      In the end, this ensures that threads which do not know how to manage their
      own pkey rights can not do damage to data which is pkey-protected.
      I would have thought this was a pretty contrived scenario, except that I
      heard a bug report from an MPX user who was creating threads in some very
      early code before main().  It may be crazy, but folks evidently _do_ it.
      Signed-off-by: default avatarDave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-arch@vger.kernel.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: mgorman@techsingularity.net
      Cc: arnd@arndb.de
      Cc: linux-api@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: luto@kernel.org
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.com
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
  36. 06 Sep, 2016 1 commit