1. 20 Nov, 2017 7 commits
  2. 19 Nov, 2017 1 commit
  3. 18 Nov, 2017 5 commits
  4. 16 Nov, 2017 11 commits
    • Rafael J. Wysocki's avatar
      PM / runtime: Drop children check from __pm_runtime_set_status() · f8817f61
      Rafael J. Wysocki authored
      The check for "active" children in __pm_runtime_set_status(), when
      trying to set the parent device status to "suspended", doesn't
      really make sense, because in fact it is not invalid to set the
      status of a device with runtime PM disabled to "suspended" in any
      case.  It is invalid to enable runtime PM for a device with its
      status set to "suspended" while its child_count reference counter
      is nonzero, but the check in __pm_runtime_set_status() doesn't
      really cover that situation.
      
      For this reason, drop the children check from __pm_runtime_set_status()
      and add a check against child_count reference counters of "suspended"
      devices to pm_runtime_enable().
      
      Fixes: a8636c89
      
       (PM / Runtime: Don't allow to suspend a device with an active child)
      Signed-off-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: default avatarUlf Hansson <ulf.hansson@linaro.org>
      Reviewed-by: default avatarJohan Hovold <johan@kernel.org>
      f8817f61
    • Johan Hovold's avatar
      dt-bindings: usb: document hub and host-controller properties · f877918c
      Johan Hovold authored
      
      
      Hub nodes and host-controller nodes with child nodes must specify values
      for #address-cells (1) and #size-cells (0).
      
      Also make the definition of the related reg property a bit more
      stringent, and add comments to the example source.
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      f877918c
    • Johan Hovold's avatar
      dt-bindings: usb: clean up compatible property · bfebcf54
      Johan Hovold authored
      
      
      Add quotation marks around the compatible string to avoid ambiguity due
      to following punctuation, and define the VID and PID components.
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      bfebcf54
    • Johan Hovold's avatar
      dt-bindings: usb: fix reg-property port-number range · f42ae7b0
      Johan Hovold authored
      
      
      The USB hub port-number range for USB 2.0 is 1-255 and not 1-31 which
      reflects an arbitrary limit set by the current Linux implementation.
      
      Note that for USB 3.1 hubs the valid range is 1-15.
      
      Increase the documented valid range in the binding to 255, which is the
      maximum allowed by the specifications.
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      f42ae7b0
    • Johan Hovold's avatar
      dt-bindings: usb: fix example hub node name · 1759f270
      Johan Hovold authored
      
      
      According to the OF Recommended Practice for USB, hub nodes shall be
      named "hub", but the example had mixed up the label and node names. Fix
      the node name and drop the redundant label.
      
      While at it, remove a newline and add a missing semicolon to the example
      source.
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarRob Herring <robh@kernel.org>
      1759f270
    • Kemi Wang's avatar
      mm, sysctl: make NUMA stats configurable · 4518085e
      Kemi Wang authored
      This is the second step which introduces a tunable interface that allow
      numa stats configurable for optimizing zone_statistics(), as suggested
      by Dave Hansen and Ying Huang.
      
      =========================================================================
      
      When page allocation performance becomes a bottleneck and you can
      tolerate some possible tool breakage and decreased numa counter
      precision, you can do:
      
      	echo 0 > /proc/sys/vm/numa_stat
      
      In this case, numa counter update is ignored.  We can see about
      *4.8%*(185->176) drop of cpu cycles per single page allocation and
      reclaim on Jesper's page_bench01 (single thread) and *8.1%*(343->315)
      drop of cpu cycles per single page allocation and reclaim on Jesper's
      page_bench03 (88 threads) running on a 2-Socket Broadwell-based server
      (88 threads, 126G memory).
      
      Benchmark link provided by Jesper D Brouer (increase loop times to
      10000000):
      
        https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
      
      =========================================================================
      
      When page allocation performance is not a bottleneck and you want all
      tooling to work, you can do:
      
      	echo 1 > /proc/sys/vm/numa_stat
      
      This is system default setting.
      
      Many thanks to Michal Hocko, Dave Hansen, Ying Huang and Vlastimil Babka
      for comments to help improve the original patch.
      
      [keescook@chromium.org: make sure mutex is a global static]
        Link: http://lkml.kernel.org/r/20171107213809.GA4314@beast
      Link: http://lkml.kernel.org/r/1508290927-8518-1-git-send-email-kemi.wang@intel.com
      
      Signed-off-by: default avatarKemi Wang <kemi.wang@intel.com>
      Signed-off-by: default avatarKees Cook <keescook@chromium.org>
      Reported-by: default avatarJesper Dangaard Brouer <brouer@redhat.com>
      Suggested-by: default avatarDave Hansen <dave.hansen@intel.com>
      Suggested-by: default avatarYing Huang <ying.huang@intel.com>
      Acked-by: default avatarVlastimil Babka <vbabka@suse.cz>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: "Luis R . Rodriguez" <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4518085e
    • Levin, Alexander (Sasha Levin)'s avatar
      kmemcheck: rip it out · 4675ff05
      Levin, Alexander (Sasha Levin) authored
      Fix up makefiles, remove references, and git rm kmemcheck.
      
      Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
      
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vegard Nossum <vegardno@ifi.uio.no>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Tim Hansen <devtimhansen@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      4675ff05
    • Kirill A. Shutemov's avatar
      mm: consolidate page table accounting · af5b0f6a
      Kirill A. Shutemov authored
      Currently, we account page tables separately for each page table level,
      but that's redundant -- we only make use of total memory allocated to
      page tables for oom_badness calculation.  We also provide the
      information to userspace, but it has dubious value there too.
      
      This patch switches page table accounting to single counter.
      
      mm->pgtables_bytes is now used to account all page table levels.  We use
      bytes, because page table size for different levels of page table tree
      may be different.
      
      The change has user-visible effect: we don't have VmPMD and VmPUD
      reported in /proc/[pid]/status.  Not sure if anybody uses them.  (As
      alternative, we can always report 0 kB for them.)
      
      OOM-killer report is also slightly changed: we now report pgtables_bytes
      instead of nr_ptes, nr_pmd, nr_puds.
      
      Apart from reducing number of counters per-mm, the benefit is that we
      now calculate oom_badness() more correctly for machines which have
      different size of page tables depending on level or where page tables
      are less than a page in size.
      
      The only downside can be debuggability because we do not know which page
      table level could leak.  But I do not remember many bugs that would be
      caught by separate counters so I wouldn't lose sleep over this.
      
      [akpm@linux-foundation.org: fix mm/huge_memory.c]
      Link: http://lkml.kernel.org/r/20171006100651.44742-2-kirill.shutemov@linux.intel.com
      
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      [kirill.shutemov@linux.intel.com: fix build]
        Link: http://lkml.kernel.org/r/20171016150113.ikfxy3e7zzfvsr4w@black.fi.intel.com
      
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      af5b0f6a
    • Kirill A. Shutemov's avatar
      mm: account pud page tables · b4e98d9a
      Kirill A. Shutemov authored
      On a machine with 5-level paging support a process can allocate
      significant amount of memory and stay unnoticed by oom-killer and memory
      cgroup.  The trick is to allocate a lot of PUD page tables.  We don't
      account PUD page tables, only PMD and PTE.
      
      We already addressed the same issue for PMD page tables, see commit
      dc6c9a35 ("mm: account pmd page tables to the process").
      Introduction of 5-level paging brings the same issue for PUD page
      tables.
      
      The patch expands accounting to PUD level.
      
      [kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
        Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
      [heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
        Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
      Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
      
      Signed-off-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: default avatarHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: default avatarRik van Riel <riel@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b4e98d9a
    • Jérôme Glisse's avatar
      mm/mmu_notifier: avoid double notification when it is useless · 0f10851e
      Jérôme Glisse authored
      This patch only affects users of mmu_notifier->invalidate_range callback
      which are device drivers related to ATS/PASID, CAPI, IOMMUv2, SVM ...
      and it is an optimization for those users.  Everyone else is unaffected
      by it.
      
      When clearing a pte/pmd we are given a choice to notify the event under
      the page table lock (notify version of *_clear_flush helpers do call the
      mmu_notifier_invalidate_range).  But that notification is not necessary
      in all cases.
      
      This patch removes almost all cases where it is useless to have a call
      to mmu_notifier_invalidate_range before
      mmu_notifier_invalidate_range_end.  It also adds documentation in all
      those cases explaining why.
      
      Below is a more in depth analysis of why this is fine to do this:
      
      For secondary TLB (non CPU TLB) like IOMMU TLB or device TLB (when
      device use thing like ATS/PASID to get the IOMMU to walk the CPU page
      table to access a process virtual address space).  There is only 2 cases
      when you need to notify those secondary TLB while holding page table
      lock when clearing a pte/pmd:
      
        A) page backing address is free before mmu_notifier_invalidate_range_end
        B) a page table entry is updated to point to a new page (COW, write fault
           on zero page, __replace_page(), ...)
      
      Case A is obvious you do not want to take the risk for the device to write
      to a page that might now be used by something completely different.
      
      Case B is more subtle. For correctness it requires the following sequence
      to happen:
        - take page table lock
        - clear page table entry and notify (pmd/pte_huge_clear_flush_notify())
        - set page table entry to point to new page
      
      If clearing the page table entry is not followed by a notify before setting
      the new pte/pmd value then you can break memory model like C11 or C++11 for
      the device.
      
      Consider the following scenario (device use a feature similar to ATS/
      PASID):
      
      Two address addrA and addrB such that |addrA - addrB| >= PAGE_SIZE we
      assume they are write protected for COW (other case of B apply too).
      
      [Time N] -----------------------------------------------------------------
      CPU-thread-0  {try to write to addrA}
      CPU-thread-1  {try to write to addrB}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {read addrA and populate device TLB}
      DEV-thread-2  {read addrB and populate device TLB}
      [Time N+1] ---------------------------------------------------------------
      CPU-thread-0  {COW_step0: {mmu_notifier_invalidate_range_start(addrA)}}
      CPU-thread-1  {COW_step0: {mmu_notifier_invalidate_range_start(addrB)}}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+2] ---------------------------------------------------------------
      CPU-thread-0  {COW_step1: {update page table point to new page for addrA}}
      CPU-thread-1  {COW_step1: {update page table point to new page for addrB}}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+3] ---------------------------------------------------------------
      CPU-thread-0  {preempted}
      CPU-thread-1  {preempted}
      CPU-thread-2  {write to addrA which is a write to new page}
      CPU-thread-3  {}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+3] ---------------------------------------------------------------
      CPU-thread-0  {preempted}
      CPU-thread-1  {preempted}
      CPU-thread-2  {}
      CPU-thread-3  {write to addrB which is a write to new page}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+4] ---------------------------------------------------------------
      CPU-thread-0  {preempted}
      CPU-thread-1  {COW_step3: {mmu_notifier_invalidate_range_end(addrB)}}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {}
      DEV-thread-2  {}
      [Time N+5] ---------------------------------------------------------------
      CPU-thread-0  {preempted}
      CPU-thread-1  {}
      CPU-thread-2  {}
      CPU-thread-3  {}
      DEV-thread-0  {read addrA from old page}
      DEV-thread-2  {read addrB from new page}
      
      So here because at time N+2 the clear page table entry was not pair with a
      notification to invalidate the secondary TLB, the device see the new value
      for addrB before seing the new value for addrA.  This break total memory
      ordering for the device.
      
      When changing a pte to write protect or to point to a new write protected
      page with same content (KSM) it is ok to delay invalidate_range callback
      to mmu_notifier_invalidate_range_end() outside the page table lock.  This
      is true even if the thread doing page table update is preempted right
      after releasing page table lock before calling
      mmu_notifier_invalidate_range_end
      
      Thanks to Andrea for thinking of a problematic scenario for COW.
      
      [jglisse@redhat.com: v2]
        Link: http://lkml.kernel.org/r/20171017031003.7481-2-jglisse@redhat.com
      Link: http://lkml.kernel.org/r/20170901173011.10745-1-jglisse@redhat.com
      
      Signed-off-by: default avatarJérôme Glisse <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0f10851e
    • Yafang Shao's avatar
      mm/page-writeback.c: print a warning if the vm dirtiness settings are illogical · 0f6d24f8
      Yafang Shao authored
      The vm direct limit setting must be set greater than vm background limit
      setting.  Otherwise print a warning to help the operator to figure out
      that the vm dirtiness settings is in illogical state.
      
      Link: http://lkml.kernel.org/r/1506592464-30962-1-git-send-email-laoar.shao@gmail.com
      
      Signed-off-by: default avatarYafang Shao <laoar.shao@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      0f6d24f8
  5. 15 Nov, 2017 7 commits
  6. 14 Nov, 2017 2 commits
  7. 13 Nov, 2017 5 commits
  8. 11 Nov, 2017 2 commits