1. 07 Sep, 2017 30 commits
    • mm, page_alloc: rip out ZONELIST_ORDER_ZONE · c9bff3ee
      Michal Hocko authored
      Patch series "cleanup zonelists initialization", v1.
      
      This is aimed at cleaning up the zonelists initialization code we have
      but the primary motivation was bug report [2] which got resolved but the
      usage of stop_machine is just too ugly to live.  Most patches are
      straightforward but 3 of them need a special consideration.
      
      Patch 1 removes zone ordered zonelists completely.  I am CCing linux-api
      because this is a user visible change.  As I argue in the patch
      description I do not think we have a strong usecase for it these days.
      I have kept sysctl in place and warn into the log if somebody tries to
      configure zone lists ordering.  If somebody has a real usecase for it we
      can revert this patch but I do not expect anybody will actually notice
      runtime differences.  This patch is not strictly needed for the rest but
      it made patch 6 easier to implement.
      
      Patch 7 removes stop_machine from build_all_zonelists without adding any
      special synchronization between iterators and updater which I _believe_
      is acceptable as explained in the changelog.  I hope I am not missing
      anything.
      
      Patch 8 then removes zonelists_mutex which is kind of ugly as well and
      not really needed AFAICS but care should be taken when double checking
      my thinking.
      
      This patch (of 9):
      
      Supporting zone ordered zonelists costs us a lot of code while the
      usefulness is arguable if existent at all.  Mel has already made node
      ordering the default on 64b systems.  32b systems are still using
      ZONELIST_ORDER_ZONE because it is considered better to fall back to a
      different NUMA node rather than consume precious lowmem zones.
      
      This argument is, however, weakened by the fact that the memory reclaim
      has been reworked to be node rather than zone oriented.  This means that
      lowmem requests have to skip over all highmem pages on LRUs already and
      so zone ordering doesn't save much reclaim time.  So the only
      advantage of the zone ordering is under light memory pressure, when
      highmem requests never hit lowmem zones and the lowmem
      pressure doesn't need to reclaim.
      
      Considering that 32b NUMA systems are rather suboptimal already and it
      is generally advisable to use a 64b kernel on such HW, I believe we
      should rather care about the code maintainability and just get rid of
      ZONELIST_ORDER_ZONE altogether.  Keep the sysctl in place and warn if
      somebody tries to set zone ordering either from the kernel command line
      or the sysctl.
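
      The "keep the knob, warn on unsupported values" behavior can be
      sketched in user-space C roughly as follows.  The helper name
      parse_zonelist_order, the first-letter matching and the message text
      are assumptions made for illustration, not the code from this patch.

      ```c
      #include <errno.h>
      #include <stdio.h>

      /*
       * Hypothetical helper: accept only "default" or "node" style values
       * (matched here by first letter, case-insensitively) and warn on
       * anything else, including the removed "zone" ordering.
       */
      static int parse_zonelist_order(const char *s)
      {
      	if (*s == 'd' || *s == 'D' || *s == 'n' || *s == 'N')
      		return 0;

      	/* The setting is rejected, but the interface itself still exists. */
      	fprintf(stderr, "Ignoring unsupported numa_zonelist_order value: %s\n", s);
      	return -EINVAL;
      }
      ```

      In this sketch parse_zonelist_order("zone") returns -EINVAL and logs a
      warning, while "node" and "default" are accepted silently.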
      
      [mhocko@suse.com: reading vm.numa_zonelist_order will never terminate]
      Link: http://lkml.kernel.org/r/20170721143915.14161-2-mhocko@kernel.org
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: add config and doc file for writeback feature · 5a47074f
      Minchan Kim authored
      This patch adds documentation and a Kconfig option for the writeback
      feature.
      
      Link: http://lkml.kernel.org/r/1498459987-24562-10-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: read page from backing device · 8e654f8f
      Minchan Kim authored
      This patch enables read IO from the backing device.  For this feature,
      it implements two IO read functions to transfer data from backing
      storage.
      
      One is an asynchronous IO function and the other is a synchronous one.
      
      Synchronous IO is needed for partial writes, which must complete the
      read IO before overwriting the partial data.
      
      We could make the partial IO case asynchronous too, but at the moment I
      don't think the added complexity is justified for such a rare use case,
      so I want to keep it simple.
      
      [xieyisheng1@huawei.com: read_from_bdev_async(): return 1 to avoid call page_endio() in zram_rw_page()]
        Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
      Link: http://lkml.kernel.org/r/1498459987-24562-9-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: write incompressible pages to backing device · db8ffbd4
      Minchan Kim authored
      This patch enables write IO to transfer data to the backing device.
      For that, it implements the write_to_bdev function, which creates a new
      bio and chains it with the parent bio to make the parent bio
      asynchronous.
      
      For rw_page, which doesn't have a parent bio, it submits its own bio
      and handles IO completion in zram_page_end_io.
      
      Also, this patch defines a new flag, ZRAM_WB, to mark written pages for
      later read IO.
      
      [xieyisheng1@huawei.com: fix typo in comment]
        Link: http://lkml.kernel.org/r/1502707447-6944-2-git-send-email-xieyisheng1@huawei.com
      Link: http://lkml.kernel.org/r/1498459987-24562-8-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: identify asynchronous IO's return value · ae85a807
      Minchan Kim authored
      For upcoming asynchronous IO like writeback, zram_rw_page should be
      aware of whether the requested IO was completed, was submitted
      successfully, or failed with an error.
      
      To that end, zram_bvec_rw has three return values.
      
      -errno: returns error number
           0: IO request is done synchronously
           1: IO request is issued successfully.
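
      A minimal user-space sketch of how a caller might interpret this
      three-way return convention.  The enum and both helper names are
      hypothetical and exist only for illustration; they are not part of the
      zram code.

      ```c
      #include <errno.h>
      #include <stdbool.h>

      /* Hypothetical labels for the three outcomes listed above. */
      enum io_state { IO_FAILED, IO_DONE_SYNC, IO_IN_FLIGHT };

      /*
       * Classify a return value: negative errno means failure, 0 means the
       * IO completed synchronously, 1 means it was issued and is in flight.
       */
      static enum io_state classify_io_result(int ret)
      {
      	if (ret < 0)
      		return IO_FAILED;
      	if (ret == 0)
      		return IO_DONE_SYNC;
      	return IO_IN_FLIGHT;
      }

      /*
       * Only a synchronously completed request is finished by the caller
       * itself.  An in-flight request is finished later by its completion
       * handler, and an error is simply propagated upwards.
       */
      static bool caller_completes_io(int ret)
      {
      	return classify_io_result(ret) == IO_DONE_SYNC;
      }
      ```

      The point of the extra value 1 is that the caller can tell "nothing
      more to do yet" apart from both success and failure.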
      
      Link: http://lkml.kernel.org/r/1498459987-24562-7-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: add free space management in backing device · 1363d466
      Minchan Kim authored
      With a backing device, zram needs to manage the free space of the
      backing device.
      
      This patch adds bitmap logic to manage free space, which is very naive.
      However, it should be simple enough considering the frequency of
      incompressible pages in zram.
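
      A naive bitmap allocator of this kind can be sketched in user-space C
      as follows.  The block count NR_BLOCKS and the names alloc_block and
      free_block are assumptions for illustration; the kernel patch uses its
      own helpers and sizes the bitmap from the backing device.

      ```c
      #include <limits.h>
      #include <stddef.h>

      #define NR_BLOCKS 64 /* hypothetical number of page-sized blocks */
      #define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
      #define NR_WORDS ((NR_BLOCKS + BITS_PER_WORD - 1) / BITS_PER_WORD)

      /* One bit per block: set means in use, clear means free. */
      static unsigned long bitmap[NR_WORDS];

      /* Linear scan for the first free block; returns its index or -1. */
      static long alloc_block(void)
      {
      	for (size_t i = 0; i < NR_BLOCKS; i++) {
      		unsigned long mask = 1UL << (i % BITS_PER_WORD);

      		if (!(bitmap[i / BITS_PER_WORD] & mask)) {
      			bitmap[i / BITS_PER_WORD] |= mask;
      			return (long)i;
      		}
      	}
      	return -1; /* backing device is full */
      }

      /* Mark a block as free again. */
      static void free_block(size_t i)
      {
      	bitmap[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
      }
      ```

      The linear scan is O(n) per allocation, which is the "naive" part; it
      is tolerable only as long as writeback of incompressible pages stays
      comparatively rare.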
      
      Link: http://lkml.kernel.org/r/1498459987-24562-6-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: add interface to specif backing device · 013bf95a
      Minchan Kim authored
      For the writeback feature, the user should set up the backing device
      before zram starts working.
      
      This patch enables the interface via /sys/block/zramX/backing_dev.
      
      Currently, it supports block devices only, but it could be enhanced to
      support files as well.
      
      Link: http://lkml.kernel.org/r/1498459987-24562-5-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: rename zram_decompress_page to __zram_bvec_read · 693dc1ce
      Minchan Kim authored
      The name zram_decompress_page is not appropriate because it doesn't
      decompress if the page was a dedup hit or stored without compression.
      
      Use a more abstract term that is consistent with the write path
      function __zram_bvec_write.
      
      Link: http://lkml.kernel.org/r/1498459987-24562-4-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: inline zram_compress · 97ec7c8b
      Minchan Kim authored
      zram_compress does several things: compression, entry allocation and
      limit checking.  I did it just for readability but it hurts
      modularization. :(
      
      So this patch removes the zram_compress function and inlines it in
      __zram_bvec_write for upcoming patches.
      
      Link: http://lkml.kernel.org/r/1498459987-24562-3-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Juneho Choi <juno.choi@lge.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: clean up duplicated codes in __zram_bvec_write · 4ebbe7f7
      Minchan Kim authored
      Patch series "writeback incompressible pages to storage", v1.
      
      zRam is useful for saving memory with compressible pages, but sometimes
      the workload changes and the system has lots of incompressible pages,
      which is very harmful for zram.
      
      This series adds a writeback feature to zram so the admin can set up a
      block device and, with it, zram can save memory by writing out
      incompressible pages (1/4 comp ratio) once it finds them, instead of
      keeping them in memory.
      
      [1-3] is just clean up and [4-8] is step by step feature enablement.
      [4-8] is not logically bisectable (i.e. the split is by logical unit),
      although I tried to keep each step compiling without breakage; I think
      the split makes review easier.
      
      This patch (of 9):
      
      __zram_bvec_write has some duplicated logic for zram metadata
      handling of same_page|compressed_page.  This patch aims to clean it up
      without behavior change.
      
      [xieyisheng1@huawei.com: fix compr_data_size stat]
        Link: http://lkml.kernel.org/r/1502707447-6944-1-git-send-email-xieyisheng1@huawei.com
      Link: http://lkml.kernel.org/r/1496019048-27016-1-git-send-email-minchan@kernel.org
      Link: http://lkml.kernel.org/r/1498459987-24562-2-git-send-email-minchan@kernel.org
      
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Juneho Choi <juno.choi@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: remove zone restrictions · c6f03e29
      Michal Hocko authored
      Historically we have enforced that any kernel zone (e.g. ZONE_NORMAL) has
      to precede the Movable zone in the physical memory range.  The purpose
      of the movable zone is, however, not bound to any physical memory
      restriction.  It merely defines a class of migratable and reclaimable
      memory.
      
      There are users (e.g. CMA) who might want to reserve specific physical
      memory ranges for their own purpose.  Moreover our pfn walkers have to
      be prepared for zones overlapping in the physical range already because
      we do support interleaving NUMA nodes and therefore zones can interleave
      as well.  This means we can allow each memory block to be associated
      with a different zone.
      
      Loosen the current onlining semantics and allow an explicit onlining
      type on any memblock.  That means that online_{kernel,movable} will be
      allowed regardless of the physical address of the memblock as long as it
      is offline of course.  This might result in the movable zone overlapping
      with other kernel zones.  Default onlining then becomes a bit tricky but
      still sensible.  echo online > memoryXY/state will online the given
      block to
      
      	1) the default zone if the given range is outside of any zone
      	2) the enclosing zone if such a zone doesn't interleave with
      	   any other zone
      	3) the default zone if more zones interleave for this range
      
      where the default zone is the movable zone only if movable_node is
      enabled, otherwise it is a kernel zone.
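
      The three rules for default onlining can be condensed into a small
      decision function.  The sketch below is user-space C under stated
      assumptions: the enum, the function name default_zone and the boolean
      inputs (whether the kernel zone or the movable zone already intersects
      the range) are hypothetical simplifications, not the kernel's
      interface.

      ```c
      #include <stdbool.h>

      enum zone_kind { ZONE_KIND_KERNEL, ZONE_KIND_MOVABLE };

      /*
       * in_kernel / in_movable: does the respective zone already span the
       * range of the memory block being onlined?
       */
      static enum zone_kind default_zone(bool in_kernel, bool in_movable,
      				   bool movable_node_enabled)
      {
      	/* Rule 2: exactly one zone encloses the range, so inherit it. */
      	if (in_kernel != in_movable)
      		return in_kernel ? ZONE_KIND_KERNEL : ZONE_KIND_MOVABLE;

      	/* Rules 1 and 3: no zone or interleaving zones, use the default. */
      	return movable_node_enabled ? ZONE_KIND_MOVABLE : ZONE_KIND_KERNEL;
      }
      ```

      With movable_node disabled, a block lying only inside the movable
      zone's span defaults to Movable, while a block outside every zone or
      inside both spans defaults to a kernel zone.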
      
      Here is an example of the semantics (movable_node is not present but
      it works in an analogous way).  We start with the following memblocks,
      all of them offline:
      
        memory34/valid_zones:Normal Movable
        memory35/valid_zones:Normal Movable
        memory36/valid_zones:Normal Movable
        memory37/valid_zones:Normal Movable
        memory38/valid_zones:Normal Movable
        memory39/valid_zones:Normal Movable
        memory40/valid_zones:Normal Movable
        memory41/valid_zones:Normal Movable
      
      Now, we online block 34 in default mode and block 37 as movable
      
        root@test1:/sys/devices/system/node/node1# echo online > memory34/state
        root@test1:/sys/devices/system/node/node1# echo online_movable > memory37/state
        memory34/valid_zones:Normal
        memory35/valid_zones:Normal Movable
        memory36/valid_zones:Normal Movable
        memory37/valid_zones:Movable
        memory38/valid_zones:Normal Movable
        memory39/valid_zones:Normal Movable
        memory40/valid_zones:Normal Movable
        memory41/valid_zones:Normal Movable
      
      As we can see all other blocks can still be onlined both into Normal and
      Movable zones and the Normal is default because the Movable zone spans
      only block37 now.
      
        root@test1:/sys/devices/system/node/node1# echo online_movable > memory41/state
        memory34/valid_zones:Normal
        memory35/valid_zones:Normal Movable
        memory36/valid_zones:Normal Movable
        memory37/valid_zones:Movable
        memory38/valid_zones:Movable Normal
        memory39/valid_zones:Movable Normal
        memory40/valid_zones:Movable Normal
        memory41/valid_zones:Movable
      
      Now the default zone for blocks 37-41 has changed because movable zone
      spans that range.
      
        root@test1:/sys/devices/system/node/node1# echo online_kernel > memory39/state
        memory34/valid_zones:Normal
        memory35/valid_zones:Normal Movable
        memory36/valid_zones:Normal Movable
        memory37/valid_zones:Movable
        memory38/valid_zones:Normal Movable
        memory39/valid_zones:Normal
        memory40/valid_zones:Movable Normal
        memory41/valid_zones:Movable
      
      Note that the block 39 now belongs to the zone Normal and so block38
      falls into Normal by default as well.
      
      For completeness
      
        root@test1:/sys/devices/system/node/node1# for i in memory[34]?
        do
      	echo online > $i/state 2>/dev/null
        done
      
        memory34/valid_zones:Normal
        memory35/valid_zones:Normal
        memory36/valid_zones:Normal
        memory37/valid_zones:Movable
        memory38/valid_zones:Normal
        memory39/valid_zones:Normal
        memory40/valid_zones:Movable
        memory41/valid_zones:Movable
      
      Implementation-wise the change is quite straightforward.  We can get rid
      of allow_online_pfn_range altogether.  online_pages allows only offline
      nodes already.  The original default_zone_for_pfn will become
      default_kernel_zone_for_pfn.  The new default_zone_for_pfn implements the
      above semantics.  zone_for_pfn_range is slightly reorganized to implement
      the kernel and movable online types explicitly and MMOP_ONLINE_KEEP
      becomes a catch-all default behavior.
      
      Link: http://lkml.kernel.org/r/20170714121233.16861-3-mhocko@kernel.org
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Kani Toshimitsu <toshi.kani@hpe.com>
      Cc: <slaoub@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: display allowed zones in the preferred ordering · e5e68930
      Michal Hocko authored
      Prior to commit f1dd2cd1 ("mm, memory_hotplug: do not associate
      hotadded memory to zones until online") we used to allow changing the
      valid zone types of a memory block if it was adjacent to a different
      zone type.
      
      This fact was reflected in memoryNN/valid_zones by the ordering of
      printed zones.  The first one was default (echo online > memoryNN/state)
      and the other one could be onlined explicitly by online_{movable,kernel}.
      
      This behavior was removed by the said patch and as such the ordering was
      not all that important.  In most cases a kernel zone would be default
      anyway.  The only exception is movable_node handled by "mm,
      memory_hotplug: support movable_node for hotpluggable nodes".
      
      Let's reintroduce this behavior because a later patch will remove
      the zone overlap restriction and so the user will be allowed to online
      a kernel or movable block regardless of its placement.  The original
      behavior will then become significant again because it would be
      non-trivial for users to see what the default zone to online into is.
      
      Implementation is really simple.  Pull zone selection out of
      move_pfn_range into a zone_for_pfn_range helper and use it in
      show_valid_zones to display the zone for default onlining and then both
      kernel and movable if they are allowed.  The default online zone is not
      duplicated.
      
      Link: http://lkml.kernel.org/r/20170714121233.16861-2-mhocko@kernel.org
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Kani Toshimitsu <toshi.kani@hpe.com>
      Cc: <slaoub@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: just build zonelist for newly added node · c1152583
      Wei Yang authored
      Commit 9adb62a5 ("mm/hotplug: correctly setup fallback zonelists
      when creating new pgdat") tries to build the correct zonelist for a
      newly added node, while it is not necessary to rebuild it for already
      existing nodes.
      
      build_zonelists() iterates over nodes with memory.  A newly added
      node will not have memory until node_states_set_node() is called
      in online_pages().
      
      This patch avoids rebuilding the zonelists for already existing nodes.
      
      build_zonelists_node() uses managed_zone(zone) checks, so it should not
      include empty zones anyway.  So effectively we avoid some pointless work
      under stop_machine().
      
      [akpm@linux-foundation.org: tweak comment text]
      [akpm@linux-foundation.org: coding-style tweak, per Vlastimil]
      Link: http://lkml.kernel.org/r/20170626035822.50155-1-richard.weiyang@gmail.com
      
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drm/i915: wire up shrinkctl->nr_scanned · 912d572d
      Chris Wilson authored
      shrink_slab() allows us to report back the number of objects we
      successfully scanned (out of the target shrinkctl->nr_to_scan).  As we
      report the number of pages owned by each GEM object as a separate item
      to the shrinker, we cannot precisely control the number of shrinker
      objects we scan on each pass; and indeed may free more than requested.
      If we fail to tell the shrinker about the number of objects we process,
      it will continue to hold a grudge against us as any objects left
      unscanned are added to the next reclaim -- and so we will keep on
      "unfairly" shrinking our own slab in comparison to other slabs.
      
      Link: http://lkml.kernel.org/r/20170822135325.9191-2-chris@chris-wilson.co.uk
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: track actual nr_scanned during shrink_slab() · d460acb5
      Chris Wilson authored
      Some shrinkers may only be able to free a bunch of objects at a time,
      and so free more than the requested nr_to_scan in one pass, whilst other
      shrinkers may find themselves unable to scan as many objects as they
      counted, and so underreport.  Account for the extra freed/scanned
      objects against the total number of objects we intend to scan,
      otherwise we may end up penalising the slab far more than intended.
      Similarly, we want to add the underperforming scan to the deferred
      pass so that we try harder and harder in future passes.
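
      The accounting idea can be sketched in user-space C.  Everything here
      is a simplified, hypothetical model (struct shrink_ctl, run_scan and
      the two example callbacks are illustration names, not the kernel's
      shrinker API): the core presets nr_scanned to the request, the
      callback may overwrite it, and the loop subtracts what was actually
      scanned.

      ```c
      struct shrink_ctl {
      	unsigned long nr_to_scan; /* how much the core asks for */
      	unsigned long nr_scanned; /* how much the shrinker really did */
      };

      typedef unsigned long (*scan_fn)(struct shrink_ctl *sc);

      /* Scans exactly what was requested and leaves nr_scanned untouched. */
      static unsigned long scan_exact(struct shrink_ctl *sc)
      {
      	return sc->nr_to_scan;
      }

      /* Can only work in chunks of 8 objects, so it overshoots the request. */
      static unsigned long scan_in_chunks(struct shrink_ctl *sc)
      {
      	sc->nr_scanned = (sc->nr_to_scan + 7) / 8 * 8;
      	return sc->nr_scanned;
      }

      /* Returns the total scanned; batch must be greater than zero. */
      static unsigned long run_scan(long total_scan, long batch, scan_fn scan)
      {
      	unsigned long scanned = 0;

      	while (total_scan >= batch) {
      		struct shrink_ctl sc = {
      			.nr_to_scan = (unsigned long)batch,
      			.nr_scanned = (unsigned long)batch, /* default */
      		};

      		scan(&sc);
      		/* Charge the real amount, not the requested one. */
      		total_scan -= (long)sc.nr_scanned;
      		scanned += sc.nr_scanned;
      	}
      	return scanned;
      }
      ```

      With a target of 20 and a batch of 5, the chunked callback is invoked
      only twice because each call is charged 8 objects, whereas charging
      the requested 5 would have invoked it four times and scanned 32.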
      
      Link: http://lkml.kernel.org/r/20170822135325.9191-1-chris@chris-wilson.co.uk
      
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/slub.c: add a naive detection of double free or corruption · ce6fa91b
      Alexander Popov authored
      Add an assertion similar to "fasttop" check in GNU C Library allocator
      as a part of SLAB_FREELIST_HARDENED feature.  An object added to a
      singly linked freelist should not point to itself.  That helps to detect
      some double free errors (e.g. CVE-2017-2636) without slub_debug and
      KASAN.
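
      The check is easy to picture on a plain singly linked freelist.  The
      following user-space sketch uses hypothetical types and a hypothetical
      freelist_push helper that reports the problem through its return
      value; it is an illustration of the idea, not the SLUB code.

      ```c
      #include <stdbool.h>
      #include <stddef.h>

      struct object {
      	struct object *next;
      };

      struct freelist {
      	struct object *head;
      };

      /*
       * Push obj onto the freelist.  If obj is already the head, linking
       * it would make the object point to itself, which signals that the
       * same object is being freed twice in a row.
       */
      static bool freelist_push(struct freelist *fl, struct object *obj)
      {
      	if (obj == fl->head)
      		return false; /* naive double free detected */

      	obj->next = fl->head;
      	fl->head = obj;
      	return true;
      }
      ```

      The detection is naive in the sense that it only catches back-to-back
      frees of the same object: freeing A, then B, then A again passes the
      check because the head is B at that point.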
      
      Link: http://lkml.kernel.org/r/1502468246-1262-1-git-send-email-alex.popov@linux.com
      
      Signed-off-by: Alexander Popov <alex.popov@linux.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Paul E McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tycho Andersen <tycho@docker.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add SLUB free list pointer obfuscation · 2482ddec
      Kees Cook authored
      This SLUB free list pointer obfuscation code is modified from Brad
      Spengler/PaX Team's code in the last public patch of grsecurity/PaX
      based on my understanding of the code.  Changes or omissions from the
      original code are mine and don't reflect the original grsecurity/PaX
      code.
      
      This adds a per-cache random value to SLUB caches that is XORed with
      their freelist pointer address and value.  This adds nearly zero
      overhead and frustrates the very common heap overflow exploitation
      method of overwriting freelist pointers.
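
      The XOR scheme can be sketched in a few lines of user-space C.  The
      function name obfuscate and its parameters are illustrative
      assumptions: ptr is the freelist pointer value, random stands for the
      per-cache secret and ptr_addr for the address where the pointer is
      stored.

      ```c
      #include <stdint.h>

      /*
       * XOR the pointer value with the per-cache secret and with the
       * address it is stored at.  XOR is its own inverse, so applying the
       * same function to the stored value recovers the original pointer.
       */
      static uintptr_t obfuscate(uintptr_t ptr, uintptr_t random,
      			   uintptr_t ptr_addr)
      {
      	return ptr ^ random ^ ptr_addr;
      }
      ```

      An attacker who overwrites the stored value without knowing the
      secret ends up with an effectively unpredictable pointer after
      decoding, and mixing in the storage address means the same pointer
      encodes differently at different locations.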
      
      A recent example of the attack is written up here:
      
        http://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit
      
      and there is a section dedicated to the technique in the book "A Guide
      to Kernel Exploitation: Attacking the Core".
      
      This is based on patches by Daniel Micay, and refactored to minimize the
      use of #ifdef.
      
      With 200-count cycles of "hackbench -g 20 -l 1000" I saw the following
      run times:
      
        before:
      	mean 10.11882499999999999995
      	variance .03320378329145728642
      	stdev .18221905304181911048
      
        after:
      	mean 10.12654000000000000014
      	variance .04700556623115577889
      	stdev .21680767106160192064
      
      The difference gets lost in the noise, but if the above is to be taken
      literally, using CONFIG_SLAB_FREELIST_HARDENED is 0.07% slower.
      
      Link: http://lkml.kernel.org/r/20170802180609.GA66807@beast
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Suggested-by: Daniel Micay <danielmicay@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tycho Andersen <tycho@docker.com>
      Cc: Alexander Popov <alex.popov@linux.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slub: tidy up initialization ordering · ea37df54
      Alexander Potapenko authored
       - free_kmem_cache_nodes() frees the cache node before nulling out a
         reference to it
      
       - init_kmem_cache_nodes() publishes the cache node before initializing
         it
      
      Neither of these matters at runtime because the cache nodes cannot be
      looked up by any other thread.  But it's neater and more consistent to
      reorder these.
      
      Link: http://lkml.kernel.org/r/20170707083408.40410-1-glider@google.com
      
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: clean up some dead code · 964f14a0
      Jun Piao authored
      Clean up some unused functions and parameters.
      
      Link: http://lkml.kernel.org/r/598A5E21.2080807@huawei.com
      
      Signed-off-by: Jun Piao <piaojun@huawei.com>
      Reviewed-by: Alex Chen <alex.chen@huawei.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: make ocfs2_set_acl() static · 01ffb56b
      Jan Kara authored
      The function is never called outside of fs/ocfs2/acl.c.
      
      Link: http://lkml.kernel.org/r/20170801141252.19675-2-jack@suse.cz
      
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01ffb56b
    • modpost: simplify sec_name() · 6124c04c
      Masahiro Yamada authored
      There is code duplication between sec_name() and sech_name().  Simplify
      sec_name() by re-using sech_name().  Also, move them up to remove the
      forward declaration of sec_name().
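
The shape of this deduplication can be sketched as follows. This is an illustration under simplifying assumptions, not the modpost source: struct sec_hdr and struct elf_view are hypothetical, reduced stand-ins for the ELF structures modpost works with. The lookup by section header does the real work, and the lookup by index becomes a thin wrapper defined after it, so no forward declaration is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified section header: only the name offset. */
struct sec_hdr {
	size_t name_off;	/* offset of the name in the string table */
};

/* Hypothetical, simplified view of a parsed ELF file. */
struct elf_view {
	const char *strtab;	/* section-name string table */
	struct sec_hdr *sechdrs;
	size_t num_sections;
};

/* Name lookup by section header: the single place that does the work. */
static const char *sech_name(const struct elf_view *elf,
			     const struct sec_hdr *sechdr)
{
	return elf->strtab + sechdr->name_off;
}

/* Name lookup by index: a wrapper instead of a second copy of the logic. */
static const char *sec_name(const struct elf_view *elf, size_t secindex)
{
	assert(secindex < elf->num_sections);
	return sech_name(elf, &elf->sechdrs[secindex]);
}
```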
      
      Link: http://lkml.kernel.org/r/1502248721-22009-1-git-send-email-yamada.masahiro@socionext.com
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6124c04c
    • dax: initialize variable pfn before using it · 2f52074d
      Nicolas Iooss authored
      dax_pmd_insert_mapping() contains the following code:
      
              pfn_t pfn;
              if (bdev_dax_pgoff(bdev, sector, size, &pgoff) != 0)
                  goto fallback;
              /* ... */
          fallback:
            trace_dax_pmd_insert_mapping_fallback(inode, vmf, length, pfn, ret);
      
      When the condition in the if statement is true (that is, when
      bdev_dax_pgoff() fails), the function jumps to the fallback label and
      calls trace_dax_pmd_insert_mapping_fallback() with an uninitialized
      pfn value.
      
      This issue has been found while building the kernel with clang.  The
      compiler reported:
      
          fs/dax.c:1280:6: error: variable 'pfn' is used uninitialized
          whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
              if (bdev_dax_pgoff(bdev, sector, size, &pgoff) != 0)
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          fs/dax.c:1310:60: note: uninitialized use occurs here
            trace_dax_pmd_insert_mapping_fallback(inode, vmf, length, pfn, ret);
                                                                           ^~~
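
The pattern behind this warning can be reproduced in a small stand-alone sketch. It is not the fs/dax.c code: insert_mapping(), trace_fallback() and traced_pfn are hypothetical names, and a plain unsigned long stands in for pfn_t. Giving the variable a defined value at its declaration is one way to make the early goto path well defined; the message above does not say which initializer the actual patch uses.

```c
/* Hypothetical stand-in for the tracepoint: records the value it was given. */
static unsigned long traced_pfn;

static void trace_fallback(unsigned long pfn)
{
	traced_pfn = pfn;
}

/*
 * Hypothetical stand-in for the function shape in the message above.
 * lookup_fails selects the early-exit path that jumps to fallback.
 */
static int insert_mapping(int lookup_fails)
{
	unsigned long pfn = 0;	/* defined value for the early goto path */

	if (lookup_fails)
		goto fallback;

	pfn = 0x1234;		/* normally filled in by a later lookup */
	return 0;

fallback:
	/* Without the initializer, pfn would be read uninitialized here. */
	trace_fallback(pfn);
	return -1;
}
```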
      
      Link: http://lkml.kernel.org/r/20170903083000.587-1-nicolas.iooss_linux@m4x.org
      
      Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f52074d
    • dax: use PG_PMD_COLOUR instead of open coding · 917f3452
      Ross Zwisler authored
      Use ~PG_PMD_COLOUR in dax_entry_waitqueue() instead of open coding an
      equivalent page offset mask.
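
A small sketch of why the two forms are equivalent. The shift values below are assumed x86-64 values (4k pages, 2MiB PMDs), and the PG_PMD_COLOUR definition is written from memory of fs/dax.c rather than taken from this message, so treat both as assumptions. The helper function names are hypothetical.

```c
/* Assumed x86-64 values; other architectures differ. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)

/* Number of 4k pages covered by one PMD, minus one (assumed definition). */
#define PG_PMD_COLOUR	((PMD_SIZE >> PAGE_SHIFT) - 1)

/* Open-coded mask: round a page index down to a PMD boundary. */
static unsigned long pmd_index_open_coded(unsigned long index)
{
	return index & ~((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1);
}

/* The same rounding expressed with the named constant. */
static unsigned long pmd_index_colour(unsigned long index)
{
	return index & ~PG_PMD_COLOUR;
}
```

Under these assumptions PG_PMD_COLOUR is 511, so both helpers clear the low nine bits of the page index.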
      
      Link: http://lkml.kernel.org/r/20170822222436.18926-2-ross.zwisler@linux.intel.com
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Slusarz, Marcin" <marcin.slusarz@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      917f3452
    • dax: explain how read(2)/write(2) addresses are validated · a2e050f5
      Ross Zwisler authored
      Add a comment explaining how the user addresses provided to read(2) and
      write(2) are validated in the DAX I/O path.
      
      We call dax_copy_from_iter() or copy_to_iter() on these without calling
      access_ok() first in the DAX code, and there was a concern that the user
      might be able to read/write to arbitrary kernel addresses with this
      path.
      
      Link: http://lkml.kernel.org/r/20170816173615.10098-1-ross.zwisler@linux.intel.com
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a2e050f5
    • dax: move all DAX radix tree defs to fs/dax.c · 527b19d0
      Ross Zwisler authored
      Now that we no longer insert struct page pointers in DAX radix trees the
      page cache code no longer needs to know anything about DAX exceptional
      entries.  Move all the DAX exceptional entry definitions from dax.h to
      fs/dax.c.
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-6-ross.zwisler@linux.intel.com
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      527b19d0
    • dax: remove DAX code from page_cache_tree_insert() · d01ad197
      Ross Zwisler authored
      Now that we no longer insert struct page pointers in DAX radix trees we
      can remove the special casing for DAX in page_cache_tree_insert().
      
      This also allows us to make dax_wake_mapping_entry_waiter() local to
      fs/dax.c, removing it from dax.h.
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-5-ross.zwisler@linux.intel.com
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d01ad197
    • dax: use common 4k zero page for dax mmap reads · 91d25ba8
      Ross Zwisler authored
      When servicing mmap() reads from file holes the current DAX code
      allocates a page cache page of all zeroes and places the struct page
      pointer in the mapping->page_tree radix tree.
      
      This has three major drawbacks:
      
      1) It consumes memory unnecessarily. For every 4k page that is read via
         a DAX mmap() over a hole, we allocate a new page cache page. This
         means that if you read 1GiB worth of pages, you end up using 1GiB of
         zeroed memory. This is easily visible by looking at the overall
         memory consumption of the system or by looking at /proc/[pid]/smaps:
      
      	7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12   /root/dax/data
      	Size:            1048576 kB
      	Rss:             1048576 kB
      	Pss:             1048576 kB
      	Shared_Clean:          0 kB
      	Shared_Dirty:          0 kB
      	Private_Clean:   1048576 kB
      	Private_Dirty:         0 kB
      	Referenced:      1048576 kB
      	Anonymous:             0 kB
      	LazyFree:              0 kB
      	AnonHugePages:         0 kB
      	ShmemPmdMapped:        0 kB
      	Shared_Hugetlb:        0 kB
      	Private_Hugetlb:       0 kB
      	Swap:                  0 kB
      	SwapPss:               0 kB
      	KernelPageSize:        4 kB
      	MMUPageSize:           4 kB
      	Locked:                0 kB
      
      2) It is slower than using a common zero page because each page fault
         has more work to do. Instead of just inserting a common zero page we
         have to allocate a page cache page, zero it, and then insert it. Here
         are the average latencies of dax_load_hole() as measured by ftrace on
         a random test box:
      
          Old method, using zeroed page cache pages:	3.4 us
          New method, using the common 4k zero page:	0.8 us
      
         This was the average latency over 1 GiB of sequential reads done by
         this simple fio script:
      
           [global]
           size=1G
           filename=/root/dax/data
           fallocate=none
           [io]
           rw=read
           ioengine=mmap
      
      3) The fact that we had to check for both DAX exceptional entries and
         for page cache pages in the radix tree made the DAX code more
         complex.
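
For readers who want to see the access pattern behind the drawbacks above, here is a minimal user-space sketch that is not part of the patch. It creates a sparse temporary file, maps it shared, and reads one byte from every 4k page, so each first touch is a read fault over a hole. The helper name read_hole_via_mmap(), the /tmp path and the 4096-byte stride are assumptions made for illustration. On a non-DAX filesystem this exercises the regular page cache rather than the DAX path described here.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a sparse file of len bytes, map it shared and read one byte per
 * 4k page. Returns the sum of the bytes read (0 for a pure hole) or -1 on
 * error. The file is unlinked immediately, so nothing is left behind.
 */
static long read_hole_via_mmap(size_t len)
{
	char path[] = "/tmp/hole-XXXXXX";	/* hypothetical location */
	unsigned char *p;
	long sum = 0;
	size_t off;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	/* Extending the file without writing leaves it as one large hole. */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}

	p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	for (off = 0; off < len; off += 4096)
		sum += p[off];	/* first touch of each page is a read fault */

	munmap(p, len);
	close(fd);
	return sum;
}
```

Comparing the Rss line of /proc/[pid]/smaps for the mapping before and after such a loop, on a DAX mount, is how figures like the ones in this message can be observed.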
      
      Solve these issues by following the lead of the DAX PMD code and using a
      common 4k zero page instead.  As with the PMD code we will now insert a
      DAX exceptional entry into the radix tree instead of a struct page
      pointer which allows us to remove all the special casing in the DAX
      code.
      
      Note that we do still pretty aggressively check for regular pages in the
      DAX radix tree, especially where we take action based on the bits set in
      the page.  If we ever find a regular page in our radix tree now that
      most likely means that someone besides DAX is inserting pages (which has
      happened lots of times in the past), and we want to find that out early
      and fail loudly.
      
      This solution also removes the extra memory consumption.  Here is that
      same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
      code:
      
      	7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12   /root/dax/data
      	Size:            1048576 kB
      	Rss:                   0 kB
      	Pss:                   0 kB
      	Shared_Clean:          0 kB
      	Shared_Dirty:          0 kB
      	Private_Clean:         0 kB
      	Private_Dirty:         0 kB
      	Referenced:            0 kB
      	Anonymous:             0 kB
      	LazyFree:              0 kB
      	AnonHugePages:         0 kB
      	ShmemPmdMapped:        0 kB
      	Shared_Hugetlb:        0 kB
      	Private_Hugetlb:       0 kB
      	Swap:                  0 kB
      	SwapPss:               0 kB
      	KernelPageSize:        4 kB
      	MMUPageSize:           4 kB
      	Locked:                0 kB
      
      Overall system memory consumption is similarly improved.
      
      Another major change is that we remove dax_pfn_mkwrite() from our fault
      flow, and instead rely on the page fault itself to make the PTE dirty
      and writeable.  The following description from the patch adding the
      vm_insert_mixed_mkwrite() call explains this a little more:
      
         "To be able to use the common 4k zero page in DAX we need to have our
          PTE fault path look more like our PMD fault path where a PTE entry
          can be marked as dirty and writeable as it is first inserted rather
          than waiting for a follow-up dax_pfn_mkwrite() =>
          finish_mkwrite_fault() call.
      
          Right now we can rely on having a dax_pfn_mkwrite() call because we
          can distinguish between these two cases in do_wp_page():
      
                  case 1: 4k zero page => writable DAX storage
                  case 2: read-only DAX storage => writeable DAX storage
      
          This distinction is made via vm_normal_page(). vm_normal_page()
          returns false for the common 4k zero page, though, just as it does
          for DAX ptes. Instead of special casing the DAX + 4k zero page case
          we will simplify our DAX PTE page fault sequence so that it matches
          our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
          We will instead use dax_iomap_fault() to handle write-protection
          faults.
      
          This means that insert_pfn() needs to follow the lead of
          insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
          'mkwrite' is set insert_pfn() will do the work that was previously
          done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      91d25ba8
    • dax: relocate some dax functions · e30331ff
      Ross Zwisler authored
      dax_load_hole() will soon need to call dax_insert_mapping_entry(), so it
      needs to be moved lower in dax.c so the definition exists.
      
      dax_wake_mapping_entry_waiter() will soon be removed from dax.h and be
      made static to dax.c, so we need to move its definition above all its
      callers.
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-3-ross.zwisler@linux.intel.com
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e30331ff
    • mm: add vm_insert_mixed_mkwrite() · b2770da6
      Ross Zwisler authored
      When servicing mmap() reads from file holes the current DAX code
      allocates a page cache page of all zeroes and places the struct page
      pointer in the mapping->page_tree radix tree.  This has three major
      drawbacks:
      
      1) It consumes memory unnecessarily. For every 4k page that is read via
         a DAX mmap() over a hole, we allocate a new page cache page. This
         means that if you read 1GiB worth of pages, you end up using 1GiB of
         zeroed memory.
      
      2) It is slower than using a common zero page because each page fault
         has more work to do. Instead of just inserting a common zero page we
         have to allocate a page cache page, zero it, and then insert it.
      
      3) The fact that we had to check for both DAX exceptional entries and
         for page cache pages in the radix tree made the DAX code more
         complex.
      
      This series solves these issues by following the lead of the DAX PMD
      code and using a common 4k zero page instead.  This reduces memory usage
      and decreases latencies for some workloads, and it simplifies the DAX
      code, removing over 100 lines in total.
      
      This patch (of 5):
      
      To be able to use the common 4k zero page in DAX we need to have our PTE
      fault path look more like our PMD fault path where a PTE entry can be
      marked as dirty and writeable as it is first inserted rather than
      waiting for a follow-up dax_pfn_mkwrite() => finish_mkwrite_fault()
      call.
      
      Right now we can rely on having a dax_pfn_mkwrite() call because we can
      distinguish between these two cases in do_wp_page():
      
      	case 1: 4k zero page => writable DAX storage
      	case 2: read-only DAX storage => writeable DAX storage
      
      This distinction is made via vm_normal_page().  vm_normal_page()
      returns false for the common 4k zero page, though, just as it does for
      DAX ptes.  Instead of special casing the DAX + 4k zero page case we will
      simplify our DAX PTE page fault sequence so that it matches our DAX PMD
      sequence, and get rid of the dax_pfn_mkwrite() helper.  We will instead
      use dax_iomap_fault() to handle write-protection faults.
      
      This means that insert_pfn() needs to follow the lead of
      insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag.  If 'mkwrite'
      is set insert_pfn() will do the work that was previously done by
      wp_page_reuse() as part of the dax_pfn_mkwrite() call path.
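
The idea of the 'mkwrite' flag can be illustrated with a toy model. This is not the kernel's insert_pfn(): struct toy_pte and toy_insert_pfn() are hypothetical, and real page table entries are manipulated through architecture helpers rather than plain fields. The sketch only shows the behavioural difference the flag introduces: an entry can be made writeable and dirty at insertion time instead of in a follow-up call.

```c
#include <stdbool.h>

/* Toy page table entry: a hypothetical model, not the kernel's pte_t. */
struct toy_pte {
	unsigned long pfn;
	bool present;
	bool write;
	bool dirty;
};

/*
 * Toy analogue of an insert function that takes a 'mkwrite' flag.
 * With mkwrite set, the entry is marked writeable and dirty as it is
 * inserted, so no second write-protection fault is needed to upgrade it.
 */
static void toy_insert_pfn(struct toy_pte *pte, unsigned long pfn,
			   bool mkwrite)
{
	pte->pfn = pfn;
	pte->present = true;
	pte->write = mkwrite;
	pte->dirty = mkwrite;
}
```

A read fault would call the toy helper with mkwrite false, and a write fault with mkwrite true, which mirrors the PMD-style flow the message describes.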
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-2-ross.zwisler@linux.intel.com
      
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b2770da6
    • metag/numa: remove the unused parent_node() macro · f0cd3406
      Dou Liyang authored
      Commit a7be6e5a ("mm: drop useless local parameters of
      __register_one_node()") removes the last user of parent_node().
      
      The parent_node() macro in the METAG architecture is now unused.
      
      Remove it for cleanup.
      
      Link: http://lkml.kernel.org/r/1501076076-1974-4-git-send-email-douly.fnst@cn.fujitsu.com
      
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f0cd3406
  2. 05 Sep, 2017 10 commits
    • Merge tag 'devprop-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e7d0c41e
      Linus Torvalds authored
      Pull device properties framework updates from Rafael Wysocki:
       "These introduce fwnode operations for all of the separate types of
        'firmware nodes' that can be handled by the device properties
        framework, make the framework use const fwnode arguments all over, add
        a helper for the consolidated handling of node references and switch
        over the framework to the new UUID API.
      
        Specifics:
      
         - Introduce fwnode operations for all of the separate types of
           'firmware nodes' that can be handled by the device properties
           framework and drop the type field from struct fwnode_handle (Sakari
           Ailus, Arnd Bergmann).
      
         - Make the device properties framework use const fwnode arguments
           where possible (Sakari Ailus).
      
         - Add a helper for the consolidated handling of node references to
           the device properties framework (Sakari Ailus).
      
         - Switch over the ACPI part of the device properties framework to the
           new UUID API (Andy Shevchenko)"
      
      * tag 'devprop-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: device property: Switch to use new generic UUID API
        device property: export irqchip_fwnode_ops
        device property: Introduce fwnode_property_get_reference_args
        device property: Constify fwnode property API
        device property: Constify argument to pset fwnode backend
        ACPI: Constify internal fwnode arguments
        ACPI: Constify acpi_bus helper functions, switch to macros
        ACPI: Prepare for constifying acpi_get_next_subnode() fwnode argument
        device property: Get rid of struct fwnode_handle type field
        ACPI: Use IS_ERR_OR_NULL() instead of non-NULL check in is_acpi_data_node()
      e7d0c41e
    • Merge tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 53ac64aa
      Linus Torvalds authored
      Pull ACPI updates from Rafael Wysocki:
       "These include a usual ACPICA code update (this time to upstream
        revision 20170728), a fix for a boot crash on some systems with
        Thunderbolt devices connected at boot time, a rework of the handling
        of PCI bridges when setting up device wakeup, new support for Apple
        device properties, support for DMA configurations reported via ACPI on
        ARM64, APEI-related updates, ACPI EC driver updates and assorted minor
        modifications in several places.
      
        Specifics:
      
         - Update the ACPICA code in the kernel to upstream revision 20170728
           including:
            * Alias operator handling update (Bob Moore).
            * Deferred resolution of reference package elements (Bob Moore).
            * Support for the _DMA method in walk resources (Bob Moore).
            * Tables handling update and support for deferred table
              verification (Lv Zheng).
            * Update of SMMU models for IORT (Robin Murphy).
            * Compiler and disassembler updates (Alex James, Erik Schmauss,
              Ganapatrao Kulkarni, James Morse).
            * Tools updates (Erik Schmauss, Lv Zheng).
            * Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv
              Zheng, Shao Ming).
      
         - Rework the initialization of non-wakeup GPEs with method handlers
           in order to address a boot crash on some systems with Thunderbolt
           devices connected at boot time where we miss an early hotplug event
           due to a delay in GPE enabling (Rafael Wysocki).
      
         - Rework the handling of PCI bridges when setting up ACPI-based
           device wakeup in order to avoid disabling wakeup for bridges
           prematurely (Rafael Wysocki).
      
         - Consolidate Apple DMI checks throughout the tree, add support for
           Apple device properties to the device properties framework and use
           these properties for the handling of I2C and SPI devices on Apple
           systems (Lukas Wunner).
      
         - Add support for _DMA to the ACPI-based device properties lookup
           code and make it possible to use the information from there to
           configure DMA regions on ARM64 systems (Lorenzo Pieralisi).
      
         - Fix several issues in the APEI code, add support for exporting the
           BERT error region over sysfs and update APEI MAINTAINERS entry with
           reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit
           Agrawal, Tony Luck, Yazen Ghannam).
      
         - Fix a potential initialization ordering issue in the ACPI EC driver
           and clean it up somewhat (Lv Zheng).
      
         - Update the ACPI SPCR driver to extend the existing XGENE 8250
           workaround in it to a new platform (m400) and to work around an
           Xgene UART clock issue (Graeme Gregory).
      
         - Add a new utility function to the ACPI core to support using ACPI
           OEM ID / OEM Table ID / Revision for system identification in
           blacklisting or similar and switch over the existing code already
           using this information to this new interface (Toshi Kani).
      
         - Fix an xpower PMIC issue related to GPADC reads that always return
           0 without extra pin manipulations (Hans de Goede).
      
         - Add statements to print debug messages in a couple of places in the
           ACPI core for easier diagnostics (Rafael Wysocki).
      
         - Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun
           Guo).
      
         - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko).
      
         - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver
           (Alex Hung).
      
         - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur
           Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal,
           Ronald Tschalär, Sumeet Pawnikar)"
      
      * tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
        ACPI / APEI: Suppress message if HEST not present
        intel_pstate: convert to use acpi_match_platform_list()
        ACPI / blacklist: add acpi_match_platform_list()
        ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources
        ACPI: make device_attribute const
        ACPI / sysfs: Extend ACPI sysfs to provide access to boot error region
        ACPI: APEI: fix the wrong iteration of generic error status block
        ACPI / processor: make function acpi_processor_check_duplicates() static
        ACPI / EC: Clean up EC GPE mask flag
        ACPI: EC: Fix possible issues related to EC initialization order
        ACPI / PM: Add debug statements to acpi_pm_notify_handler()
        ACPI: Add debug statements to acpi_global_event_handler()
        ACPI / scan: Enable GPEs before scanning the namespace
        ACPICA: Make it possible to enable runtime GPEs earlier
        ACPICA: Dispatch active GPEs at init time
        ACPI: SPCR: work around clock issue on xgene UART
        ACPI: SPCR: extend XGENE 8250 workaround to m400
        ACPI / LPSS: Don't abort ACPI scan on missing mem resource
        mailbox: pcc: Drop uninformative output during boot
        ACPI/IORT: Add IORT named component memory address limits
        ...
      53ac64aa
    • Merge tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 43964409
      Linus Torvalds authored
      Pull power management updates from Rafael Wysocki:
       "This time (again) cpufreq gets the majority of changes which mostly
        are driver updates (including a major consolidation of intel_pstate),
        some schedutil governor modifications and core cleanups.
      
        There also are some changes in the system suspend area, mostly related
        to diagnostics and debug messages plus some renames of things related
        to suspend-to-idle. One major change here is that suspend-to-idle is
        now going to be preferred over S3 on systems where the ACPI tables
        indicate to do so and provide requisite support (the Low Power Idle S0
        _DSM in particular). The system sleep documentation and the tools
        related to it are updated too.
      
        The rest is a few cpuidle changes (nothing major), devfreq updates,
        generic power domains (genpd) framework updates and a few assorted
        modifications elsewhere.
      
        Specifics:
      
         - Drop the P-state selection algorithm based on a PID controller from
           intel_pstate and make it use the same P-state selection method
           (based on the CPU load) for all types of systems in the active mode
           (Rafael Wysocki, Srinivas Pandruvada).
      
         - Rework the cpufreq core and governors to make it possible to take
           cross-CPU utilization updates into account and modify the schedutil
           governor to actually do so (Viresh Kumar).
      
         - Clean up the handling of transition latency information in the
           cpufreq core and untangle it from the information on which drivers
           cannot do dynamic frequency switching (Viresh Kumar).
      
         - Add support for new SoCs (MT2701/MT7623 and MT7622) to the mediatek
           cpufreq driver and update its DT bindings (Sean Wang).
      
         - Modify the cpufreq dt-platdev driver to automatically create
           cpufreq devices for the new (v2) Operating Performance Points (OPP)
           DT bindings and update its whitelist of supported systems (Viresh
           Kumar, Shubhrajyoti Datta, Marc Gonzalez, Khiem Nguyen, Finley
           Xiao).
      
         - Add support for Ux500 to the cpufreq-dt driver and drop the
           obsolete dbx500 cpufreq driver (Linus Walleij, Arnd Bergmann).
      
         - Add new SoC (R8A7795) support to the cpufreq rcar driver (Khiem
           Nguyen).
      
         - Fix and clean up assorted issues in the cpufreq drivers and core
           (Arvind Yadav, Christophe Jaillet, Colin Ian King, Gustavo Silva,
           Julia Lawall, Leonard Crestez, Rob Herring, Sudeep Holla).
      
         - Update the IO-wait boost handling in the schedutil governor to make
           it less aggressive (Joel Fernandes).
      
         - Rework system suspend diagnostics to make it print fewer messages
           to the kernel log by default, add a sysfs knob to allow more
           suspend-related messages to be printed and add Low Power S0 Idle
           constraints checks to the ACPI suspend-to-idle code (Rafael
           Wysocki, Srinivas Pandruvada).
      
         - Prefer suspend-to-idle over S3 on ACPI-based systems with the
           ACPI_FADT_LOW_POWER_S0 flag set and the Low Power Idle S0 _DSM
           interface present in the ACPI tables (Rafael Wysocki).
      
         - Update documentation related to system sleep and rename a number of
           items in the code to make it clearer that they are related to
           suspend-to-idle (Rafael Wysocki).
      
         - Export a variable allowing device drivers to check the target
           system sleep state from the core system suspend code (Florian
           Fainelli).
      
         - Clean up the cpuidle subsystem to handle the polling state on x86
           in a more straightforward way and to use %pOF instead of full_name
           (Rafael Wysocki, Rob Herring).
      
         - Update the devfreq framework to fix and clean up a few minor issues
           (Chanwoo Choi, Rob Herring).
      
         - Extend diagnostics in the generic power domains (genpd) framework
           and clean it up slightly (Thara Gopinath, Rob Herring).
      
         - Fix and clean up a couple of issues in the operating performance
           points (OPP) framework (Viresh Kumar, Waldemar Rymarkiewicz).
      
         - Add support for RV1108 to the rockchip-io Adaptive Voltage Scaling
           (AVS) driver (David Wu).
      
         - Fix the usage of notifiers in CPU power management on some
           platforms (Alex Shi).
      
         - Update the pm-graph system suspend/hibernation and boot profiling
           utility (Todd Brandt).
      
         - Make it possible to run the cpupower utility without CPU0 (Prarit
           Bhargava)"
      
      * tag 'pm-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (87 commits)
        cpuidle: Make drivers initialize polling state
        cpuidle: Move polling state initialization code to separate file
        cpuidle: Eliminate the CPUIDLE_DRIVER_STATE_START symbol
        cpufreq: imx6q: Fix imx6sx low frequency support
        cpufreq: speedstep-lib: make several arrays static, makes code smaller
        PM: docs: Delete the obsolete states.txt document
        PM: docs: Describe high-level PM strategies and sleep states
        PM / devfreq: Fix memory leak when fail to register device
        PM / devfreq: Add dependency on PM_OPP
        PM / devfreq: Move private devfreq_update_stats() into devfreq
        PM / devfreq: Convert to using %pOF instead of full_name
        PM / AVS: rockchip-io: add io selectors and supplies for RV1108
        cpufreq: ti: Fix 'of_node_put' being called twice in error handling path
        cpufreq: dt-platdev: Drop few entries from whitelist
        cpufreq: dt-platdev: Automatically create cpufreq device with OPP v2
        ARM: ux500: don't select CPUFREQ_DT
        cpuidle: Convert to using %pOF instead of full_name
        cpufreq: Convert to using %pOF instead of full_name
        PM / Domains: Convert to using %pOF instead of full_name
        cpufreq: Cap the default transition delay value to 10 ms
        ...
      43964409
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · b42a362e
      Linus Torvalds authored
      Pull HID update from Jiri Kosina:
      
       - Wacom driver fixes/updates (device name generation improvements,
         touch ring status support) from Jason Gerecke
      
       - T100 touchpad support from Hans de Goede
      
       - support for batteries driven by HID input reports, from Dmitry
         Torokhov
      
       - Arnd pointed out that the driver_lock semaphore is superfluous, as the
         driver core already provides all the necessary concurrency protection.
         Removal patch from Binoy Jayan
      
       - logical minimum numbering improvements in sensor-hub driver, from
         Srinivas Pandruvada
      
       - support for Microsoft Win8 Wireless Radio Controls extensions from
         João Paulo Rechi Vita
      
       - assorted small fixes and device ID additions
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (28 commits)
        HID: prodikeys: constify snd_rawmidi_ops structures
        HID: sensor: constify platform_device_id
        HID: input: throttle battery uevents
        HID: usbmouse: constify usb_device_id and fix space before '[' error
        HID: usbkbd: constify usb_device_id and fix space before '[' error.
        HID: hid-sensor-hub: Force logical minimum to 1 for power and report state
        HID: wacom: Do not completely map WACOM_HID_WD_TOUCHRINGSTATUS usage
        HID: asus: Add T100CHI bluetooth keyboard dock touchpad support
        HID: ntrig: constify attribute_group structures.
        HID: logitech-hidpp: constify attribute_group structures.
        HID: sensor: constify attribute_group structures.
        HID: multitouch: constify attribute_group structures.
        HID: multitouch: use proper symbolic constant for 0xff310076 application
        HID: multitouch: Support Asus T304UA media keys
        HID: multitouch: Support HID_GD_WIRELESS_RADIO_CTLS
        HID: input: optionally use device id in battery name
        HID: input: map digitizer battery usage
        HID: Remove the semaphore driver_lock
        HID: wacom: add USB_HID dependency
        HID: add ALWAYS_POLL quirk for Logitech 0xc077
        ...
      b42a362e
    • Merge tag 'gpio-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 70b8e9eb
      Linus Torvalds authored
      Pull GPIO updates from Linus Walleij:
       "This is the bulk of the GPIO changes for the v4.14 cycle.
      
        Not many changes this time, phew. David Daney and Bartosz
        Golaszewski did all the really interesting work in infrastructure
        improvement across GPIO and IRQ core, hats off to them and to tglx
        and Marc Z for general help with these patch sets.
      
        Core changes:
      
         - Allow the GPIO irqchip to allocate IRQs dynamically. This is an
           important change on systems where only a restricted number of IRQs,
           fewer than the number of GPIO lines, can be utilized. Now we can
           allocate these on a first-come-first-served basis instead of
           hogging up valuable IRQ lines.
      
         - Serious fix-up of the kerneldoc documentation and inclusion into
           the kerneldoc builds.
      
         - Pulled in the IRQ simulator from the IRQ core tree and use this in
           the GPIO mockup driver for exhaustive testing of interrupt
           abilities.
      
        New drivers:
      
         - New driver for ThunderX and OCTEON-TX. This is especially
           interesting as it picks up improvements from the IRQ core that
           allow us to handle fasteoi ACKs upwards in a hierarchy when there
           are IRQ flag latches on several levels in a hierarchy. Very
           interesting work here.
      
         - New subdriver for Renesas R-Car r8a7745 (RZ/G1E).
      
        Misc:
      
         - Several fixes and improvements for Xilinx Zynq GPIO.
      
         - Support an enablement GPIO for the 74x164 GPIO.
      
         - Switch a bunch of chips to use devres to allocate irq descriptors.
      
         - A bunch of constification fixes"
      
      * tag 'gpio-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (63 commits)
        gpio: mockup: remove unused variable gc
        gpio: pl061: constify amba_id
        Revert "gpiolib: request the gpio before querying its direction"
        gpio: twl6040: remove unneeded forward declaration
        gpio: zevio: make gpio_chip const
        gpio: add gpio_add_lookup_tables() to add several tables at once
        gpio: rcar: Add r8a7745 (RZ/G1E) support
        gpio: brcmstb: check return value of gpiochip_irqchip_add()
        MAINTAINERS: Add entry for THUNDERX GPIO Driver.
        gpio: Add gpio driver support for ThunderX and OCTEON-TX
        gpio: mockup: use irq_sim
        gpio: mxs: use devres for irq generic chip
        gpio: mxc: use devres for irq generic chip
        gpio: pch: use devres for irq generic chip
        gpio: ml-ioh: use devres for irq generic chip
        gpio: sta2x11: use devres for irq generic chip
        gpio: sta2x11: disallow unbinding the driver
        gpio: mxs: disallow unbinding the driver
        gpio: mxc: disallow unbinding the driver
        gpio: aspeed: Remove reference to clock name in debounce warning message
        ...
      70b8e9eb
    • Merge tag 'pinctrl-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · d16605c9
      Linus Torvalds authored
      Pull pin control updates from Linus Walleij:
       "This is the big bulk of pin control changes for the v4.14 kernel.
        There are just a few bigger changes (new drivers mostly) and then a
        lot of small patches all over the place.
      
        Core changes:
         - Decision to wrap the sleep mode of the Spreadtrum and in the future
           others into a specially tagged state. The generic DT bindings and
           the new Spreadtrum driver conform to this. Others should be moved
           over if possible.
      
        New drivers:
         - Spreadtrum SoCs especially the SC9860 SoC.
         - Storlink/Cortina Gemini 3512 and 3516 SoCs.
      
        New subdrivers:
         - Intel Denverton subdriver.
         - Intel Cannon Lake subdriver.
         - Intel Lewisburg subdriver.
         - Allwinner sunxi: R40 subdriver for A10.
         - Socionext uniphier PXs3 subdriver.
         - Rockchip RK3128 subdriver.
         - Renesas SH-PFC R8A77995 subdriver.
      
        Miscellaneous:
         - Qualcomm APQ8064 can handle general purpose clock muxing.
         - Mediatek MT7623 PCIe mux data fixed up.
         - Intel GPIO IRQs are disabled during suspend.
         - Several fixes and additions to Renesas r8a7796.
         - Qualcomm SPMI GPIO supports dtest route and LV/MV subtype.
         - Input schmitt trigger support in Rockchip RV1108.
         - Aspeed G4 and G5 USB host/device pin control added.
         - Qualcomm IPQ4019 has matured with a few missing pin groups and
           control bits put in place.
         - Lots of constification, this is the latest in coccinelle fixes"
      
      * tag 'pinctrl-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (147 commits)
        Revert "pinctrl: sunxi: Don't enforce bias disable (for now)"
        pinctrl: uniphier: fix members of rmii group for Pro4
        pinctrl: Delete an error message
        pinctrl: core: Delete an error message
        pinctrl: intel: Read back TX buffer state
        pinctrl: rockchip: Add rv1108 recalculated iomux support
        pinctrl: intel: Decrease indentation in intel_gpio_set()
        pinctrl: rza1: Remove suffix from gpiochip label
        pinctrl: qcom: spmi-gpio: Correct power_source range check
        pinctrl: freescale: make mxs_regs const
        pinctrl: aspeed: Rework strap register write logic for the AST2500
        pinctrl: rza1: off by one in rza1_parse_gpiochip()
        pinctrl: qcom: General Purpose clocks for apq8064
        pinctrl: sprd: Add Spreadtrum pin control driver
        dt-bindings: pinctrl: Add DT bindings for Spreadtrum SC9860
        pinctrl: Add sleep related state to indicate sleep related configs
        pinctrl: mediatek: update PCIe mux data for MT7623
        pinctrl: intel: Add Intel Lewisburg GPIO support
        pinctrl: intel: Add Intel Cannon Lake PCH-H pin controller support
        pinctrl: aspeed: Fix ast2500 strap register write logic
        ...
      d16605c9
    • Merge tag 'regulator-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator · fe9e3138
      Linus Torvalds authored
      Pull regulator updates from Mark Brown:
       "This is an extremely quiet release for the regulator subsystem, it's
        all fairly minor fixes and cleanups plus a few new drivers and device
        ID additions:
      
         - Support for MediaTek MT6380, Ricoh RC5T619 and ST Voltage Reference
           Buffers"
      
      * tag 'regulator-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (24 commits)
        regulator: Add support for stm32-vrefbuf
        regulator: Add STM32 Voltage Reference Buffer
        regulator: pv88090: Exception handling for out of bounds
        regulator: da9063: Return an error code on probe failure
        regulator: rn5t618: add RC5T619 PMIC support
        regulator: ltc3589: constify i2c_device_id
        regulator: fan53555: fix I2C device ids
        regulator: add fixes with MT6397 dt-bindings shouldn't reference driver
        regulator: add fixes with MT6323 dt-bindings shouldn't reference driver
        regulator: add fixes with MT6311 dt-bindings shouldn't reference driver
        regulator: Add document for MediaTek MT6380 regulator
        regulator: mt6380: Add support for MT6380
        regulator: pwm-regulator: Remove unneeded gpiod NULL check
        regulator: core: fix a possible race in disable_work handling
        regulator: fan53555: Use of_device_get_match_data() to simplify probe
        regulator: of: regulator_of_get_init_data() missing of_node_get()
        regulator: pwm-regulator: fix example syntax
        regulator: Convert to using %pOF instead of full_name
        regulator: cpcap: Add OF mode mapping
        regulator: cpcap: Fix standby mode
        ...
      fe9e3138
    • Merge tag 'spi-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi · b88f5577
      Linus Torvalds authored
      Pull spi updates from Mark Brown:
       "A fairly quiet release for the SPI subsystem:
      
         - Move to using IDR for allocating bus numbers
      
         - Modernisation of the ep93xx driver, removing a lot of open coding
           and using the framework more
      
         - The tools have been moved to use the standard tools build system
           and an install target added (there will be a fairly trivial
           conflict with tip resulting from the changes in the main tools
           Makefile)
      
         - A refactoring of the Qualcomm QUP driver which enables new variants
           to be supported
      
         - Explicit support for the Freescale i.MX53 and i.MX6 SPI, Renesas
           R-Car H3 and Rockchip RV1108 controllers"
      
      * tag 'spi-v4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: (71 commits)
        spi: spi-falcon: drop check of boot select
        spi: imx: fix use of native chip-selects with devicetree
        spi: pl022: constify amba_id
        spi: imx: fix little-endian build
        spi: omap: Allocate bus number from spi framework
        spi: Kernel coding style fixes
        spi: imx: dynamic burst length adjust for PIO mode
        spi: Pick spi bus number from Linux idr or spi alias
        spi: rockchip: configure CTRLR1 according to size and data frame
        spi: altera: Consolidate TX/RX data register access
        spi: altera: Switch to SPI core transfer queue management
        spi: rockchip: add compatible string for rv1108 spi
        spi: qup: fix 64-bit build warning
        spi: qup: hide warning for uninitialized variable
        spi: spi-ep93xx: use the default master transfer queueing mechanism
        spi: spi-ep93xx: remove private data 'current_msg'
        spi: spi-ep93xx: pass the spi_master pointer around
        spi: spi-ep93xx: absorb the interrupt enable/disable helpers
        spi: spi-ep93xx: add spi master prepare_transfer_hardware()
        spi: spi-ep93xx: use 32-bit read/write for all registers
        ...
      b88f5577
    • Merge tag 'edac_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · 16a832a2
      Linus Torvalds authored
      Pull EDAC updates from Borislav Petkov:
      
       - pnd2_edac: A minimal sideband driver (Tony Luck)
      
       - small-ish cleanups and fixes all over the place
      
      * tag 'edac_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
        EDAC, mce_amd: Get rid of local var in amd_filter_mce()
        EDAC, mce_amd: Get rid of most struct cpuinfo_x86 uses
        EDAC, mce_amd: Rename decode_smca_errors() to decode_smca_error()
        EDAC: Make device_type const
        EDAC, pnd2: Properly toggle hidden state for P2SB PCI device
        EDAC, pnd2: Conditionally unhide/hide the P2SB PCI device to read BAR
        EDAC, pnd2: Mask off the lower four bits of a BAR
        EDAC, thunderx: Fix error handling path in thunderx_lmc_probe()
        EDAC, altera: Fix error handling path in altr_edac_device_probe()
        EDAC, pnd2: Build in a minimal sideband driver for Apollo Lake
        EDAC, sb_edac: Classify memory mirroring modes
        EDAC, cpc925, ppc4xx: Convert to using %pOF instead of full_name
        EDAC: Get rid of mci->mod_ver
        EDAC: Constify attribute_group structures
        EDAC, mce_amd: Use cpu_to_node() to find the node ID
      16a832a2
    • Merge tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc · bafb0762
      Linus Torvalds authored
      Pull char/misc driver updates from Greg KH:
       "Here is the big char/misc driver update for 4.14-rc1.
      
        Lots of different stuff in here, it's been an active development cycle
        for some reason. Highlights are:
      
         - updated binder driver, this brings binder up to date with what
           shipped in the Android O release, plus some more changes that
           happened since then that are in the Android development trees.
      
         - coresight updates and fixes
      
         - mux driver file renames to be a bit "nicer"
      
         - intel_th driver updates
      
         - normal set of hyper-v updates and changes
      
         - small fpga subsystem and driver updates
      
         - lots of const code changes all over the driver trees
      
         - extcon driver updates
      
         - fmc driver subsystem updates
      
         - w1 subsystem minor reworks and new features and drivers added
      
         - spmi driver updates
      
        Plus a smattering of other minor driver updates and fixes.
      
        All of these have been in linux-next with no reported issues for a
        while"
      
      * tag 'char-misc-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (244 commits)
        ANDROID: binder: don't queue async transactions to thread.
        ANDROID: binder: don't enqueue death notifications to thread todo.
        ANDROID: binder: Don't BUG_ON(!spin_is_locked()).
        ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl
        ANDROID: binder: push new transactions to waiting threads.
        ANDROID: binder: remove proc waitqueue
        android: binder: Add page usage in binder stats
        android: binder: fixup crash introduced by moving buffer hdr
        drivers: w1: add hwmon temp support for w1_therm
        drivers: w1: refactor w1_slave_show to make the temp reading functionality separate
        drivers: w1: add hwmon support structures
        eeprom: idt_89hpesx: Support both ACPI and OF probing
        mcb: Fix an error handling path in 'chameleon_parse_cells()'
        MCB: add support for SC31 to mcb-lpc
        mux: make device_type const
        char: virtio: constify attribute_group structures.
        Documentation/ABI: document the nvmem sysfs files
        lkdtm: fix spelling mistake: "incremeted" -> "incremented"
        perf: cs-etm: Fix ETMv4 CONFIGR entry in perf.data file
        nvmem: include linux/err.h from header
        ...
      bafb0762