  1. 10 Apr, 2018 1 commit
    • Kbuild: fix # escaping in .cmd files for future Make · 9564a8cf
      Rasmus Villemoes authored and Masahiro Yamada committed
      I tried building using a freshly built Make (4.2.1-69-g8a731d1), but
      already the objtool build broke with
      orc_dump.c: In function ‘orc_dump’:
      orc_dump.c:106:2: error: ‘elf_getshnum’ is deprecated [-Werror=deprecated-declarations]
        if (elf_getshdrnum(elf, &nr_sections)) {
      Turns out that with that new Make, the backslash was not removed, so cpp
      didn't see a #include directive, grep found nothing, and
      -DLIBELF_USE_DEPRECATED was wrongly put in CFLAGS.
      Now, that new Make behaviour is documented in their NEWS file:
        * WARNING: Backward-incompatibility!
          Number signs (#) appearing inside a macro reference or function invocation
          no longer introduce comments and should not be escaped with backslashes:
          thus a call such as:
            foo := $(shell echo '#')
          is legal.  Previously the number sign needed to be escaped, for example:
            foo := $(shell echo '\#')
          Now this latter will resolve to "\#".  If you want to write makefiles
          portable to both versions, assign the number sign to a variable:
            C := \#
            foo := $(shell echo '$C')
          This was claimed to be fixed in 3.81, but wasn't, for some reason.
          To detect this change search for 'nocomment' in the .FEATURES variable.
      This also fixes up the two make-cmd instances to replace # with $(pound)
      rather than with \#. There might very well be other places that need
      similar fixup in preparation for whatever future Make release contains
      the above change, but at least this builds an x86_64 defconfig with the
      new make.
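      The portable idiom from the NEWS excerpt can be sketched as a small
      standalone makefile.  This is only an illustration, not the Kbuild
      change itself: the variable name pound follows the commit text, while
      hash and the all target are made up for the example.

```makefile
# Keep the number sign in a variable so that no literal number sign
# has to appear inside a function call.
pound := \#

# The shell sees: echo '#'
# This holds for Make versions both before and after the NEWS change.
hash := $(shell echo '$(pound)')

# Per the NEWS text quoted above, a Make with the new behaviour lists
# 'nocomment' in .FEATURES.
ifneq ($(filter nocomment,$(.FEATURES)),)
$(info this Make keeps number signs inside function calls literal)
endif

all:
	@echo 'hash is: $(hash)'
```

      Because the assignment to pound sits outside any function call, it is
      parsed the same way by both old and new Make, which is what makes the
      idiom portable.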
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=197847
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
  2. 07 Apr, 2018 17 commits
    • kbuild: deb-pkg: split generating packaging and build · b41d920a
      Riku Voipio authored and Masahiro Yamada committed
      Move debian/ directory generation out of builddeb to a new script,
      mkdebian. The package build commands are kept in builddeb, which
      is now an internal command called from debian/rules.
      With these changes in place, we can now use dpkg-buildpackage from
      deb-pkg and bindeb-pkg removing need for handrolled source/changes
      This patch is based on the criticism of the current state of builddeb
      discussed on:
      Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • kbuild: use -fmacro-prefix-map to make __FILE__ a relative path · a73619a8
      Masahiro Yamada authored
      The __FILE__ macro is used everywhere in the kernel to locate the file
      printing the log message, such as WARN_ON(), etc.  If the kernel is
      built out of tree, this can be a long absolute path, like this:
        WARNING: CPU: 1 PID: 1 at /path/to/build/directory/arch/arm64/kernel/foo.c:...
      This is because Kbuild runs in the objtree instead of the srctree,
      so __FILE__ is expanded to a file path prefixed with $(srctree)/.
      Commit 9da0763b ("kbuild: Use relative path when building in a
      subdir of the source tree") improved this to some extent; $(srctree)
      becomes ".." if the objtree is a child of the srctree.
      For other cases of out-of-tree build, __FILE__ is still the absolute
      path.  It also means the kernel image depends on where it was built.
      A brand-new option from GCC, -fmacro-prefix-map, solves this problem.
      If your compiler supports it, __FILE__ is the relative path from the
      srctree regardless of the O= option.  This provides more readable logs
      and more reproducible builds.
      Please note __FILE__ is always an absolute path for external modules.
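      A minimal standalone sketch of how a makefile might enable the option
      only when the compiler accepts it.  This is not the Kbuild
      implementation; the names prefix-map and prefix-map-supported are
      invented here, and srctree is assumed to default to the current
      directory.

```makefile
# Assumption for this sketch: the source tree is the current directory
# unless srctree is given on the command line.
srctree ?= $(CURDIR)

# Rewrites the leading "<srctree>/" of __FILE__ to the empty string.
prefix-map := -fmacro-prefix-map=$(srctree)/=

# Probe the compiler once by compiling an empty translation unit.
prefix-map-supported := $(shell $(CC) $(prefix-map) -x c -c /dev/null \
	-o /dev/null >/dev/null 2>&1 && echo y)

ifeq ($(prefix-map-supported),y)
CFLAGS += $(prefix-map)
endif
```

      Probing first keeps the build working with compilers that predate the
      option; they simply continue to emit absolute paths.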
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers · 54a702f7
      Masahiro Yamada authored
      GNU Make automatically deletes intermediate files that are updated
      in a chain of pattern rules.
      Example 1) %.dtb.o <- %.dtb.S <- %.dtb <- %.dts
      Example 2) %.o <- %.c <- %.c_shipped
      A couple of makefiles mark such targets as .PRECIOUS to prevent Make
      from deleting them, but the correct way is to use .SECONDARY.
        .SECONDARY
          Prerequisites of this special target are treated as intermediate
          files but are never automatically deleted.
        .PRECIOUS
          When make is interrupted during execution, it may delete the target
          file it is updating if the file was modified since make started.
          If you mark the file as precious, make will never delete the file
          if interrupted.
      Both can avoid deletion of intermediate files, but the difference is
      the behavior when Make is interrupted; .SECONDARY deletes the target,
      but .PRECIOUS does not.
      The use of .PRECIOUS is relatively rare since we do not want to keep
      partially constructed (possibly corrupted) targets.
      Another difference is that .PRECIOUS works with pattern rules whereas
      .SECONDARY does not.
        .PRECIOUS: $(obj)/%.lex.c
      works, but
        .SECONDARY: $(obj)/%.lex.c
      has no effect.  However, for the reason above, I do not want to use
      .PRECIOUS which could cause obscure build breakage.
      The targets specified as .SECONDARY must be explicit.  $(targets)
      contains all targets that need to include .*.cmd files.  So, the
      intermediates you want to keep are mostly in there.  Therefore, mark
      $(targets) as .SECONDARY.  It means primary targets are also marked
      as .SECONDARY, but I do not see any drawback to this.
      I replaced some .SECONDARY / .PRECIOUS markers with 'targets'.  This
      will make Kbuild search for non-existing .*.cmd files, but this is
      not a noticeable performance issue.
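      The behaviour described above can be tried with a small standalone
      makefile modelled on Example 2.  It is a sketch only: foo is a made-up
      name, and cp stands in for the real generator and compiler commands.

```makefile
# Chain: %.o <- %.c <- %.c_shipped
%.c: %.c_shipped
	cp $< $@

%.o: %.c
	cp $< $@

# Without this line, "make foo.o" deletes the intermediate foo.c once
# foo.o has been built.  The name has to be explicit; a pattern such as
# %.c would have no effect here.
.SECONDARY: foo.c
```

      With an existing foo.c_shipped, running "make foo.o" creates foo.c and
      then foo.o, and foo.c stays on disk afterwards.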
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
    • kbuild: rename *-asn1.[ch] to *.asn1.[ch] · 4fa8bc94
      Masahiro Yamada authored
      Our convention is to distinguish file types by suffixes with a period
      as a separator.
      *-asn1.[ch] is a different pattern from other generated sources such
      as *.lex.c, *.tab.[ch], *.dtb.S, etc.  More confusing, files with
      '-asn1.[ch]' are generated files, but '_asn1.[ch]' are checked-in
      Rename generated files to *.asn1.[ch] for consistency.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • kbuild: clean up *-asn1.[ch] patterns from top-level Makefile · 3ca3273e
      Masahiro Yamada authored
      Clean up these patterns from the top Makefile to omit 'clean-files'
      in each Makefile.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore · 9ce285cf
      Masahiro Yamada authored
      These are common patterns where source files are parsed by the
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • kbuild: add %.dtb.S and %.dtb to 'targets' automatically · a7f92419
      Masahiro Yamada authored
      Another common pattern that consists of chained commands is to compile
      a DTB as binary data into the kernel image or a module.  It is used in
      several places in the source tree.  Support it in the core Makefile.
      $(call if_changed,dt_S_dtb) is more suitable than $(call cmd,dt_S_dtb)
      in case cmd_dt_S_dtb is changed in the future.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
    • kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically · b23d1a24
      Masahiro Yamada authored
      Files generated by if_changed* must be added to 'targets' to include
      *.cmd files.  Otherwise, they would be regenerated every time.
      The build system automatically adds objects to 'targets' where
      appropriate, such as obj-y, extra-y, etc. but does nothing for
      intermediate files.  So, each Makefile needs to add them by itself.
      There are some common cases where objects are generated by chained
      rules.  Lexers and parsers are compiled as follows:
         %.lex.o <- %.lex.c <- %.l
         %.tab.o <- %.tab.c <- %.y
      They are common patterns, so it is reasonable to take care of them
      in the core Makefile instead of requiring each Makefile to do so.
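      One way to derive the intermediates from the objects already listed in
      'targets' is sketched below as a standalone makefile.  It illustrates
      the idea and is not a copy of the Kbuild code; the helper name
      intermediates and the object names are made up for the example.

```makefile
# Objects a hypothetical Makefile already lists.
targets := zconf.lex.o zconf.tab.o main.o

# For every entry of 'targets' that ends in suffix $(1), emit the same
# stem with each suffix listed in $(2).
intermediates = $(foreach sfx,$(2),$(patsubst %$(1),%$(sfx),$(filter %$(1),$(targets))))

# %.lex.o <- %.lex.c and %.tab.o <- %.tab.[ch]
targets += $(call intermediates,.lex.o,.lex.c)
targets += $(call intermediates,.tab.o,.tab.c .tab.h)

# Show the extended list; main.o contributes nothing extra.
$(info $(targets))

all: ;
```

      Filtering on the object suffix means a Makefile only has to name the
      final objects, and the generated sources in between are picked up
      without per-directory bookkeeping.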
      At this moment, you cannot delete 'targets += zconf.lex.c' in the
      Kconfig Makefile because zconf.lex.c is included from zconf.tab.c
      instead of being compiled separately.  It should be deleted once
      Kconfig has been refactored further.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
    • genksyms: generate lexer and parser during build instead of shipping · 833e6224
      Masahiro Yamada authored
      Now that the kernel build supports flex and bison, remove the _shipped
      files and generate them during the build instead.
      There are no more shipped lexers and parsers, so I ripped out the rules
      in scripts/Makefile.lib that were used for REGENERATE_PARSERS.
      The genksyms parser has an ambiguous grammar, which would emit warnings:
       scripts/genksyms/parse.y: warning: 9 shift/reduce conflicts [-Wconflicts-sr]
       scripts/genksyms/parse.y: warning: 5 reduce/reduce conflicts [-Wconflicts-rr]
      They are normally suppressed, but displayed when W=1 is given.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile · 9a8dfb39
      Masahiro Yamada authored
      Files suffixed with .lex.c and .tab.[ch] are generated lexers and
      parsers, respectively.  Clean them up globally from the top Makefile.
      Some of the final host programs that those lexers and parsers are
      linked into are necessary for building external modules, but the
      intermediates are unneeded.  They can be cleaned away by 'make clean'
      instead of 'make mrproper'.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
    • .gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore · 59889300
      Masahiro Yamada authored
      These patterns are common to host programs that require lexer and parser.
      Move them to the top .gitignore.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Frank Rowand <frowand.list@gmail.com>
    • kbuild: use HOSTLDFLAGS for single .c executables · 63185b46
      Robin Jarry authored and Masahiro Yamada committed
      When compiling executables from a single .c file, the linker is also
      invoked.  Pass HOSTLDFLAGS, as is done for the other linker commands.
      Signed-off-by: Robin Jarry <robin.jarry@6wind.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
    • Merge tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio · f605ba97
      Linus Torvalds authored
      Pull VFIO updates from Alex Williamson:
       - Adopt iommu_unmap_fast() interface to type1 backend
         (Suravee Suthikulpanit)
       - mdev sample driver fixup (Shunyong Yang)
       - More efficient PFN mapping handling in type1 backend
         (Jason Cai)
       - VFIO device ioeventfd interface (Alex Williamson)
       - Tag new vfio-platform sub-maintainer (Alex Williamson)
      * tag 'vfio-v4.17-rc1' of git://github.com/awilliam/linux-vfio:
        MAINTAINERS: vfio/platform: Update sub-maintainer
        vfio/pci: Add ioeventfd support
        vfio/pci: Use endian neutral helpers
        vfio/pci: Pull BAR mapping setup from read-write path
        vfio/type1: Improve memory pinning process for raw PFN mapping
        vfio-mdev/samples: change RDI interrupt condition
        vfio/type1: Adopt fast IOTLB flush interface when unmap IOVAs
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 016c6f25
      Linus Torvalds authored
      Pull fw_cfg, vhost updates from Michael Tsirkin:
       "This cleans up the qemu fw cfg device driver.
        On top of this, vmcore is dumped there on crash to help debugging
        with kASLR enabled.
        Also included are some fixes in vhost"
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        vhost: add vsock compat ioctl
        vhost: fix vhost ioctl signature to build with clang
        fw_cfg: write vmcoreinfo details
        crash: export paddr_vmcoreinfo_note()
        fw_cfg: add DMA register
        fw_cfg: add a public uapi header
        fw_cfg: handle fw_cfg_read_blob() error
        fw_cfg: remove inline from fw_cfg_read_blob()
        fw_cfg: fix sparse warnings around FW_CFG_FILE_DIR read
        fw_cfg: fix sparse warning reading FW_CFG_ID
        fw_cfg: fix sparse warnings with fw_cfg_file
        fw_cfg: fix sparse warnings in fw_cfg_sel_endianness()
        ptr_ring: fix build
    • Merge tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 3c0d551e
      Linus Torvalds authored
      Pull PCI updates from Bjorn Helgaas:
       - move pci_uevent_ers() out of pci.h (Michael Ellerman)
       - skip ASPM common clock warning if BIOS already configured it (Sinan
       - fix ASPM Coverity warning about threshold_ns (Gustavo A. R. Silva)
       - remove last user of pci_get_bus_and_slot() and the function itself
         (Sinan Kaya)
       - add decoding for 16 GT/s link speed (Jay Fang)
       - add interfaces to get max link speed and width (Tal Gilboa)
       - add pcie_bandwidth_capable() to compute max supported link bandwidth
         (Tal Gilboa)
       - add pcie_bandwidth_available() to compute bandwidth available to
         device (Tal Gilboa)
       - add pcie_print_link_status() to log link speed and whether it's
         limited (Tal Gilboa)
       - use PCI core interfaces to report when device performance may be
         limited by its slot instead of doing it in each driver (Tal Gilboa)
       - fix possible cpqphp NULL pointer dereference (Shawn Lin)
       - rescan more of the hierarchy on ACPI hotplug to fix Thunderbolt/xHCI
         hotplug (Mika Westerberg)
       - add support for PCI I/O port space that's neither directly accessible
         via CPU in/out instructions nor directly mapped into CPU physical
         memory space. This is fairly intrusive and includes minor changes to
         interfaces used for I/O space on most platforms (Zhichang Yuan, John
       - add support for HiSilicon Hip06/Hip07 LPC I/O space (Zhichang Yuan,
         John Garry)
       - use PCI_EXP_DEVCTL2_COMP_TIMEOUT in rapidio/tsi721 (Bjorn Helgaas)
       - remove possible NULL pointer dereference in of_pci_bus_find_domain_nr()
         (Shawn Lin)
       - report quirk timings with dev_info (Bjorn Helgaas)
       - report quirks that take longer than 10ms (Bjorn Helgaas)
       - add and use Altera Vendor ID (Johannes Thumshirn)
       - tidy Makefiles and comments (Bjorn Helgaas)
       - don't set up INTx if MSI or MSI-X is enabled to align cris, frv,
         ia64, and mn10300 with x86 (Bjorn Helgaas)
       - move pcieport_if.h to drivers/pci/pcie/ to encapsulate it (Frederick
       - merge pcieport_if.h into portdrv.h (Bjorn Helgaas)
       - move workaround for BIOS PME issue from portdrv to PCI core (Bjorn
       - completely disable portdrv with "pcie_ports=compat" (Bjorn Helgaas)
       - remove portdrv link order dependency (Bjorn Helgaas)
       - remove support for unused VC portdrv service (Bjorn Helgaas)
       - simplify portdrv feature permission checking (Bjorn Helgaas)
       - remove "pcie_hp=nomsi" parameter (use "pci=nomsi" instead) (Bjorn
       - remove unnecessary "pcie_ports=auto" parameter (Bjorn Helgaas)
       - use cached AER capability offset (Frederick Lawler)
       - don't enable DPC if BIOS hasn't granted AER control (Mika Westerberg)
       - rename pcie-dpc.c to dpc.c (Bjorn Helgaas)
       - use generic pci_mmap_resource_range() instead of powerpc and xtensa
         arch-specific versions (David Woodhouse)
       - support arbitrary PCI host bridge offsets on sparc (Yinghai Lu)
       - remove System and Video ROM reservations on sparc (Bjorn Helgaas)
       - probe for device reset support during enumeration instead of runtime
         (Bjorn Helgaas)
       - add ACS quirk for Ampere (née APM) root ports (Feng Kan)
       - add function 1 DMA alias quirk for Marvell 88SE9220 (Thomas
       - protect device restore with device lock (Sinan Kaya)
       - handle failure of FLR gracefully (Sinan Kaya)
       - handle CRS (config retry status) after device resets (Sinan Kaya)
       - skip various config reads for SR-IOV VFs as an optimization
         (KarimAllah Ahmed)
       - consolidate VPD code in vpd.c (Bjorn Helgaas)
       - add Tegra dependency on PCI_MSI_IRQ_DOMAIN (Arnd Bergmann)
       - add DT support for R-Car r8a7743 (Biju Das)
       - fix a PCI_EJECT vs PCI_BUS_RELATIONS race condition in Hyper-V host
         bridge driver that causes a general protection fault (Dexuan Cui)
       - fix Hyper-V host bridge hang in MSI setup on 1-vCPU VMs with SR-IOV
         (Dexuan Cui)
       - fix Hyper-V host bridge hang when ejecting a VF before setting up MSI
         (Dexuan Cui)
       - make several structures static (Fengguang Wu)
       - increase number of MSI IRQs supported by Synopsys DesignWare bridges
         from 32 to 256 (Gustavo Pimentel)
       - implement multiplexed IRQ domain API and remove obsolete MSI IRQ
         API from DesignWare drivers (Gustavo Pimentel)
       - add Tegra power management support (Manikanta Maddireddy)
       - add Tegra loadable module support (Manikanta Maddireddy)
       - handle 64-bit BARs correctly in endpoint support (Niklas Cassel)
       - support optional regulator for HiSilicon STB (Shawn Guo)
       - use regulator bulk API for Qualcomm apq8064 (Srinivas Kandagatla)
       - support power supplies for Qualcomm msm8996 (Srinivas Kandagatla)
      * tag 'pci-v4.17-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (123 commits)
        MAINTAINERS: Add John Garry as maintainer for HiSilicon LPC driver
        HISI LPC: Add ACPI support
        ACPI / scan: Do not enumerate Indirect IO host children
        ACPI / scan: Rename acpi_is_serial_bus_slave() for more general use
        HISI LPC: Support the LPC host on Hip06/Hip07 with DT bindings
        of: Add missing I/O range exception for indirect-IO devices
        PCI: Apply the new generic I/O management on PCI IO hosts
        PCI: Add fwnode handler as input param of pci_register_io_range()
        PCI: Remove __weak tag from pci_register_io_range()
        MAINTAINERS: Add missing /drivers/pci/cadence directory entry
        fm10k: Report PCIe link properties with pcie_print_link_status()
        net/mlx5e: Use pcie_bandwidth_available() to compute bandwidth
        net/mlx5: Report PCIe link properties with pcie_print_link_status()
        net/mlx4_core: Report PCIe link properties with pcie_print_link_status()
        PCI: Add pcie_print_link_status() to log link speed and whether it's limited
        PCI: Add pcie_bandwidth_available() to compute bandwidth available to device
        misc: pci_endpoint_test: Handle 64-bit BARs properly
        PCI: designware-ep: Make dw_pcie_ep_reset_bar() handle 64-bit BARs properly
        PCI: endpoint: Make sure that BAR_5 does not have 64-bit flag set when clearing
        PCI: endpoint: Make epc->ops->clear_bar()/pci_epc_clear_bar() take struct *epf_bar
    • Merge tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 19fd08b8
      Linus Torvalds authored
      Pull rdma updates from Jason Gunthorpe:
       "Doug and I are at a conference next week so if another PR is sent I
        expect it to only be bug fixes. Parav noted yesterday that there are
        some fringe case behavior changes in his work that he would like to
        fix, and I see that Intel has a number of rc looking patches for HFI1
        they posted yesterday.
        Parav is again the biggest contributor by patch count with his ongoing
        work to enable container support in the RDMA stack, followed by Leon
        doing syzkaller inspired cleanups, though most of the actual fixing
        went to RC.
        There is one uncomfortable series here fixing the user ABI to actually
        work as intended in 32 bit mode. There are lots of notes in the commit
        messages, but the basic summary is we don't think there is an actual
        32 bit kernel user of drivers/infiniband for several good reasons.
        However we are seeing people want to use a 32 bit user space with 64
        bit kernel, which didn't completely work today. So in fixing it we
        required a 32 bit rxe user to upgrade their userspace. rxe users are
        still already quite rare and we think a 32 bit one is non-existing.
         - Fix RDMA uapi headers to actually compile in userspace and be more
         - Three shared with netdev pull requests from Mellanox:
            * 7 patches, mostly to net with 1 IB related one at the back.
              This series addresses an IRQ performance issue (patch 1),
              cleanups related to the fix for the IRQ performance problem
              (patches 2-6), and then extends the fragmented completion queue
              support that already exists in the net side of the driver to the
              ib side of the driver (patch 7).
            * Mostly IB, with 5 patches to net that are needed to support the
              remaining 10 patches to the IB subsystem. This series extends
              the current 'representor' framework when the mlx5 driver is in
              switchdev mode from being a netdev only construct to being a
              netdev/IB dev construct. The IB dev is limited to raw Eth queue
              pairs only, but by having an IB dev of this type attached to the
              representor for a switchdev port, it enables DPDK to work on the
              switchdev device.
            * All net related, but needed as infrastructure for the rdma
         - Updates for the hns, i40iw, bnxt_re, cxgb3, cxgb4, hns drivers
         - SRP performance updates
         - IB uverbs write path cleanup patch series from Leon
         - Add RDMA_CM support to ib_srpt. This is disabled by default. Users
           need to set the port for ib_srpt to listen on in configfs in order
           for it to be enabled
         - TSO and Scatter FCS support in mlx4
         - Refactor of modify_qp routine to resolve problems seen while
           working on new code that is forthcoming
         - More refactoring and updates of RDMA CM for containers support from
         - mlx5 'fine grained packet pacing', 'ipsec offload' and 'device
           memory' user API features
         - Infrastructure updates for the new IOCTL interface, based on
           increased usage
         - ABI compatibility bug fixes to fully support 32 bit userspace on 64
           bit kernel as was originally intended. See the commit messages for
           extensive details
         - Syzkaller bugs and code cleanups motivated by them"
      * tag 'for-linus-unmerged' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (199 commits)
        IB/rxe: Fix for oops in rxe_register_device on ppc64le arch
        IB/mlx5: Device memory mr registration support
        net/mlx5: Mkey creation command adjustments
        IB/mlx5: Device memory support in mlx5_ib
        net/mlx5: Query device memory capabilities
        IB/uverbs: Add device memory registration ioctl support
        IB/uverbs: Add alloc/free dm uverbs ioctl support
        IB/uverbs: Add device memory capabilities reporting
        IB/uverbs: Expose device memory capabilities to user
        RDMA/qedr: Fix wmb usage in qedr
        IB/rxe: Removed GID add/del dummy routines
        RDMA/qedr: Zero stack memory before copying to user space
        IB/mlx5: Add ability to hash by IPSEC_SPI when creating a TIR
        IB/mlx5: Add information for querying IPsec capabilities
        IB/mlx5: Add IPsec support for egress and ingress
        {net,IB}/mlx5: Add ipsec helper
        IB/mlx5: Add modify_flow_action_esp verb
        IB/mlx5: Add implementation for create and destroy action_xfrm
        IB/uverbs: Introduce ESP steering match filter
        IB/uverbs: Add modify ESP flow_action
    • Merge tag 'mailbox-v4.17' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 28da7be5
      Linus Torvalds authored
      Pull mailbox updates from Jassi Brar:
       - New Hi3660 mailbox driver
       - Fix TEGRA Kconfig warning
       - Broadcom: use dma_pool_zalloc instead of dma_pool_alloc+memset
      * tag 'mailbox-v4.17' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: Add support for Hi3660 mailbox
        dt-bindings: mailbox: Introduce Hi3660 controller binding
        mailbox: tegra: relax TEGRA_HSP_MBOX Kconfig dependencies
        maillbox: bcm-flexrm-mailbox: Use dma_pool_zalloc()
  3. 06 Apr, 2018 22 commits
    • Merge tag 'selinux-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux · 9eda2d2d
      Linus Torvalds authored
      Pull SELinux updates from Paul Moore:
       "A bigger than usual pull request for SELinux, 13 patches (lucky!)
        along with a scary looking diffstat.
        Although if you look a bit closer, excluding the usual minor
        tweaks/fixes, there are really only two significant changes in this
        pull request: the addition of proper SELinux access controls for SCTP
        and the encapsulation of a lot of internal SELinux state.
        The SCTP changes are the result of a multi-month effort (maybe even a
        year or longer?) between the SELinux folks and the SCTP folks to add
        proper SELinux controls. A special thanks go to Richard for seeing
        this through and keeping the effort moving forward.
        The state encapsulation work is a bit of janitorial work that came out
        of some early work on SELinux namespacing. The question of namespacing
        is still an open one, but I believe there is some real value in the
        encapsulation work so we've split that out and are now sending that up
        to you"
      * tag 'selinux-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
        selinux: wrap AVC state
        selinux: wrap selinuxfs state
        selinux: fix handling of uninitialized selinux state in get_bools/classes
        selinux: Update SELinux SCTP documentation
        selinux: Fix ltp test connect-syscall failure
        selinux: rename the {is,set}_enforcing() functions
        selinux: wrap global selinux state
        selinux: fix typo in selinux_netlbl_sctp_sk_clone declaration
        selinux: Add SCTP support
        sctp: Add LSM hooks
        sctp: Add ip option support
        security: Add support for SCTP security hooks
        netlabel: If PF_INET6, check sk_buff ip header version
    • Merge tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit · 6ad11bdd
      Linus Torvalds authored
      Pull audit updates from Paul Moore:
       "We didn't have anything to send for v4.16, but we're back with a
        little more than usual for v4.17.
        Eleven patches in total, most fall into the small fix category, but
        there are three non-trivial changes worth calling out:
         - the audit entry filter is being removed after deprecating it for
           quite a while (years of no one really using it because it turns out
           to be not very practical)
         - created our own version of "__mutex_owner()" because the locking
           folks were upset we were using theirs
         - improved our handling of kernel command line parameters to make
           them more forgiving
         - we fixed auditing of symlink operations
        Everything passes the audit-testsuite and as of a few minutes ago it
        merges well with your tree"
      * tag 'audit-pr-20180403' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
        audit: add refused symlink to audit_names
        audit: remove path param from link denied function
        audit: link denied should not directly generate PATH record
        audit: make ANOM_LINK obey audit_enabled and audit_dummy_context
        audit: do not panic on invalid boot parameter
        audit: track the owner of the command mutex ourselves
        audit: return on memory error to avoid null pointer dereference
        audit: bail before bug check if audit disabled
        audit: deprecate the AUDIT_FILTER_ENTRY filter
        audit: session ID should not set arch quick field pointer
        audit: update bugtracker and source URIs
    • Merge tag 'pstore-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 69824bcc
      Linus Torvalds authored
      Pull pstore updates from Kees Cook:
       "This cycle was almost entirely improvements to the pstore compression
        options, noted below:
         - Add lz4hc and 842 to pstore compression options (Geliang Tang)
         - Refactor to use crypto compression API (Geliang Tang)
         - Fix up Kconfig dependencies for compression (Arnd Bergmann)
         - Allow for run-time compression selection
         - Remove stack VLA usage"
      * tag 'pstore-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        pstore: fix crypto dependencies
        pstore: Use crypto compress API
        pstore/ram: Do not use stack VLA for parity workspace
        pstore: Select compression at runtime
        pstore: Avoid size casts for 842 compression
        pstore: Add lz4hc and 842 compression support
    • Merge branch 'akpm' (patches from Andrew) · 3b54765c
      Linus Torvalds authored
      Merge updates from Andrew Morton:
       - a few misc things
       - ocfs2 updates
       - the v9fs maintainers have been missing for a long time. I've taken
         over v9fs patch slinging.
       - most of MM
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (116 commits)
        mm,oom_reaper: check for MMF_OOM_SKIP before complaining
        mm/ksm: fix interaction with THP
        mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t
        headers: untangle kmemleak.h from mm.h
        include/linux/mmdebug.h: make VM_WARN* non-rvals
        mm/page_isolation.c: make start_isolate_page_range() fail if already isolated
        mm: change return type to vm_fault_t
        mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes
        mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory
        kernel/fork.c: detect early free of a live mm
        mm: make counting of list_lru_one::nr_items lockless
        mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static
        block_invalidatepage(): only release page if the full page was invalidated
        mm: kernel-doc: add missing parameter descriptions
        mm/swap.c: remove @cold parameter description for release_pages()
        mm/nommu: remove description of alloc_vm_area
        zram: drop max_zpage_size and use zs_huge_class_size()
        zsmalloc: introduce zs_huge_class_size()
        mm: fix races between swapoff and flush dcache
        fs/direct-io.c: minor cleanups in do_blockdev_direct_IO
    • Linus Torvalds's avatar
      Merge tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd · 3fd14cdc
      Linus Torvalds authored
      Pull MTD updates from Boris Brezillon:
       "MTD Core:
         - Remove support for asynchronous erase (not implemented by any of
           the existing drivers anyway)
         - Remove Cyrille from the list of SPI NOR and MTD maintainers
         - Fix kernel doc headers
         - Allow users to define the partitions parsers they want to test
           through a DT property (compatible of the partitions subnode)
         - Remove the bfin-async-flash driver (the only architecture using it
           has been removed)
         - Fix pagetest test
         - Add extra checks in mtd_erase()
         - Simplify the MTD partition creation logic and get rid of
        MTD Drivers:
         - Add endianness information to the physmap DT binding
         - Add Eon EN29LV400A IDs to JEDEC probe logic
         - Use %*ph where appropriate
        SPI NOR Drivers:
         - Make fsl-quadspi assign different names to MTD devices connected to
           the same QSPI controller
         - Remove an unneeded driver.bus assigned in the fsl-qspi driver
        NAND Core:
         - Prepare arrival of the SPI NAND subsystem by implementing a generic
           (interface-agnostic) layer to ease manipulation of NAND devices
         - Move onenand code base to the drivers/mtd/nand/ dir
         - Rework timing mode selection
         - Provide a generic way for NAND chip drivers to flag a specific
           GET/SET FEATURE operation as supported/unsupported
         - Stop embedding ONFI/JEDEC param page in nand_chip
        NAND Drivers:
         - Rework/cleanup of the mxc driver
         - Various cleanups in the vf610 driver
         - Migrate the fsmc and vf610 to ->exec_op()
         - Get rid of the pxa driver (replaced by marvell_nand)
         - Support ->setup_data_interface() in the GPMI driver
         - Fix probe error path in several drivers
         - Remove support for unused hw_syndrome mode in sunxi_nand
         - Various minor improvements"
      * tag 'mtd/for-4.17' of git://git.infradead.org/linux-mtd: (89 commits)
        dt-bindings: fsl-quadspi: Add the example of two SPI NOR
        mtd: fsl-quadspi: Distinguish the mtd device names
        mtd: nand: Fix some function description mismatches in core.c
        mtd: fsl-quadspi: Remove unneeded driver.bus assignment
        mtd: rawnand: marvell: Rename ->ecc_clk into ->core_clk
        mtd: rawnand: s3c2410: enhance the probe function error path
        mtd: rawnand: tango: fix probe function error path
        mtd: rawnand: sh_flctl: fix the probe function error path
        mtd: rawnand: omap2: fix the probe function error path
        mtd: rawnand: mxc: fix probe function error path
        mtd: rawnand: denali: fix probe function error path
        mtd: rawnand: davinci: fix probe function error path
        mtd: rawnand: cafe: fix probe function error path
        mtd: rawnand: brcmnand: fix probe function error path
        mtd: rawnand: sunxi: Stop supporting ECC_HW_SYNDROME mode
        mtd: rawnand: marvell: Fix clock resource by adding a register clock
        mtd: ftl: Use DIV_ROUND_UP()
        mtd: Fix some function description mismatches in mtdcore.c
        mtd: physmap_of: update struct map_info's swap as per map requirement
        dt-bindings: mtd-physmap: Add endianness supports
    • Linus Torvalds's avatar
      Merge tag 'for-4.17/dm-changes' of... · 83c7c18b
      Linus Torvalds authored
      Merge tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      Pull device mapper updates from Mike Snitzer:
       - DM core passthrough ioctl fix to retain reference to DM table, and
         that table's block devices, while issuing the ioctl to one of those
         block devices.
       - DM core passthrough ioctl fix to _not_ override the fmode_t used to
         issue the ioctl. Overriding by using the fmode_t that the block
         device was originally open with during DM table load is a liability.
       - Add DM core support for secure erase forwarding and update the DM
         linear and DM striped targets to support them.
       - A DM core 4.16 stable fix to allow abnormal IO (e.g. discard, write
         same, write zeroes) for targets that make use of the non-splitting IO
         variant (as is done for multipath or thinp when layered directly on
       - Allow DM targets to return a payload in response to a DM message that
         they are sent. This is useful for DM targets that would like to
         provide statistics data in response to DM messages.
       - Update DM bufio to support non-power-of-2 block sizes. Numerous other
         related changes prepare the DM bufio code for this support.
       - Fix DM crypt to use a bounded amount of memory across the entire
         system. This is to avoid OOM that can otherwise occur in response to
         certain pathological IO workloads (e.g. discarding a large DM crypt
       - Add a 'check_at_most_once' feature to the DM verity target to allow
         verity to be used on mobile devices that have very limited resources.
       - Fix the DM integrity target to fail early if a keyed algorithm (e.g.
         HMAC) is to be used but the key isn't set.
       - Add non-power-of-2 support to the DM unstripe target.
       - Eliminate the use of a Variable Length Array in the DM stripe target.
       - Update the DM log-writes target to record metadata (REQ_META flag).
       - DM raid fixes for its nosync status and some variable range issues.
      * tag 'for-4.17/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
        dm: remove fmode_t argument from .prepare_ioctl hook
        dm: hold DM table for duration of ioctl rather than use blkdev_get
        dm raid: fix parse_raid_params() variable range issue
        dm verity: make verity_for_io_block static
        dm verity: add 'check_at_most_once' option to only validate hashes once
        dm bufio: don't embed a bio in the dm_buffer structure
        dm bufio: support non-power-of-two block sizes
        dm bufio: use slab cache for dm_buffer structure allocations
        dm bufio: reorder fields in dm_buffer structure
        dm bufio: relax alignment constraint on slab cache
        dm bufio: remove code that merges slab caches
        dm bufio: get rid of slab cache name allocations
        dm bufio: move dm-bufio.h to include/linux/
        dm bufio: delete outdated comment
        dm: add support for secure erase forwarding
        dm: backfill abnormal IO support to non-splitting IO submission
        dm raid: fix nosync status
        dm mpath: use DM_MAPIO_SUBMITTED instead of magic number 0 in process_queued_bios()
        dm stripe: get rid of a Variable Length Array (VLA)
        dm log writes: record metadata flag for better flags record
    • Linus Torvalds's avatar
      Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 9022ca6b
      Linus Torvalds authored
      Pull misc vfs updates from Al Viro:
       "Assorted stuff, including Christoph's I_DIRTY patches"
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs: move I_DIRTY_INODE to fs.h
        ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
        ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call
        gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls
        fs: fold open_check_o_direct into do_dentry_open
        vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents
        vfs: make sure struct filename->iname is word-aligned
        get rid of pointless includes of fs_struct.h
        [poll] annotate SAA6588_CMD_POLL users
    • Bjorn Helgaas's avatar
      Merge remote-tracking branch 'lorenzo/pci/cadence' into next · 5f764419
      Bjorn Helgaas authored
      * lorenzo/pci/cadence:
        MAINTAINERS: Add missing /drivers/pci/cadence directory entry
    • Tetsuo Handa's avatar
      mm,oom_reaper: check for MMF_OOM_SKIP before complaining · 97b1255c
      Tetsuo Handa authored
      I got "oom_reaper: unable to reap pid:" messages when the victim thread
      was blocked inside free_pgtables() (which occurred after returning from
      unmap_vmas() and setting MMF_OOM_SKIP).  We don't need to complain when
      exit_mmap() already set MMF_OOM_SKIP.
        Killed process 7558 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB
        oom_reaper: unable to reap pid:7558 (a.out)
        a.out           D13272  7558   6931 0x00100084
        Call Trace:
      Link: http://lkml.kernel.org/r/201803221946.DHG65638.VFJHFtOSQLOMOF@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Claudio Imbrenda's avatar
      mm/ksm: fix interaction with THP · 77da2ba0
      Claudio Imbrenda authored
      This patch fixes a corner case for KSM.  When two pages belong or
      belonged to the same transparent hugepage, and they should be merged,
      KSM fails to split the page, and therefore no merging happens.
      This bug can be reproduced by:
      * making sure ksm is running (disabling ksmtuned if necessary)
      * enabling transparent hugepages
      * allocating a THP-aligned 1-THP-sized buffer
        e.g. on amd64: posix_memalign(&p, 1<<21, 1<<21)
      * filling it with the same values
        e.g. memset(p, 42, 1<<21)
      * performing madvise to make it mergeable
        e.g. madvise(p, 1<<21, MADV_MERGEABLE)
      * waiting for KSM to perform a few scans
      The expected outcome is that all the pages get merged (1 shared and
      the rest sharing); the actual outcome is that no pages get merged (1
      unshared and the rest volatile).
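      The reproduction steps above can be sketched as a small user-space
      helper. This is only an illustration built from the calls quoted in
      the commit message; the helper name ksm_thp_setup() and the THP_SIZE
      macro are hypothetical, and the 2 MiB size assumes amd64. madvise()
      may fail (for example with EINVAL) on kernels built without KSM, so
      its result is reported rather than treated as fatal.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes MADV_MERGEABLE on glibc */
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* 2 MiB: the THP size on amd64, as in the commit message (1<<21). */
#define THP_SIZE (1UL << 21)

/*
 * Hypothetical helper: allocate one THP-aligned, THP-sized buffer, fill it
 * with identical bytes and mark it mergeable.  Returns the buffer (caller
 * frees it) or NULL if the allocation failed.  *madvise_rc receives the
 * madvise() return value so the caller can decide how to react.
 */
void *ksm_thp_setup(int *madvise_rc)
{
	void *p = NULL;

	if (posix_memalign(&p, THP_SIZE, THP_SIZE) != 0)
		return NULL;

	memset(p, 42, THP_SIZE);                       /* same value everywhere */
	*madvise_rc = madvise(p, THP_SIZE, MADV_MERGEABLE);
	return p;
}
```

      After calling the helper, a test would keep the buffer mapped, wait
      for a few KSM scans, and then compare the pages_shared and
      pages_sharing counters under /sys/kernel/mm/ksm/.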
      The reason for this behaviour is that we increase the reference count
      once for both pages we want to merge, but if they belong to the same
      hugepage (or compound page), the reference counter used in both cases is
      the one of the head of the compound page.  This means that
      split_huge_page will find a value of the reference counter too high and
      will fail.
      This patch solves this problem by testing if the two pages to merge
      belong to the same hugepage when attempting to merge them.  If so, the
      hugepage is split safely.  This means that the hugepage is not split if
      not necessary.
      Link: http://lkml.kernel.org/r/1521548069-24758-1-git-send-email-imbrenda@linux.vnet.ibm.com
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Co-authored-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Stefan Agner's avatar
      mm/memblock.c: cast constant ULLONG_MAX to phys_addr_t · 644d87dc
      Stefan Agner authored
      This fixes a warning shown when phys_addr_t is 32-bit int when compiling
      with clang:
        mm/memblock.c:927:15: warning: implicit conversion from 'unsigned long long'
              to 'phys_addr_t' (aka 'unsigned int') changes value from
              18446744073709551615 to 4294967295 [-Wconstant-conversion]
                                        r->base : ULLONG_MAX;
        ./include/linux/kernel.h:30:21: note: expanded from macro 'ULLONG_MAX'
        #define ULLONG_MAX      (~0ULL)
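      The warning and its fix can be illustrated outside the kernel. In
      this sketch the phys_addr_t typedef is a hypothetical stand-in for a
      32-bit configuration, and struct region_sketch and base_or_max() are
      invented names; only the explicit cast mirrors what the commit title
      describes.

```c
#include <limits.h>

/* Hypothetical stand-in for the kernel's phys_addr_t on a 32-bit build. */
typedef unsigned int phys_addr_t;

/* Hypothetical, simplified region descriptor. */
struct region_sketch {
	phys_addr_t base;
};

/*
 * Return the region base, or the largest phys_addr_t value when there is
 * no region.  Without the cast, clang reports -Wconstant-conversion
 * because ULLONG_MAX does not fit in a 32-bit phys_addr_t; the cast makes
 * the intended truncation explicit.
 */
phys_addr_t base_or_max(const struct region_sketch *r)
{
	return r ? r->base : (phys_addr_t)ULLONG_MAX;
}
```

      The resulting value is the same with or without the cast; the cast
      only states that the narrowing is intentional.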
      Link: http://lkml.kernel.org/r/20180319005645.29051-1-stefan@agner.ch
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Randy Dunlap's avatar
      headers: untangle kmemleak.h from mm.h · 514c6032
      Randy Dunlap authored
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h
      from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
      don't already #include it.  Also remove <linux/kmemleak.h> from source
      files that do not use it.
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
      be good to run it through the 0day bot for other $ARCHes.  I have
      neither the horsepower nor the storage space for the other $ARCHes.
      Update: This patch has been extensively build-tested by both the 0day
      bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
      for which patches are included here (in v2).
      [ slab.h is the second most used header file after module.h; kernel.h is
        right there with slab.h. There could be some minor error in the
        counting due to some #includes having comments after them and I didn't
        combine all of those. ]
      [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Michal Hocko's avatar
      include/linux/mmdebug.h: make VM_WARN* non-rvals · 91241681
      Michal Hocko authored
      At present the construct
      	if (VM_WARN(...))
      will compile OK with CONFIG_DEBUG_VM=y and will fail with
      CONFIG_DEBUG_VM=n.  The reason is that VM_{WARN,BUG}* have always been
      special wrt.  {WARN/BUG}* and never generate any code when DEBUG_VM is
      disabled.  So we cannot really use it in conditionals.
      We considered changing things so that this construct works in both cases
      but that might cause unwanted code generation with CONFIG_DEBUG_VM=n.
      It is safer and simpler to make the build fail in both cases.
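      One way to get that behaviour is to give the macro a void type in
      both configurations, so that using it in a conditional is a compile
      error either way. The following user-space sketch shows the idea; all
      names ending in _SKETCH or _sketch are hypothetical and this is not
      the kernel's actual definition.

```c
#include <stdio.h>

/* Number of warnings emitted so far, kept only for illustration. */
int warn_count_sketch;

/* Hypothetical stand-in for WARN_ON(): report and return the condition. */
int warn_on_sketch(int cond, const char *expr)
{
	if (cond) {
		warn_count_sketch++;
		fprintf(stderr, "warning: %s\n", expr);
	}
	return cond;
}

/*
 * "Debug enabled" variant: the check runs, but the cast to void discards
 * the result, so `if (VM_WARN_ON_ENABLED_SKETCH(x))` does not compile.
 */
#define VM_WARN_ON_ENABLED_SKETCH(cond) \
	((void)warn_on_sketch(!!(cond), #cond))

/*
 * "Debug disabled" variant: sizeof type-checks the expression without
 * evaluating it, so no code is generated; the result is void here too.
 */
#define VM_WARN_ON_DISABLED_SKETCH(cond) \
	((void)sizeof((long)(cond)))
```

      Both variants remain usable as plain statements, which is the only
      form that is valid in both configurations.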
      [akpm@linux-foundation.org: changelog]
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Mike Kravetz's avatar
      mm/page_isolation.c: make start_isolate_page_range() fail if already isolated · 2c7452a0
      Mike Kravetz authored
      start_isolate_page_range() is used to set the migrate type of a set of
      pageblocks to MIGRATE_ISOLATE while attempting to start a migration
      operation.  It assumes that only one thread is calling it for the
      specified range.  This routine is used by CMA, memory hotplug and
      gigantic huge pages.  Each of these users synchronizes access to the
      range within their subsystem.  However, two subsystems (CMA and gigantic
      huge pages for example) could attempt operations on the same range.  If
      this happens, one thread may 'undo' the work another thread is doing.
      This can result in pageblocks being incorrectly left marked as
      MIGRATE_ISOLATE and therefore not available for page allocation.
      What is ideally needed is a way to synchronize access to a set of
      pageblocks that are undergoing isolation and migration.  The only thing
      we know about these pageblocks is that they are all in the same zone.  A
      per-node mutex is too coarse as we want to allow multiple operations on
      different ranges within the same zone concurrently.  Instead, we will
      use the migration type of the pageblocks themselves as a form of
      start_isolate_page_range sets the migration type on a set of page-
      blocks going in order from the one associated with the smallest pfn to
      the largest pfn.  The zone lock is acquired to check and set the
      migration type.  When going through the list of pageblocks check if
      MIGRATE_ISOLATE is already set.  If so, this indicates another thread is
      working on this pageblock.  We know exactly which pageblocks we set, so
      clean up by undoing those and return -EBUSY.
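      The check-set-undo pattern described above can be modelled in user
      space. This is a simplified sketch, not the kernel code: the
      block_type array, the pthread mutex standing in for the zone lock,
      and every name ending in _sketch are assumptions made for
      illustration.

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical, simplified model: one migrate type per pageblock. */
enum mt_sketch { MT_MOVABLE_SKETCH, MT_ISOLATE_SKETCH };

#define NR_BLOCKS_SKETCH 16
enum mt_sketch block_type[NR_BLOCKS_SKETCH];   /* zero-initialised: movable */
pthread_mutex_t zone_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Check and set one pageblock under the lock; -EBUSY if already isolated. */
static int isolate_one(int i)
{
	int ret = 0;

	pthread_mutex_lock(&zone_lock_sketch);
	if (block_type[i] == MT_ISOLATE_SKETCH)
		ret = -EBUSY;                  /* another caller owns this block */
	else
		block_type[i] = MT_ISOLATE_SKETCH;
	pthread_mutex_unlock(&zone_lock_sketch);
	return ret;
}

/* Restore one pageblock that this caller isolated earlier. */
static void unisolate_one(int i)
{
	pthread_mutex_lock(&zone_lock_sketch);
	block_type[i] = MT_MOVABLE_SKETCH;
	pthread_mutex_unlock(&zone_lock_sketch);
}

/*
 * Isolate blocks [start, end) in ascending order.  On a conflict, undo
 * exactly the blocks this call set and report -EBUSY.
 */
int start_isolate_range_sketch(int start, int end)
{
	for (int i = start; i < end; i++) {
		if (isolate_one(i) == 0)
			continue;
		for (int j = start; j < i; j++)
			unisolate_one(j);
		return -EBUSY;
	}
	return 0;
}
```

      Because each block is examined and updated in a single lock cycle,
      two callers racing on overlapping ranges cannot both succeed, and
      the loser leaves the winner's blocks untouched.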
      This allows start_isolate_page_range to serve as a synchronization
      mechanism and will allow for more general use of callers making use of
      these interfaces.  Update comments in alloc_contig_range to reflect this
      new functionality.
      Each CPU holds the associated zone lock to modify or examine the
      migration type of a pageblock.  And, it will only examine/update a
      single pageblock per lock acquire/release cycle.
      Link: http://lkml.kernel.org/r/20180309224731.16978-1-mike.kravetz@oracle.com
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Souptick Joarder's avatar
      mm: change return type to vm_fault_t · 1c8f4220
      Souptick Joarder authored
      The plan for these patches is to introduce the typedef, initially just
      as documentation ("These functions should return a VM_FAULT_ status").
      We'll trickle the patches to individual drivers/filesystems in through
      the maintainers, as far as possible.  Then we'll change the typedef to
      an unsigned int and break the compilation of any unconverted
      vmf_insert_page(), vmf_insert_mixed() and vmf_insert_pfn() are three
      newly added functions.  The various drivers/filesystems whose fault(),
      huge_fault(), page_mkwrite() and pfn_mkwrite() return values get
      converted will need them.  These functions return the correct
      VM_FAULT_ code based on the err value.
      We've had bugs before where drivers returned -EFOO.  And we have this
      silly inefficiency where vm_insert_xxx() returns an errno which (afaict)
      every driver then converts into a VM_FAULT code.  In many cases drivers
      failed to return the correct VM_FAULT code when vm_insert_xxx()
      failed.  We have identified and cleaned up all those existing bugs and
      silly inefficiencies in drivers/filesystems by adding these three new
      inline wrappers.  As mentioned above, we will trickle those patches to
      individual drivers/filesystems in through maintainers after these three
      wrapper functions are merged.
      Eventually we can convert vm_insert_xxx() into vmf_insert_xxx() and
      remove these inline wrappers, but these are a good intermediate step.
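      The errno-to-VM_FAULT conversion that such a wrapper centralises
      might look like the sketch below. The typedef, the constant values
      and the exact mapping are assumptions made for illustration (the
      commit message only says the wrappers return the correct VM_FAULT_
      code based on the err value); errno_to_vm_fault_sketch() is a
      hypothetical name.

```c
#include <errno.h>

/* Hypothetical stand-ins; the values are illustrative only. */
typedef int vm_fault_sketch_t;

enum {
	VM_FAULT_OOM_SKETCH    = 0x0001,
	VM_FAULT_SIGBUS_SKETCH = 0x0002,
	VM_FAULT_NOPAGE_SKETCH = 0x0100,
};

/*
 * Assumed mapping: out of memory becomes an OOM fault, any other error
 * except -EBUSY becomes SIGBUS, and success (or -EBUSY, meaning the entry
 * was already installed) tells the fault path that no page is returned.
 */
vm_fault_sketch_t errno_to_vm_fault_sketch(int err)
{
	if (err == -ENOMEM)
		return VM_FAULT_OOM_SKETCH;
	if (err < 0 && err != -EBUSY)
		return VM_FAULT_SIGBUS_SKETCH;
	return VM_FAULT_NOPAGE_SKETCH;
}
```

      Keeping the conversion in one place means a driver's fault handler
      can return the wrapper's result directly instead of translating
      errno values by hand.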
      Link: http://lkml.kernel.org/r/20180310162351.GA7422@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • David Rientjes's avatar
      mm, oom: remove 3% bonus for CAP_SYS_ADMIN processes · d46078b2
      David Rientjes authored
      Since the 2.6 kernel, the oom killer has slightly biased away from
      CAP_SYS_ADMIN processes by discounting some of its memory usage in
      comparison to other processes.
      This has always been implicit and nothing exactly relies on the
      Gaurav notices that __task_cred() can dereference a potentially freed
      pointer if the task under consideration is exiting because a reference
      to the task_struct is not held.
      Remove the CAP_SYS_ADMIN bias so that all processes are treated equally.
      If any CAP_SYS_ADMIN process would like to be biased against, it is
      always allowed to adjust /proc/pid/oom_score_adj.
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803071548510.6996@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • David Rientjes's avatar
      mm, page_alloc: wakeup kcompactd even if kswapd cannot free more memory · 5ecd9d40
      David Rientjes authored
      Kswapd will not wake up if per-zone watermarks are not failing or if too
      many previous attempts at background reclaim have failed.
      This can be true if there is a lot of free memory available.  For high-
      order allocations, kswapd is responsible for waking up kcompactd for
      background compaction.  If the zone is not below its watermarks or
      reclaim has recently failed (lots of free memory, nothing left to
      reclaim), kcompactd does not get woken up.
      When __GFP_DIRECT_RECLAIM is not allowed, allow kcompactd to still be
      woken up even if kswapd will not reclaim.  This allows high-order
      allocations, such as thp, to still trigger background compaction even
      when the zone has an abundance of free memory.
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1803111659420.209721@chino.kir.corp.google.com
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Mark Rutland's avatar
      kernel/fork.c: detect early free of a live mm · 3eda69c9
      Mark Rutland authored
      KASAN splats indicate that in some cases we free a live mm, then
      continue to access it, with potentially disastrous results.  This is
      likely due to a mismatched mmdrop() somewhere in the kernel, but so far
      the culprit remains elusive.
      Let's have __mmdrop() verify that the mm isn't live for the current
      task, similar to the existing check for init_mm.  This way, we can catch
      this class of issue earlier, and without requiring KASAN.
      Currently, idle_task_exit() leaves active_mm stale after it switches to
      init_mm.  This isn't harmful, but will trigger the new assertions, so we
      must adjust idle_task_exit() to update active_mm.
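      The liveness check and the reason idle_task_exit() needs adjusting
      can be modelled with a few lines of user-space C. The structures and
      the mm_is_live_for_sketch() helper are hypothetical simplifications,
      not the kernel's types or its actual assertion.

```c
#include <stddef.h>

/* Hypothetical, minimal stand-ins for mm_struct and task_struct. */
struct mm_sketch {
	int id;
};

struct task_sketch {
	struct mm_sketch *mm;         /* user address space, NULL for kthreads */
	struct mm_sketch *active_mm;  /* address space currently loaded */
};

/* Stand-in for init_mm, which must never be freed. */
struct mm_sketch init_mm_sketch;

/*
 * Nonzero if freeing `mm` now would free an address space that the
 * current task still references.  A final-drop path could assert that
 * this is zero before releasing the structure.
 */
int mm_is_live_for_sketch(const struct task_sketch *cur,
			  const struct mm_sketch *mm)
{
	return mm == &init_mm_sketch || mm == cur->mm || mm == cur->active_mm;
}
```

      In this model, a task that has switched to init_mm but still has
      active_mm pointing at the old mm makes the check fire for that old
      mm, which is why the stale pointer has to be updated first.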
      Link: http://lkml.kernel.org/r/20180312140103.19235-1-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Kirill Tkhai's avatar
      mm: make counting of list_lru_one::nr_items lockless · 0c7c1bed
      Kirill Tkhai authored
      While reclaiming the slab of a memcg, shrink_slab iterates over all
      registered shrinkers in the system, and tries to count and consume
      objects related to the cgroup.  Under memory pressure, this behaves
      badly: I observe high system time and time spent in list_lru_count_one()
      for many processes on a RHEL7 kernel.
      This patch makes list_lru_node::memcg_lrus RCU protected, which allows
      skipping the spinlock in list_lru_count_one().
      Shakeel Butt with the patch observes significant perf graph change.  He
      Setup: running a fork-bomb in a memcg of 200MiB on a 8GiB and 4 vcpu
      VM and recording the trace with 'perf record -g -a'.
      The trace without the patch:
      +  34.19%     fb.sh  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
      +  30.77%     fb.sh  [kernel.kallsyms]  [k] _raw_spin_lock
      +   3.53%     fb.sh  [kernel.kallsyms]  [k] list_lru_count_one
      +   2.26%     fb.sh  [kernel.kallsyms]  [k] super_cache_count
      +   1.68%     fb.sh  [kernel.kallsyms]  [k] shrink_slab
      +   0.59%     fb.sh  [kernel.kallsyms]  [k] down_read_trylock
      +   0.48%     fb.sh  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
      +   0.38%     fb.sh  [kernel.kallsyms]  [k] shrink_node_memcg
      +   0.32%     fb.sh  [kernel.kallsyms]  [k] queue_work_on
      +   0.26%     fb.sh  [kernel.kallsyms]  [k] count_shadow_nodes
      With the patch:
      +   0.16%     swapper  [kernel.kallsyms]    [k] default_idle
      +   0.13%     oom_reaper  [kernel.kallsyms]    [k] mutex_spin_on_owner
      +   0.05%     perf  [kernel.kallsyms]    [k] copy_user_generic_string
      +   0.05%     init.real  [kernel.kallsyms]    [k] wait_consider_task
      +   0.05%     kworker/0:0  [kernel.kallsyms]    [k] finish_task_switch
      +   0.04%     kworker/2:1  [kernel.kallsyms]    [k] finish_task_switch
      +   0.04%     kworker/3:1  [kernel.kallsyms]    [k] finish_task_switch
      +   0.04%     kworker/1:0  [kernel.kallsyms]    [k] finish_task_switch
      +   0.03%     binary  [kernel.kallsyms]    [k] copy_page
      Thanks Shakeel for the testing.
      [ktkhai@virtuozzo.com: v2]
        Link: http://lkml.kernel.org/r/151203869520.3915.2587549826865799173.stgit@localhost.localdomain
      Link: http://lkml.kernel.org/r/150583358557.26700.8490036563698102569.stgit@localhost.localdomain
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Tested-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Colin Ian King's avatar
      mm/swap_state.c: make bool enable_vma_readahead and swap_vma_readahead() static · f5c754d6
      Colin Ian King authored
      The bool enable_vma_readahead and swap_vma_readahead() are local to the
      source and do not need to be in global scope, so make them static.
      Cleans up sparse warnings:
        mm/swap_state.c:41:6: warning: symbol 'enable_vma_readahead' was not declared. Should it be static?
        mm/swap_state.c:742:13: warning: symbol 'swap_vma_readahead' was not declared. Should it be static?
      Link: http://lkml.kernel.org/r/20180223164852.5159-1-colin.king@canonical.com
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: "Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Jeff Moyer's avatar
      block_invalidatepage(): only release page if the full page was invalidated · 3172485f
      Jeff Moyer authored
      Prior to commit d47992f8 ("mm: change invalidatepage prototype to
      accept length"), an offset of 0 meant that the full page was being
      invalidated.  After that commit, we need to instead check the length.
      Jan said:
      : The only possible issue is that try_to_release_page() was called more
      : often than necessary.  Otherwise the issue is harmless but still it's good
      : to have this fixed.
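      The difference between the two checks can be shown with a pair of
      predicates. This is an illustrative sketch: the function names and
      the 4096-byte PAGE_SIZE_SKETCH are assumptions, and the real code
      operates on a struct page rather than bare integers.

```c
/* Assumed page size for the illustration. */
#define PAGE_SIZE_SKETCH 4096u

/*
 * Old test: an offset of 0 was taken to mean "the whole page".  Once a
 * length is passed as well, this is also true for a partial invalidation
 * that merely starts at the beginning of the page.
 */
int full_page_by_offset_sketch(unsigned int offset, unsigned int length)
{
	(void)length;
	return offset == 0;
}

/* New test: only a length covering the entire page counts. */
int full_page_by_length_sketch(unsigned int offset, unsigned int length)
{
	(void)offset;
	return length == PAGE_SIZE_SKETCH;
}
```

      For an invalidation of the first 512 bytes, the offset-based check
      reports a full page while the length-based check does not, which
      matches the "called more often than necessary" behaviour Jan
      describes.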
      Link: http://lkml.kernel.org/r/x49fu5rtnzs.fsf@segfault.boston.devel.redhat.com
      Fixes: d47992f8 ("mm: change invalidatepage prototype to accept length")
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Mike Rapoport's avatar