1. 09 Jul, 2015 4 commits
    • blkcg: fix blkcg_policy_data allocation bug · 06b285bd
      Tejun Heo authored

      e48453c3 ("block, cgroup: implement policy-specific per-blkcg
      data") updated per-blkcg policy data to be dynamically allocated.
      When a policy is registered, its policy data aren't created.  Instead,
      when the policy is activated on a queue, the policy data are allocated
      if there are blkg's (blkcg_gq's) which are attached to a given blkcg.
      This is buggy.  Consider the following scenario.
      
      1. A blkcg is created.  No blkg's attached yet.
      
      2. The policy is registered.  No policy data is allocated.
      
      3. The policy is activated on a queue.  As the above blkcg doesn't
         have any blkg's, it won't allocate the matching blkcg_policy_data.
      
      4. An IO is issued from the blkcg and blkg is created and the blkcg
         still doesn't have the matching policy data allocated.
      
      With cfq-iosched, this leads to an oops.
      
      It also doesn't free policy data on policy unregistration assuming
      that freeing of all policy data on blkcg destruction should take care
      of it; however, this also is incorrect.
      
      1. A blkcg has policy data.
      
      2. The policy gets unregistered but the policy data remains.
      
      3. Another policy gets registered on the same slot.
      
      4. Later, the new policy tries to allocate policy data on the previous
         blkcg but the slot is already occupied and gets skipped.  The
         policy ends up operating on the policy data of the previous policy.
      
      There's no reason to manage blkcg_policy_data lazily.  The reason we
      do lazy allocation of blkg's is that the number of all possible blkg's
      is the product of cgroups and block devices which can reach a
      surprising level.  blkcg_policy_data is constrained by the number of
      cgroups and shouldn't be a problem.
      
      This patch allocates blkcg_policy_data for all existing blkcg's on
      policy registration, frees them on unregistration, and removes
      blkcg_policy_data handling from the policy [de]activation paths.
      blkcg_policy_data are thus created and removed together with the
      policy they belong to, which fixes the problems described above.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: e48453c3 ("block, cgroup: implement policy-specific per-blkcg data")
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      06b285bd
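
The eager scheme described in this commit message can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct cgroup_model`, `cg_create()`, `policy_register()` and `policy_unregister()` are hypothetical stand-ins, not the kernel's types or functions, and locking is omitted. Registration allocates policy data for every existing cgroup, and unregistration frees it, so a later policy reusing the slot never finds stale data.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_POLS 2   /* hypothetical number of policy slots */
#define MAX_CGS  4   /* hypothetical cap on tracked cgroups */

struct policy_data { int policy_id; };

struct cgroup_model {
    struct policy_data *pd[MAX_POLS];   /* one entry per policy slot */
};

static struct cgroup_model *all_cgs[MAX_CGS];  /* stand-in for all_blkcgs */
static int nr_cgs;
static int slot_policy[MAX_POLS];              /* 0 means the slot is free */

/* A new cgroup gets data for every policy that is already registered. */
static struct cgroup_model *cg_create(void)
{
    struct cgroup_model *cg;

    if (nr_cgs == MAX_CGS || !(cg = calloc(1, sizeof(*cg))))
        return NULL;
    for (int i = 0; i < MAX_POLS; i++) {
        if (!slot_policy[i])
            continue;
        cg->pd[i] = malloc(sizeof(*cg->pd[i]));
        if (!cg->pd[i]) {               /* undo the partial work */
            while (i-- > 0)
                free(cg->pd[i]);
            free(cg);
            return NULL;
        }
        cg->pd[i]->policy_id = slot_policy[i];
    }
    all_cgs[nr_cgs++] = cg;
    return cg;
}

/* Registration allocates data for all existing cgroups up front. */
static int policy_register(int slot, int policy_id)
{
    assert(slot >= 0 && slot < MAX_POLS && policy_id != 0);
    if (slot_policy[slot])
        return -1;                      /* slot already taken */
    for (int i = 0; i < nr_cgs; i++) {
        struct policy_data *pd = malloc(sizeof(*pd));

        if (!pd) {                      /* undo the partial work */
            while (i-- > 0) {
                free(all_cgs[i]->pd[slot]);
                all_cgs[i]->pd[slot] = NULL;
            }
            return -1;
        }
        pd->policy_id = policy_id;
        all_cgs[i]->pd[slot] = pd;
    }
    slot_policy[slot] = policy_id;
    return 0;
}

/* Unregistration frees the data so the slot can be reused safely. */
static void policy_unregister(int slot)
{
    for (int i = 0; i < nr_cgs; i++) {
        free(all_cgs[i]->pd[slot]);
        all_cgs[i]->pd[slot] = NULL;
    }
    slot_policy[slot] = 0;
}
```

In this model, a cgroup created before registration and one created after it both end up with policy data, which is the property the first scenario above lacked.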
    • blkcg: implement all_blkcgs list · 7876f930
      Tejun Heo authored
      
      
      Add the all_blkcgs list, which goes through blkcg->all_blkcgs_node and
      is protected by blkcg_pol_mutex.  This will be used to fix the
      blkcg_policy_data allocation bug.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      7876f930
    • blkcg: blkcg_css_alloc() should grab blkcg_pol_mutex while iterating blkcg_policy[] · 144232b3
      Tejun Heo authored
      
      
      An entry in blkcg_policy[] is stable while there are non-bypassing
      in-flight IOs on a request_queue which has the policy activated.  This
      is why most derefs of blkcg_policy[] don't need explicit locking;
      however, blkcg_css_alloc() isn't invoked from IO path and thus doesn't
      have this protection and may race with policies being added and removed.
      
      Fix it by adding explicit blkcg_pol_mutex protection around
      blkcg_policy[] iteration in blkcg_css_alloc().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: e48453c3 ("block, cgroup: implement policy-specific per-blkcg data")
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      144232b3
    • blkcg: allow blkcg_pol_mutex to be grabbed from cgroup [file] methods · 838f13bf
      Tejun Heo authored
      
      
      blkcg_pol_mutex primarily protects the blkcg_policy array.  It also
      protects cgroup file type [un]registration during policy addition /
      removal.  This puts blkcg_pol_mutex outside cgroup internal
      synchronization and in turn makes it impossible to grab from blkcg's
      cgroup methods as that leads to cyclic dependency.
      
      Another problematic dependency arising from this is through cgroup
      interface file deactivation.  Removing a cftype requires removing all
      files of the type which in turn involves draining all on-going
      invocations of the file methods.  This means that an interface file
      implementation can't grab blkcg_pol_mutex as draining can lead to AA
      deadlock.
      
      blkcg_reset_stats() is already in this situation.  It currently
      trylocks blkcg_pol_mutex and then unwinds and retries the whole
      operation on failure, which is cumbersome at best.  It has a lengthy
      comment explaining how cgroup internal synchronization is involved and
      expected to be updated but as explained above this doesn't need cgroup
      internal locking to deadlock.  It's a self-contained AA deadlock.
      
      The described circular dependencies can be easily broken by moving
      cftype [un]registration out of blkcg_pol_mutex and protecting it with
      an outer mutex.  This patch introduces blkcg_pol_register_mutex which
      wraps entire policy [un]registration including cftype operations and
      shrinks blkcg_pol_mutex critical section.  This also makes the trylock
      dancing in blkcg_reset_stats() unnecessary.  Removed.
      
      This patch is necessary for the following blkcg_policy_data allocation
      bug fixes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      838f13bf
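
The two-level locking described in this commit message can be illustrated with POSIX mutexes. This is a minimal userspace sketch, not the kernel code: the names `pol_register_mutex`, `pol_mutex`, `files_present` and `count_policies()` are assumptions chosen for the example. The outer lock covers the whole (un)registration, while the inner lock is held only around the policy array, so a file method can take the inner lock without any trylock-and-retry.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_POLS 4

/* Outer lock: serializes whole (un)registration, including file setup. */
static pthread_mutex_t pol_register_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Inner lock: protects only the policy array. */
static pthread_mutex_t pol_mutex = PTHREAD_MUTEX_INITIALIZER;

static int policies[MAX_POLS];       /* 0 means the slot is free */
static int files_present[MAX_POLS];  /* stand-in for cftype registration */

static int policy_register(int policy_id)
{
    int slot = -1;

    assert(policy_id != 0);
    pthread_mutex_lock(&pol_register_mutex);

    pthread_mutex_lock(&pol_mutex);
    for (int i = 0; i < MAX_POLS; i++) {
        if (!policies[i]) {
            policies[i] = policy_id;
            slot = i;
            break;
        }
    }
    pthread_mutex_unlock(&pol_mutex);

    /* File setup runs under the outer lock only, never under pol_mutex. */
    if (slot >= 0)
        files_present[slot] = 1;

    pthread_mutex_unlock(&pol_register_mutex);
    return slot;
}

static void policy_unregister(int slot)
{
    pthread_mutex_lock(&pol_register_mutex);

    /* File removal may wait for running file methods; pol_mutex is not held. */
    files_present[slot] = 0;

    pthread_mutex_lock(&pol_mutex);
    policies[slot] = 0;
    pthread_mutex_unlock(&pol_mutex);

    pthread_mutex_unlock(&pol_register_mutex);
}

/* A file method can take pol_mutex directly, with no trylock-and-retry. */
static int count_policies(void)
{
    int n = 0;

    pthread_mutex_lock(&pol_mutex);
    for (int i = 0; i < MAX_POLS; i++)
        n += policies[i] != 0;
    pthread_mutex_unlock(&pol_mutex);
    return n;
}
```

Because file removal never happens while the inner lock is held, waiting for in-flight file methods cannot deadlock against a method that is itself waiting for that lock.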
  2. 07 Jul, 2015 4 commits
    • block/blk-cgroup.c: free per-blkcg data when freeing the blkcg · a322baad
      Arianna Avanzini authored
      
      
      Currently, per-blkcg data is freed each time a policy is deactivated,
      which also happens on a scheduler switch. However, when switching from a
      scheduler implementing a policy which requires per-blkcg data to
      another one, that same policy might be active on other devices, and
      therefore those same per-blkcg data could be still in use.
      This commit lets per-blkcg data be freed when the blkcg is freed
      instead of on policy deactivation.
      Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
      Reported-and-tested-by: Michael Kaminsky <kaminsky@cs.cmu.edu>
      Fixes: e48453c3 ("block, cgroup: implement policy-specific per-blkcg data")
      Signed-off-by: Jens Axboe <axboe@fb.com>
      a322baad
    • block: use FIELD_SIZEOF to calculate size of a field · 0762b23d
      Maninder Singh authored
      
      
      Use FIELD_SIZEOF instead of open coding the field size calculation.
      Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      0762b23d
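
For readers unfamiliar with the macro, the sketch below shows what replacing an open-coded field size with FIELD_SIZEOF looks like. It assumes the usual form of the macro, a `sizeof` applied to a member reached through a null pointer of the struct type; `struct request_model` and both helper functions are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Size of member f of struct type t, without needing an instance. */
#define FIELD_SIZEOF(t, f) (sizeof(((t *)0)->f))

struct request_model {          /* hypothetical struct for illustration */
    int  tag;
    char cmd[16];
};

/* Open-coded form that the macro replaces. */
static size_t cmd_size_open_coded(void)
{
    return sizeof(((struct request_model *)0)->cmd);
}

/* Same computation, stated by intent. */
static size_t cmd_size_macro(void)
{
    return FIELD_SIZEOF(struct request_model, cmd);
}

/* Both forms must agree; sizeof never evaluates the null dereference. */
static void check_sizes(void)
{
    assert(cmd_size_open_coded() == cmd_size_macro());
}
```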
    • bio integrity: do not assume bio_integrity_pool exists if bioset exists · bb8bd38b
      Mike Snitzer authored
      
      
      bio_integrity_alloc() and bio_integrity_free() assume that if a bio was
      allocated from a bioset that that bioset also had its bio_integrity_pool
      allocated using bioset_integrity_create().  This is a very bad
      assumption given that bioset_create() and bioset_integrity_create() are
      completely disjoint.  Not all callers of bioset_create() have been
      trained to also call bioset_integrity_create() -- and they may not care
      to be.
      
      Fix this by falling back to kmalloc'ing 'struct bio_integrity_payload'
      rather than force all bioset consumers to (wastefully) preallocate a
      bio_integrity_pool that they very likely won't actually need (given the
      niche nature of the current block integrity support).
      
      Otherwise, a NULL pointer "Kernel BUG" with a trace like the following
      will be observed (as seen on s390x using zfcp storage) because dm-io
      doesn't use bioset_integrity_create() when creating its bioset:
      
          [  791.643338] Call Trace:
          [  791.643339] ([<00000003df98b848>] 0x3df98b848)
          [  791.643341]  [<00000000002c5de8>] bio_integrity_alloc+0x48/0xf8
          [  791.643348]  [<00000000002c6486>] bio_integrity_prep+0xae/0x2f0
          [  791.643349]  [<0000000000371e38>] blk_queue_bio+0x1c8/0x3d8
          [  791.643355]  [<000000000036f8d0>] generic_make_request+0xc0/0x100
          [  791.643357]  [<000000000036f9b2>] submit_bio+0xa2/0x198
          [  791.643406]  [<000003ff801f9774>] dispatch_io+0x15c/0x3b0 [dm_mod]
          [  791.643419]  [<000003ff801f9b3e>] dm_io+0x176/0x2f0 [dm_mod]
          [  791.643423]  [<000003ff8074b28a>] do_reads+0x13a/0x1a8 [dm_mirror]
          [  791.643425]  [<000003ff8074b43a>] do_mirror+0x142/0x298 [dm_mirror]
          [  791.643428]  [<0000000000154fca>] process_one_work+0x18a/0x3f8
          [  791.643432]  [<000000000015598a>] worker_thread+0x132/0x3b0
          [  791.643435]  [<000000000015d49a>] kthread+0xd2/0xd8
          [  791.643438]  [<00000000005bc0ca>] kernel_thread_starter+0x6/0xc
          [  791.643446]  [<00000000005bc0c4>] kernel_thread_starter+0x0/0xc
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@fb.com>
      bb8bd38b
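
The fallback described in this commit message can be sketched in userspace as below. This is an illustration under assumptions, not the actual patch: `struct bioset_model`, `struct pool_model`, `payload_alloc()` and `payload_free()` are hypothetical names, the pool is a trivial two-slot array, and `malloc()` stands in for `kmalloc()`. The point is only that allocation and freeing both check whether the optional pool exists instead of assuming it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS 2

struct payload { int from_pool; };

struct pool_model {                 /* hypothetical stand-in for a mempool */
    struct payload slots[POOL_SLOTS];
    int used[POOL_SLOTS];
};

struct bioset_model {
    struct pool_model *integrity_pool;   /* may be NULL: pool never created */
};

static struct payload *payload_alloc(struct bioset_model *bs)
{
    struct payload *p;

    /* No pool: fall back to the general-purpose allocator. */
    if (!bs || !bs->integrity_pool) {
        p = malloc(sizeof(*p));
        if (p)
            p->from_pool = 0;
        return p;
    }
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (!bs->integrity_pool->used[i]) {
            bs->integrity_pool->used[i] = 1;
            p = &bs->integrity_pool->slots[i];
            p->from_pool = 1;
            return p;
        }
    }
    return NULL;                    /* pool exhausted */
}

static void payload_free(struct bioset_model *bs, struct payload *p)
{
    ptrdiff_t idx;

    if (!p)
        return;
    if (!p->from_pool) {            /* came from the fallback path */
        free(p);
        return;
    }
    assert(bs && bs->integrity_pool);
    idx = p - bs->integrity_pool->slots;
    bs->integrity_pool->used[idx] = 0;
}
```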
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c7e9ad7d
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
      
       - fix the perf build, by fixing the rbtree.c sharing bug between kernel
         and tools/perf by creating a local copy of rbtree.c (more will be
         done for v4.3)
      
       - fix an AUX buffer (Intel-PT support) refcounting bug
      
       - fix copy_from_user_nmi() return value
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/x86: Fix copy_from_user_nmi() return if range is not ok
        perf: Fix AUX buffer refcounting
        tools: Copy rbtree_augmented.h from the kernel
        tools: Move rbtree.h from tools/perf/
        tools: Copy lib/rbtree.c to tools/lib/
        perf tools: Copy rbtree.h from the kernel
        tools: Adopt {READ,WRITE_ONCE} from the kernel
      c7e9ad7d
  3. 06 Jul, 2015 6 commits
  4. 05 Jul, 2015 8 commits
    • Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 1c4c7159
      Linus Torvalds authored
      Pull ext4 bugfixes from Ted Ts'o:
       "Bug fixes (all for stable kernels) for ext4:
      
         - address corner cases for indirect blocks->extent migration
      
         - fix reserved block accounting invalidate_page when
           page_size != block_size (i.e., ppc or 1k block size file systems)
      
         - fix deadlocks when a memcg is under heavy memory pressure
      
         - fix fencepost error in lazytime optimization"
      
      * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: replace open coded nofail allocation in ext4_free_blocks()
        ext4: correctly migrate a file with a hole at the beginning
        ext4: be more strict when migrating to non-extent based file
        ext4: fix reservation release on invalidatepage for delalloc fs
        ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp
        bufferhead: Add _gfp version for sb_getblk()
        ext4: fix fencepost error in lazytime optimization
      1c4c7159
    • perf tools: Copy rbtree.h from the kernel · 4407f967
      Arnaldo Carvalho de Melo authored
      We were using include/linux/rbtree.h directly from the kernel, which
      broke the build as soon as it started using rcupdate.h.  To avoid
      dragging the rcu header files into tools/, for which there is no use
      so far, grab a copy of rbtree.h.

      This is the minimal fix; later patches will also copy lib/rbtree.c
      and move rbtree.h into tools/include/, etc.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-dfmuj0j63w4by7vhlh4hhn74@git.kernel.org
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      4407f967
    • tools: Adopt {READ,WRITE_ONCE} from the kernel · 728abda6
      Arnaldo Carvalho de Melo authored
      We need it to build rbtree.c after this cset:
      
        commit d72da4a4
        Author: Peter Zijlstra <peterz@infradead.org>
        Date:   Wed May 27 11:09:36 2015 +0930
      
          rbtree: Make lockless searches non-fatal
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-qlnzhezv5ddwst0w9fydju0y@git.kernel.org
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      728abda6
    • Linux 4.2-rc1 · d770e558
      Linus Torvalds authored
      d770e558
    • Merge tag 'platform-drivers-x86-v4.2-2' of... · a585d2b7
      Linus Torvalds authored
      Merge tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
      
      Pull late x86 platform driver updates from Darren Hart:
       "The following came in a bit later and I wanted them to bake in next a
        few more days before submitting, thus the second pull.
      
        A new intel_pmc_ipc driver, a symmetrical allocation and free fix in
        dell-laptop, a couple minor fixes, and some updated documentation in
        the dell-laptop comments.
      
        intel_pmc_ipc:
         - Add Intel Apollo Lake PMC IPC driver
      
        tc1100-wmi:
         - Delete an unnecessary check before the function call "kfree"
      
        dell-laptop:
         - Fix allocating & freeing SMI buffer page
         - Show info about WiGig and UWB in debugfs
         - Update information about wireless control"
      
      * tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
        intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver
        tc1100-wmi: Delete an unnecessary check before the function call "kfree"
        dell-laptop: Fix allocating & freeing SMI buffer page
        dell-laptop: Show info about WiGig and UWB in debugfs
        dell-laptop: Update information about wireless control
      a585d2b7
    • ext4: replace open coded nofail allocation in ext4_free_blocks() · 7444a072
      Michal Hocko authored
      
      
      ext4_free_blocks is looping around the allocation request and mimics
      __GFP_NOFAIL behavior without any allocation fallback strategy. Let's
      remove the open coded loop and replace it with __GFP_NOFAIL. Without the
      flag the allocator has no way to find out about the never-fail
      requirement and cannot help in any way.
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      7444a072
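
The difference between an open-coded retry loop and a never-fail flag can be modeled in userspace as follows. This is a sketch only: `MODEL_NOFAIL`, `alloc_model()`, `try_alloc()` and the `fail_budget` failure injector are all hypothetical and merely imitate the idea behind `__GFP_NOFAIL`. When the flag is passed, the allocator itself knows the request must not fail; in the open-coded version it only ever sees ordinary requests that happen to be repeated.

```c
#include <stdlib.h>

#define MODEL_NOFAIL 0x1u   /* hypothetical flag modeled on __GFP_NOFAIL */

static int fail_budget;     /* number of upcoming attempts that will fail */

/* One allocation attempt, with injectable failures for demonstration. */
static void *try_alloc(size_t size)
{
    if (fail_budget > 0) {
        fail_budget--;
        return NULL;
    }
    return malloc(size);
}

/* The allocator sees the never-fail requirement and keeps trying itself. */
static void *alloc_model(size_t size, unsigned int flags)
{
    void *p;

    do {
        p = try_alloc(size);
    } while (!p && (flags & MODEL_NOFAIL));
    return p;
}

/* Open-coded caller loop: the allocator cannot tell the caller will retry. */
static void *alloc_open_coded(size_t size)
{
    void *p;

    while (!(p = alloc_model(size, 0)))
        ;
    return p;
}
```

Both variants return the same result in this model; the design point is where the retry policy lives, since only the flagged request gives the allocator a chance to react to it.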
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 1dc51b82
      Linus Torvalds authored
      Pull more vfs updates from Al Viro:
       "Assorted VFS fixes and related cleanups (IMO the most interesting in
        that part are f_path-related things and Eric's descriptor-related
        stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
        fs-cache series, DAX patches, Jan's file_remove_suid() work"
      
      [ I'd say this is much more than "fixes and related cleanups".  The
        file_table locking rule change by Eric Dumazet is a rather big and
        fundamental update even if the patch isn't huge.   - Linus ]
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
        9p: cope with bogus responses from server in p9_client_{read,write}
        p9_client_write(): avoid double p9_free_req()
        9p: forgetting to cancel request on interrupted zero-copy RPC
        dax: bdev_direct_access() may sleep
        block: Add support for DAX reads/writes to block devices
        dax: Use copy_from_iter_nocache
        dax: Add block size note ...
      1dc51b82
    • bluetooth: fix list handling · 9b284cbd
      Linus Torvalds authored
      Commit 835a6a2f ("Bluetooth: Stop sabotaging list poisoning")
      thought that the code was sabotaging the list poisoning when NULL'ing
      out the list pointers and removed it.
      
      But what was going on was that the bluetooth code was using NULL
      pointers for the list as a way to mark it empty, and that commit just
      broke it (and replaced the test with NULL with a "list_empty()" test on
      an uninitialized list instead, breaking things even further).
      
      So fix it all up to use the regular and real list_empty() handling
      (which does not use NULL, but a pointer to itself), also making sure to
      initialize the list properly (the previous NULL case was initialized
      implicitly by the session being allocated with kzalloc())
      
      This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong
      An.
      
      [ I would normally expect to get this through the bt tree, but I'm going
        to release -rc1, so I'm just committing this directly   - Linus ]
      Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Original-by: Tedd Ho-Jeong An <tedd.an@intel.com>
      Original-by: Marcel Holtmann <marcel@holtmann.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9b284cbd
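
The distinction this commit message draws, an empty list being a head that points to itself rather than a NULL pointer, can be shown with a small userspace reimplementation. The helpers below are simplified stand-ins written for this example and are not the kernel's list.h; in particular, the assertion in `list_add()` is an addition for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An initialized, empty list points to itself in both directions. */
static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Emptiness is "next points back at the head", not "next is NULL". */
static int list_empty(const struct list_head *head)
{
    return head->next == head;
}

static void list_add(struct list_head *item, struct list_head *head)
{
    assert(head->next && head->prev);   /* a zeroed head was never initialized */
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Unlink and reinitialize, so list_empty(item) is true afterwards. */
static void list_del_init(struct list_head *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    init_list_head(item);
}
```

Note that a zero-filled head, as produced by a zeroing allocator, does not count as empty under this test, which is why the list has to be initialized explicitly.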
  5. 04 Jul, 2015 18 commits
    • Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · 5c755fe1
      Linus Torvalds authored
      Pull SCSI target updates from Nicholas Bellinger:
       "It's been a busy development cycle for target-core in a number of
        different areas.
      
        The fabric API usage for se_node_acl allocation is now within
        target-core code, dropping the external API callers for all fabric
        drivers tree-wide.
      
        There is a new conversion to RCU hlists for se_node_acl and
        se_portal_group LUN mappings, that turns fast-path LUN lookup into a
        completely lockless code-path.  It also removes the original
        hard-coded limitation of 256 LUNs per fabric endpoint.
      
        The configfs attributes for backends can now be shared between core
        and driver code, allowing existing drivers to use common code while
        still allowing flexibility for new backend provided attributes.
      
        The highlights include:
      
         - Merge sbc_verify_dif_* into common code (sagi)
         - Remove iscsi-target support for obsolete IFMarker/OFMarker
           (Christophe Vu-Brugier)
         - Add bidi support in target/user backend (ilias + vangelis + agover)
         - Move se_node_acl allocation into target-core code (hch)
         - Add crc_t10dif_update common helper (akinobu + mkp)
         - Handle target-core odd SGL mapping for data transfer memory
           (akinobu)
         - Move transport ID handling into target-core (hch)
         - Move task tag into struct se_cmd + support 64-bit tags (bart)
         - Convert se_node_acl->device_list[] to RCU hlist (nab + hch +
           paulmck)
         - Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch +
           paulmck)
         - Simplify target backend driver registration (hch)
         - Consolidate + simplify target backend attribute implementations
           (hch + nab)
         - Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch)
         - Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab)
         - Drop unnecessary core_tpg_register TFO parameter (nab)
         - Use 64-bit LUNs tree-wide (hannes)
         - Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits)
        target: Bump core version to v5.0
        target: remove target_core_configfs.h
        target: remove unused TARGET_CORE_CONFIG_ROOT define
        target: consolidate version defines
        target: implement WRITE_SAME with UNMAP bit using ->execute_unmap
        target: simplify UNMAP handling
        target: replace se_cmd->execute_rw with a protocol_data field
        target/user: Fix inconsistent kmap_atomic/kunmap_atomic
        target: Send UA when changing LUN inventory
        target: Send UA upon LUN RESET tmr completion
        target: Send UA on ALUA target port group change
        target: Convert se_lun->lun_deve_lock to normal spinlock
        target: use 'se_dev_entry' when allocating UAs
        target: Remove 'ua_nacl' pointer from se_ua structure
        target_core_alua: Correct UA handling when switching states
        xen-scsiback: Fix compile warning for 64-bit LUN
        target: Remove TARGET_MAX_LUNS_PER_TRANSPORT
        target: use 64-bit LUNs
        target: Drop duplicate + unused se_dev_check_wce
        target: Drop unnecessary core_tpg_register TFO parameter
        ...
      5c755fe1
    • Merge tag 'ntb-4.2' of git://github.com/jonmason/ntb · 6d7c8e1b
      Linus Torvalds authored
      Pull NTB updates from Jon Mason:
       "This includes a pretty significant reworking of the NTB core code, but
        has already produced some significant performance improvements.
      
        An abstraction layer was added to allow the hardware and clients to be
        easily added.  This required rewriting the NTB transport layer for
        this abstraction layer.  This modification will allow future "high
        performance" NTB clients.
      
        In addition to this change, a number of performance modifications were
        added.  These changes include NUMA enablement, using CPU memcpy
        instead of asyncdma, and modification of NTB layer MTU size"
      
      * tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits)
        NTB: Add split BAR output for debugfs stats
        NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe
        NTB: Print driver name and version in module init
        NTB: Increase transport MTU to 64k from 16k
        NTB: Rename Intel code names to platform names
        NTB: Default to CPU memcpy for performance
        NTB: Improve performance with write combining
        NTB: Use NUMA memory in Intel driver
        NTB: Use NUMA memory and DMA chan in transport
        NTB: Rate limit ntb_qp_link_work
        NTB: Add tool test client
        NTB: Add ping pong test client
        NTB: Add parameters for Intel SNB B2B addresses
        NTB: Reset transport QP link stats on down
        NTB: Do not advance transport RX on link down
        NTB: Differentiate transport link down messages
        NTB: Check the device ID to set errata flags
        NTB: Enable link for Intel root port mode in probe
        NTB: Read peer info from local SPAD in transport
        NTB: Split ntb_hw_intel and ntb_transport drivers
        ...
      6d7c8e1b
    • 9p: cope with bogus responses from server in p9_client_{read,write} · 0f1db7de
      Al Viro authored
      If the server claims to have written/read more than we told it to,
      warn and cap the claimed byte count to avoid advancing further than
      we are ready to.
      0f1db7de
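
The capping rule in this commit message amounts to a clamp with a warning. A minimal sketch, assuming a hypothetical helper `cap_claimed_count()` that is not part of the 9p client code:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Return the number of bytes the caller may safely advance by.
 * If the peer claims more than was requested, warn and trust only
 * the requested amount.
 */
static size_t cap_claimed_count(size_t requested, size_t claimed)
{
    if (claimed > requested) {
        fprintf(stderr, "bogus RPC response: claimed %zu > requested %zu\n",
                claimed, requested);
        return requested;
    }
    return claimed;
}
```

A short read or write passes through unchanged; only an over-claim is reduced to the requested size.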
    • p9_client_write(): avoid double p9_free_req() · 67e808fb
      Al Viro authored
      
      
      Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
      if the response is impossible to parse and we discard the request, get
      out of the loop right there.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      67e808fb
    • 9p: forgetting to cancel request on interrupted zero-copy RPC · a84b69cb
      Al Viro authored
      
      
      If we'd already sent a request and decide to abort it, we *must*
      issue TFLUSH properly and not just blindly reuse the tag, or
      we'll get seriously screwed when response eventually arrives
      and we confuse it for response to later request that had reused
      the same tag.
      
      Cc: stable@vger.kernel.org # v3.2 and later
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a84b69cb
    • dax: bdev_direct_access() may sleep · 43c3dd08
      Matthew Wilcox authored
      
      
      The brd driver is the only in-tree driver that may sleep currently.
      After some discussion on linux-fsdevel, we decided that any driver
      may choose to sleep in its ->direct_access method.  To ensure that all
      callers of bdev_direct_access() are prepared for this, add a call
      to might_sleep().
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      43c3dd08
    • block: Add support for DAX reads/writes to block devices · bbab37dd
      Matthew Wilcox authored
      
      
      If a block device supports the ->direct_access methods, bypass the normal
      DIO path and use DAX to go straight to memcpy() instead of allocating
      a DIO and a BIO.
      
      Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in
      do_blockdev_direct_IO().
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bbab37dd
    • dax: Use copy_from_iter_nocache · 872eb127
      Matthew Wilcox authored
      
      
      When userspace does a write, there's no need for the written data to
      pollute the CPU cache.  This matches the original XIP code.
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      872eb127
    • dax: Add block size note to documentation · 44f4c054
      Matthew Wilcox authored
      
      
      For block devices which are small enough, mkfs will default to creating
      a filesystem with block sizes smaller than page size.
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      44f4c054
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 1b3618b6
      Linus Torvalds authored
      Pull kvm fixes from Paolo Bonzini:
       "Except for the preempt notifiers fix, these are all small bugfixes
        that could have waited for -rc2.  Sending them now since I was
        taking care of Peter's patch anyway"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        kvm: add hyper-v crash msrs values
        KVM: x86: remove data variable from kvm_get_msr_common
        KVM: s390: virtio-ccw: don't overwrite config space values
        KVM: x86: keep track of LVT0 changes under APICv
        KVM: x86: properly restore LVT0
        KVM: x86: make vapics_in_nmi_mode atomic
        sched, preempt_notifier: separate notifier registration from static_key inc/dec
      1b3618b6
    • NTB: Add split BAR output for debugfs stats · bf44fe46
      Dave Jiang authored
      
      
      When split BAR is enabled, the driver needs to dump out the split BAR
      registers rather than the original 64bit BAR registers.
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      bf44fe46
    • NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe · fd839bf8
      Dave Jiang authored
      
      
      The unsafe doorbell and scratchpad access should display a reason when
      WARN is called.  Otherwise we get a stack dump without any explanation.
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      fd839bf8
    • NTB: Print driver name and version in module init · 7eb38781
      Dave Jiang authored
      
      
      Print the driver name and version to indicate what is being loaded.
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      7eb38781
    • NTB: Increase transport MTU to 64k from 16k · 9891417d
      Dave Jiang authored
      
      
      Benchmarking showed a significant performance increase with the MTU size
      set to 64k instead of 16k.  Change the driver default to 64k.
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      9891417d
    • NTB: Rename Intel code names to platform names · 2f887b9a
      Dave Jiang authored
      
      
      Instead of using the platform code names, use the correct platform names
      to identify the respective Intel NTB hardware.
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      2f887b9a
    • NTB: Default to CPU memcpy for performance · a41ef053
      Dave Jiang authored
      
      
      Disable DMA usage by default, since the CPU provides much better
      performance with write combining.  Provide a module parameter to enable
      DMA usage when offloading the memcpy is preferred.
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      a41ef053
    • NTB: Improve performance with write combining · 06917f75
      Dave Jiang authored
      
      
      Changing the memory window BAR mappings to write combining significantly
      boosts the performance.  We will also use a memcpy that uses non-temporal
      stores, which showed a performance improvement when doing non-cached
      memcpys.
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      06917f75
    • NTB: Use NUMA memory in Intel driver · 0e041fb5
      Allen Hubbe authored
      
      
      Allocate memory for the NUMA node of the NTB device.
      Signed-off-by: Allen Hubbe <Allen.Hubbe@emc.com>
      Signed-off-by: Jon Mason <jdmason@kudzu.us>
      0e041fb5