1. 28 Aug, 2017 3 commits
  2. 06 Jul, 2017 1 commit
    • nvme: split nvme_uninit_ctrl into stop and uninit · d09f2b45
      Sagi Grimberg authored
      Usually, before we tear down the controller, we want to:
      1. complete/cancel any inflight ctrl works
      2. remove ctrl namespaces (only for removal though, resets
         shouldn't remove any namespaces).
      
      but we do not want to destroy the controller device as
      we might use it for logging during the teardown stage.
      
      This patch adds nvme_start_ctrl(), which queues inflight
      controller works (aen, ns scan, queue start and keep-alive
      if kato is set), and nvme_stop_ctrl(), which cancels the works;
      namespace removal is left to the callers to handle.
      
      Move nvme_uninit_ctrl after we are done with the
      controller device.
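
The ordering described above can be illustrated with a toy model. This is only a sketch in Python, not driver code: the `Ctrl` class, its fields, and the `reset`/`remove` helpers are hypothetical names chosen to mirror the stop / remove namespaces / uninit sequence, on the assumption that logging needs the controller device to still exist.

```python
class Ctrl:
    """Toy model of a controller; every name here is hypothetical."""

    def __init__(self):
        self.works_running = False
        self.namespaces = ["ns1", "ns2"]
        self.device_alive = True
        self.events = []

    def start(self):
        # Queue the background works (AEN, namespace scan, keep-alive, ...).
        self.works_running = True
        self.events.append("start")

    def stop(self):
        # Cancel the background works; namespaces are left untouched.
        self.works_running = False
        self.events.append("stop")

    def remove_namespaces(self):
        self.namespaces.clear()
        self.events.append("remove_namespaces")

    def log(self, msg):
        # Logging relies on the controller device still being present.
        if not self.device_alive:
            raise RuntimeError("controller device already destroyed")
        self.events.append("log: " + msg)

    def uninit(self):
        # Destroy the controller device; must come last.
        self.device_alive = False
        self.events.append("uninit")


def reset(ctrl):
    # A reset stops and restarts the works but keeps the namespaces.
    ctrl.stop()
    ctrl.log("resetting")
    ctrl.start()


def remove(ctrl):
    # Removal stops the works, drops namespaces, and only then uninits.
    ctrl.stop()
    ctrl.remove_namespaces()
    ctrl.log("tearing down")
    ctrl.uninit()
```

In this model, calling `log()` after `uninit()` raises, which is the situation the reordering is meant to avoid.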
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
  3. 02 Jul, 2017 2 commits
  4. 28 Jun, 2017 4 commits
  5. 27 Jun, 2017 1 commit
    • nvme: add support for streams and directives · f5d11840
      Jens Axboe authored
      This adds support for Directives in NVMe, in particular for the Streams
      directive. Support for Directives is a new feature in NVMe 1.3. It
      allows a user to pass in information about where to store the data, so
      that the device can do so most efficiently. If an application is
      managing and writing data with different lifetimes, mixing data with
      different retention onto the same locations on flash can cause write
      amplification to grow. This, in turn, will reduce the performance and
      lifetime of the device.
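
As a rough illustration of the idea, writes with a similar expected lifetime can be tagged with the same stream so the device can place them together. The sketch below is not the driver's mapping: `assign_stream`, the integer lifetime classes, and the convention that 0 means "no stream" are assumptions made for this example.

```python
from typing import Optional


def assign_stream(lifetime_class: Optional[int], nr_streams: int) -> int:
    """Return a stream ID for a write, or 0 for 'no stream' (sketch convention).

    lifetime_class: hypothetical hint, 1 = shortest-lived data, larger = longer.
    nr_streams:     number of streams assumed to be available on the device.
    """
    if lifetime_class is None or nr_streams <= 0:
        return 0  # no hint given, or streams unavailable
    if not 1 <= lifetime_class <= nr_streams:
        return 0  # hint outside the available range: write without a stream
    return lifetime_class  # one stream per lifetime class
```

Data tagged with different stream IDs can then be kept apart on flash, which is what limits the mixing of short-lived and long-lived data described above.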
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 16 Jun, 2017 1 commit
  7. 15 Jun, 2017 6 commits
  8. 13 Jun, 2017 1 commit
  9. 09 Jun, 2017 1 commit
  10. 26 May, 2017 2 commits
  11. 20 Apr, 2017 4 commits
  12. 08 Apr, 2017 1 commit
  13. 05 Apr, 2017 2 commits
  14. 04 Apr, 2017 1 commit
  15. 02 Mar, 2017 1 commit
    • nvme: Complete all stuck requests · 302ad8cc
      Keith Busch authored
      If the nvme driver is shutting down its controller, the driver will not
      start the queues up again, preventing blk-mq's hot CPU notifier from
      making forward progress.
      
      To fix that, this patch starts a request_queue freeze when the driver
      resets a controller so no new requests may enter. The driver will wait
      for the freeze to complete after IO queues are restarted to ensure the queue reference
      can be reinitialized when nvme requests to unfreeze the queues.
      
      If the driver is doing a safe shutdown, the driver will wait for the
      controller to successfully complete all inflight requests so that we
      don't unnecessarily fail them. Once the controller has been disabled,
      the queues will be restarted to force remaining entered requests to end
      in failure so that blk-mq's hot cpu notifier may progress.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  16. 22 Feb, 2017 2 commits
    • nvme: Enable autonomous power state transitions · c5552fde
      Andy Lutomirski authored
      NVMe devices can advertise multiple power states.  These states can
      be either "operational" (the device is fully functional but possibly
      slow) or "non-operational" (the device is asleep until woken up).
      Some devices can automatically enter a non-operational state when
      idle for a specified amount of time and then automatically wake back
      up when needed.
      
      The hardware configuration is a table.  For each state, an entry in
      the table indicates the next deeper non-operational state, if any,
      to autonomously transition to and the idle time required before
      transitioning.
      
      This patch teaches the driver to program APST so that each successive
      non-operational state will be entered after an idle time equal to 100%
      of the total latency (entry plus exit) associated with that state.
      The maximum acceptable latency is controlled using dev_pm_qos
      (e.g. power/pm_qos_latency_tolerance_us in sysfs); non-operational
      states with total latency greater than this value will not be used.
      As a special case, setting the latency tolerance to 0 will disable
      APST entirely.  On hardware without APST support, the sysfs file will
      not be exposed.
      
      The latency tolerance for newly-probed devices is set by the module
      parameter nvme_core.default_ps_max_latency_us.
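
The table-building rule described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the driver's implementation: `PowerState`, `ApstEntry`, and `build_apst_table` are hypothetical names, latencies are taken in microseconds, idle times are rounded up to whole milliseconds, and states are assumed to be ordered from shallowest (index 0) to deepest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PowerState:
    operational: bool
    entry_latency_us: int
    exit_latency_us: int


@dataclass(frozen=True)
class ApstEntry:
    target_state: Optional[int]  # next deeper non-operational state, if any
    idle_time_ms: int            # idle time required before transitioning


def build_apst_table(states: List[PowerState],
                     max_latency_us: int) -> List[ApstEntry]:
    none = ApstEntry(None, 0)
    table = [none] * len(states)
    if max_latency_us == 0:
        return table  # a tolerance of 0 disables APST entirely

    target = none
    # Walk from the deepest state to the shallowest, so each entry points
    # at the nearest deeper non-operational state that is acceptable.
    for idx in range(len(states) - 1, -1, -1):
        table[idx] = target
        state = states[idx]
        if state.operational:
            continue
        total_us = state.entry_latency_us + state.exit_latency_us
        if total_us > max_latency_us:
            continue  # too slow for the configured tolerance
        # Idle time equals 100% of the total latency, rounded up to ms.
        target = ApstEntry(idx, -(-total_us // 1000))
    return table
```

For example, with two operational states followed by non-operational states of 5 ms and 100 ms total latency, a tolerance of 25000 us yields a table in which only the 5 ms state is ever targeted.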
      
      In theory, the device can expose a "default" APST table, but this
      doesn't seem to function correctly on my device (Samsung 950), nor
      does it seem particularly useful.  There is also an optional
      mechanism by which a configuration can be "saved" so it will be
      automatically loaded on reset.  This can be configured from
      userspace, but it doesn't seem useful to support in the driver.
      
      On my laptop, enabling APST seems to save nearly 1W.
      
      The hardware tables can be decoded in userspace with nvme-cli.
      'nvme id-ctrl /dev/nvmeN' will show the power state table and
      'nvme get-feature -f 0x0c -H /dev/nvmeN' will show the current APST
      configuration.
      
      This feature is quirked off on a known-buggy Samsung device.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nvme: Add a quirk mechanism that uses identify_ctrl · bd4da3ab
      Andy Lutomirski authored
      Currently, all NVMe quirks are based on PCI IDs.  Add a mechanism to
      define quirks based on identify_ctrl's vendor id, model number,
      and/or firmware revision.
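
A minimal sketch of such a lookup, in Python rather than kernel C, might look like this. The `QuirkEntry` layout, the `quirks_for` helper, and the example IDs and flag are hypothetical; the only facts relied on are that the Identify Controller model number and firmware revision are fixed-width, space-padded ASCII fields.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class QuirkEntry:
    vid: int                   # vendor ID to match
    mn: Optional[str] = None   # model number, or None to match any
    fr: Optional[str] = None   # firmware revision, or None to match any
    quirks: int = 0            # bitmask of quirk flags to apply


def field_matches(raw: bytes, want: Optional[str]) -> bool:
    """Compare a space-padded ASCII identify field against a wanted string."""
    if want is None:
        return True
    return raw.decode("ascii", errors="replace").rstrip(" ") == want


def quirks_for(vid: int, mn_raw: bytes, fr_raw: bytes,
               table: Iterable[QuirkEntry]) -> int:
    """OR together the quirk flags of every matching table entry."""
    quirks = 0
    for entry in table:
        if (entry.vid == vid
                and field_matches(mn_raw, entry.mn)
                and field_matches(fr_raw, entry.fr)):
            quirks |= entry.quirks
    return quirks


# Hypothetical flag and table entry, for illustration only.
QUIRK_EXAMPLE = 1 << 0
EXAMPLE_TABLE = [QuirkEntry(vid=0x1234, mn="Example SSD", quirks=QUIRK_EXAMPLE)]
```

Leaving `mn` or `fr` unset makes an entry match every model or firmware revision of that vendor, which corresponds to the "and/or" in the description.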
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  17. 17 Feb, 2017 2 commits
  18. 06 Feb, 2017 1 commit
  19. 31 Jan, 2017 1 commit
  20. 13 Jan, 2017 1 commit
  21. 21 Dec, 2016 1 commit
    • nvme: simplify stripe quirk · e6282aef
      Keith Busch authored
      Some OEMs believe they own the Identify Controller vendor specific
      region and will repurpose it with their own values. While not common,
      we can't rely on the PCI VID:DID to tell us how to decode the field
      we reserved for this as the stripe size, so we need to do something else
      for the list of devices using this quirk.
      
      The field was supposed to allow flexibility on the device's back-end
      striping, but it turned out that never materialized; the chunk is always
      the same as MDTS in the products subscribing to this quirk, so this
      patch removes the stripe_size field and sets the chunk to the max hw
      transfer size for the devices using this quirk.
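
For reference, the maximum transfer size implied by MDTS can be worked out as below. MDTS is a power-of-two exponent in units of the controller's minimum memory page size (CAP.MPSMIN, where the page size is 2^(12 + MPSMIN) bytes), and a value of 0 means no limit is reported. The function names are hypothetical, and expressing the chunk in 512-byte sectors is an assumption of this sketch.

```python
from typing import Optional


def max_transfer_bytes(mdts: int, mpsmin: int) -> Optional[int]:
    """Maximum data transfer size in bytes, or None if MDTS reports no limit."""
    if mdts == 0:
        return None
    min_page_bytes = 1 << (12 + mpsmin)  # minimum memory page size
    return min_page_bytes << mdts        # page size * 2^MDTS


def chunk_sectors(mdts: int, mpsmin: int) -> Optional[int]:
    """The same limit expressed in 512-byte sectors."""
    limit = max_transfer_bytes(mdts, mpsmin)
    return None if limit is None else limit // 512
```

With MDTS = 5 and a 4 KiB minimum page size (MPSMIN = 0), the limit is 4096 * 32 = 131072 bytes, i.e. 256 sectors, which under this patch would also be the chunk size for the quirked devices.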
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  22. 09 Dec, 2016 1 commit
    • block: improve handling of the magic discard payload · f9d03f96
      Christoph Hellwig authored
      Instead of allocating a single unused biovec for discard requests, send
      them down without any payload.  We then allow the driver to add a
      "special" payload using a biovec embedded into struct request (unioned
      over other fields never used while in the driver), and overloading
      the number of segments for this case.
      
      This has a couple of advantages:
      
       - we don't have to allocate the bio_vec
       - the amount of special casing for discard requests in the block
         layer is significantly reduced
       - using this same scheme for other request types is trivial,
         which will be important for implementing the new WRITE_ZEROES
         op on devices where it actually requires a payload (e.g. SCSI)
       - we can get rid of playing games with the request length, as
         we'll never touch it and completions will work just fine
       - it will allow us to support ranged discard operations in the
         future by merging non-contiguous discard bios into a single
         request
       - last but not least it removes a lot of code
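
To make the "special" payload concrete, here is a sketch of what an NVMe driver might build for a discard: a single Dataset Management range, which is a 16-byte little-endian record of context attributes, length in logical blocks, and starting LBA. The Python helper below is illustrative only; `dsm_range_payload` and its parameters are hypothetical, and it assumes the request is given in 512-byte sectors aligned to the logical block size.

```python
import struct


def dsm_range_payload(start_sector: int, nr_sectors: int,
                      lba_shift: int) -> bytes:
    """Pack one Dataset Management range for a discard request.

    start_sector, nr_sectors: request position and size in 512-byte sectors.
    lba_shift: log2 of the logical block size (9 for 512 B, 12 for 4 KiB).
    """
    shift = lba_shift - 9
    slba = start_sector >> shift   # starting logical block address
    nlb = nr_sectors >> shift      # number of logical blocks to deallocate
    cattr = 0                      # no context attributes in this sketch
    # Layout: 32-bit cattr, 32-bit nlb, 64-bit slba, all little-endian.
    return struct.pack("<IIQ", cattr, nlb, slba)
```

Because the payload is produced by the driver at dispatch time, the block layer never needs to allocate a placeholder bio_vec for it, which is the first advantage listed above.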
      
      This patch is the common base for my WIP series for ranged discards and to
      remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES,
      so it would be good to get it in quickly.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>