1. 21 Sep, 2008 1 commit
    • IB/ehca: Generate flush status CQ entries · b9012e0a
      Alexander Schmidt authored
      When a QP goes into error state, it is required that CQ entries with a
      flush error status are delivered to the application for any
      outstanding work requests.  eHCA does not do this in hardware, so this
      patch adds software flush CQE generation to the ehca driver.
      Whenever a QP gets into error state, it is added to the QP error list
      of its respective CQ.  If the error QP list of a CQ is not empty,
      poll_cq() generates flush CQEs before polling the actual CQ.
      Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
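      A minimal user-space sketch of the flush scheme described above, under a simplified model: every name here (struct sim_qp, struct sim_cq, sim_poll_cq(), qp_enter_error()) is hypothetical and not taken from the ehca driver, and the "hardware" CQ is just an array. In this sketch a QP is dropped from the error list once all of its outstanding WRs have been flushed, which is a simplification rather than a description of the driver.

```c
#include <assert.h>

/* Hypothetical model, not ehca code: a CQ keeps a list of error-state QPs,
 * and polling reports flush CQEs for their outstanding WRs before any
 * hardware CQEs. */
#define SIM_MAX_WQE 16
#define SIM_MAX_ERR_QPS 8

enum sim_wc_status { SIM_WC_SUCCESS, SIM_WC_FLUSH_ERR };

struct sim_wc {
	unsigned long wr_id;
	enum sim_wc_status status;
};

/* A QP's posted-but-uncompleted work requests, oldest first. */
struct sim_qp {
	unsigned long wr_ids[SIM_MAX_WQE];
	int next;   /* index of the next WR to flush */
	int count;  /* number of outstanding WRs */
};

struct sim_cq {
	struct sim_qp *err_qps[SIM_MAX_ERR_QPS]; /* the CQ's "QP error list" */
	int n_err;
	struct sim_wc hw[SIM_MAX_WQE];           /* stand-in for hardware CQEs */
	int hw_next, hw_count;
};

/* Called when a QP transitions into the error state. */
static void qp_enter_error(struct sim_cq *cq, struct sim_qp *qp)
{
	assert(cq->n_err < SIM_MAX_ERR_QPS);
	cq->err_qps[cq->n_err++] = qp;
}

static int sim_poll_cq(struct sim_cq *cq, int num_entries, struct sim_wc *wc)
{
	int n = 0;

	/* 1. Generate software flush CQEs while the error list is not empty. */
	while (n < num_entries && cq->n_err > 0) {
		struct sim_qp *qp = cq->err_qps[0];

		if (qp->next == qp->count) {
			/* Simplification: QP fully flushed, remove it from the list. */
			for (int i = 1; i < cq->n_err; i++)
				cq->err_qps[i - 1] = cq->err_qps[i];
			cq->n_err--;
			continue;
		}
		wc[n].wr_id = qp->wr_ids[qp->next++];
		wc[n].status = SIM_WC_FLUSH_ERR;
		n++;
	}

	/* 2. Then poll the actual (here simulated) CQ. */
	while (n < num_entries && cq->hw_next < cq->hw_count)
		wc[n++] = cq->hw[cq->hw_next++];

	return n;
}
```

With one error QP holding two outstanding WRs and one hardware CQE queued, a single poll returns the two flush entries first and the hardware entry last.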
  2. 12 Aug, 2008 4 commits
  3. 04 Aug, 2008 1 commit
  4. 15 Jul, 2008 2 commits
    • Joachim Fenkes
    • RDMA/core: Add memory management extensions support · 00f7ec36
      Steve Wise authored
      This patch adds support for the IB "base memory management extension"
      (BMME) and the equivalent iWARP operations (which the iWARP verbs
      mandate that all devices implement).  The new operations are:
       - Allocate an ib_mr for use in fast register work requests.
       - Allocate/free physical buffer lists for use in fast register work
         requests.  This allows device drivers to allocate this memory as
         needed for use in posting send requests (e.g. via dma_alloc_coherent).
       - New send queue work requests:
         * send with remote invalidate
         * fast register memory region
         * local invalidate memory region
         * RDMA read with invalidate local memory region (iWARP only)
      Consumer interface details:
       - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
         to indicate device support for these features.
       - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
         IB_WR_RDMA_READ_WITH_INV are added.
       - A new consumer API function, ib_alloc_mr(), is added to allocate
         fast register memory regions.
       - New consumer API functions, ib_alloc_fast_reg_page_list() and
         ib_free_fast_reg_page_list(), are added to allocate and free
         device-specific memory for fast registration page lists.
       - A new consumer API function, ib_update_fast_reg_key(), is added to
         allow the key portion of the R_Key and L_Key of a fast registration
         MR to be updated.  Consumers call this if desired before posting
         an IB_WR_FAST_REG_MR work request.
      Consumers can use this as follows:
       - MR is allocated with ib_alloc_mr().
       - Page list memory is allocated with ib_alloc_fast_reg_page_list().
       - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
       - MR made VALID and bound to a specific page list via
       - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
         ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
         invalidate operation.
       - MR is deallocated with ib_dereg_mr().
       - Page lists are deallocated with ib_free_fast_reg_page_list().
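      A hypothetical model of the MR lifecycle listed above, useful for seeing why each outstanding MR-to-page-list binding needs its own page list. Nothing here calls the real verbs: the demo_* names are illustrative, work requests take effect at post time, incoming send with invalidate is not modeled, and rejecting a fast register on an already VALID MR is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the fast-register MR lifecycle;
 * opcode and type names are illustrative, not the kernel API. */
enum demo_opcode { DEMO_FAST_REG_MR, DEMO_LOCAL_INV, DEMO_RDMA_READ_WITH_INV };
enum demo_mr_state { MR_INVALID, MR_VALID };

struct demo_page_list {
	bool owned_by_driver;  /* true from posting the WR until it completes */
};

struct demo_mr {
	enum demo_mr_state state;
	struct demo_page_list *bound; /* page list the MR is bound to, if VALID */
};

/* FAST_REG binds the MR to a page list and hands the list to the driver;
 * LOCAL_INV and RDMA_READ_WITH_INV make the MR INVALID again. */
static int demo_post(struct demo_mr *mr, enum demo_opcode op,
		     struct demo_page_list *pl)
{
	switch (op) {
	case DEMO_FAST_REG_MR:
		/* Model assumption: only an INVALID MR may be (re)bound, and a
		 * page list still owned by the driver cannot be reused. */
		if (mr->state != MR_INVALID || pl == NULL || pl->owned_by_driver)
			return -1;
		pl->owned_by_driver = true;
		mr->bound = pl;
		mr->state = MR_VALID;
		return 0;
	case DEMO_LOCAL_INV:
	case DEMO_RDMA_READ_WITH_INV:
		if (mr->state != MR_VALID)
			return -1;
		mr->state = MR_INVALID;
		mr->bound = NULL;
		return 0;
	}
	return -1;
}

/* Completion of a FAST_REG WR returns the page list to the consumer. */
static void demo_complete_fast_reg(struct demo_page_list *pl)
{
	pl->owned_by_driver = false;
}
```

Posting FAST_REG(pl1), LOCAL_INV, FAST_REG(pl2) before the first WR completes works only because pl2 is a separate list; reposting with pl1 at that point is rejected until its completion is processed.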
      Applications can allocate a fast register MR once, and then can
      repeatedly bind the MR to different physical block lists (PBLs) via
      posting work requests to a send queue (SQ).  For each outstanding
      MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
      allocated (the fast_reg_page_list is owned by the low-level driver
      from the consumer posting a work request until the request completes).
      Thus pipelining can be achieved while still allowing device-specific
      page_list processing.
      The 32-bit fast register memory key/STag is composed of a 24-bit index
      and an 8-bit key.  The application can change the key each time it
      fast registers thus allowing more control over the peer's use of the
      key/STag (i.e. it can effectively be changed each time the rkey is
      rebound to a page list).
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
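      The 24-bit index / 8-bit key split of the fast register memory key/STag can be sketched as plain bit manipulation. This assumes the 8-bit key occupies the low byte of the 32-bit value, which is how we understand the kernel's ib_update_fast_reg_key() helper to work; struct demo_mr_keys, stag_index(), stag_key() and demo_update_key() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: upper 24 bits = index, low 8 bits = consumer key
 * (assumed layout). */
struct demo_mr_keys {
	uint32_t lkey;
	uint32_t rkey;
};

static uint32_t stag_index(uint32_t stag) { return stag >> 8; }
static uint8_t stag_key(uint32_t stag) { return (uint8_t)(stag & 0xffu); }

/* Replace only the 8-bit key in both L_Key and R_Key; the index is kept. */
static void demo_update_key(struct demo_mr_keys *mr, uint8_t newkey)
{
	mr->lkey = (mr->lkey & 0xffffff00u) | newkey;
	mr->rkey = (mr->rkey & 0xffffff00u) | newkey;
}
```

A consumer that bumps the key before every rebind, e.g. demo_update_key(&mr, (uint8_t)(stag_key(mr.rkey) + 1)), makes a stale R_Key held by the peer stop matching while the index stays the same; the 8-bit key simply wraps from 0xff to 0x00.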
  5. 06 Jun, 2008 1 commit
  6. 23 Apr, 2008 2 commits
  7. 17 Apr, 2008 1 commit
    • IB/core: Add support for "send with invalidate" work requests · 0f39cf3d
      Roland Dreier authored
      Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
      "send with invalidate" work request as defined in the iWARP verbs and
      the InfiniBand base memory management extensions.  Also put "imm_data"
      and a new "invalidate_rkey" member in a new "ex" union in struct
      ib_send_wr. The invalidate_rkey member can be used to pass in an
      R_Key/STag to be invalidated.  Add this new union to struct
      ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
      Fix up low-level drivers to deal with the change to struct ib_send_wr,
      and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
      since that code never does any send with immediate operations.
      Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
      the iWARP drivers currently in the tree set the bit.  The amso1100
      driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
      if passed in as part of userspace send requests (since it does not
      implement kernel bypass work request queueing).  Remove the flag from
      all existing drivers that set it until we know which ones are OK.
      The value chosen for the new flag is not consecutive to avoid clashing
      with flags defined in the XRC patches, which are not merged yet but
      which are already in use and are likely to be merged soon.
      This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
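      A trimmed-down, hypothetical model of the struct change: imm_data and invalidate_rkey share storage in an "ex" union, and the opcode tells the consumer which member is meaningful. Only the member names imm_data, invalidate_rkey and ex come from the commit text; struct demo_send_wr and the helpers are illustrative and omit most fields of the real struct ib_send_wr.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not the kernel's struct ib_send_wr. */
enum demo_wr_opcode { DEMO_WR_SEND, DEMO_WR_SEND_WITH_IMM, DEMO_WR_SEND_WITH_INV };

struct demo_send_wr {
	uint64_t wr_id;
	enum demo_wr_opcode opcode;
	union {
		uint32_t imm_data;        /* big-endian (__be32) in the real API */
		uint32_t invalidate_rkey; /* R_Key/STag the peer should invalidate */
	} ex;
};

/* Fill a "send with invalidate" work request. */
static void demo_prep_send_inv(struct demo_send_wr *wr, uint64_t wr_id,
			       uint32_t rkey)
{
	memset(wr, 0, sizeof(*wr));
	wr->wr_id = wr_id;
	wr->opcode = DEMO_WR_SEND_WITH_INV;
	wr->ex.invalidate_rkey = rkey;
}

/* Read only the union member that matches the opcode; 0 if none applies. */
static uint32_t demo_ex_value(const struct demo_send_wr *wr)
{
	switch (wr->opcode) {
	case DEMO_WR_SEND_WITH_IMM:
		return wr->ex.imm_data;
	case DEMO_WR_SEND_WITH_INV:
		return wr->ex.invalidate_rkey;
	default:
		return 0;
	}
}
```

Because both members overlay the same 32 bits, the union adds no space to the work request; the cost is that every consumer and driver must dispatch on the opcode before reading ex, which is why the low-level drivers needed fixing up.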
  8. 05 Feb, 2008 1 commit
  9. 25 Jan, 2008 1 commit
  10. 10 Oct, 2007 1 commit
  11. 18 Jul, 2007 1 commit
  12. 10 Jul, 2007 4 commits
  13. 09 Jul, 2007 1 commit
  14. 07 May, 2007 1 commit
    • IB: Return "maybe missed event" hint from ib_req_notify_cq() · ed23a727
      Roland Dreier authored
      The semantics defined by the InfiniBand specification say that
      completion events are only generated when a completion is added to a
      completion queue (CQ) after completion notification is requested.  In
      other words, this means that the following race is possible:
      	while (CQ is not empty)
      		ib_poll_cq(CQ);
      	// new completion is added after while loop is exited
      	ib_req_notify_cq(CQ);
      	// no event is generated for the existing completion
      To close this race, the IB spec recommends doing another poll of the
      CQ after requesting notification.
      However, it is not always possible to arrange code this way (for
      example, we have found that NAPI for IPoIB cannot poll after
      requesting notification).  Also, some hardware (e.g. Mellanox HCAs)
      actually will generate an event for completions added before the call
      to ib_req_notify_cq() -- which is allowed by the spec, since there's
      no way for any upper-layer consumer to know exactly when a completion
      was really added -- so the extra poll of the CQ is just a waste.
      Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
      ib_req_notify_cq() so that it can return a hint about whether a
      completion may have been added before the request for notification.
      The return value of ib_req_notify_cq() is extended so:
      	 < 0	means an error occurred while requesting notification
      	== 0	means notification was requested successfully, and if
      		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
      		events were missed and it is safe to wait for another event.
      	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
      		passed in.  It means that the consumer must poll the
      		CQ again to make sure it is empty to avoid the race
      		described above.
      We add a flag to enable this behavior rather than turning it on
      unconditionally, because checking for missed events may incur
      significant overhead for some low-level drivers, and consumers that
      don't care about the results of this test shouldn't be forced to pay
      for the test.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
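      A user-space simulation of the consumer loop that the "> 0" return value enables: drain the CQ, re-arm notification, and poll again whenever the hint says a completion may have been missed. struct demo_cq, demo_poll_cq(), demo_req_notify_cq() and DEMO_REPORT_MISSED_EVENTS are hypothetical stand-ins for ib_poll_cq(), ib_req_notify_cq() and IB_CQ_REPORT_MISSED_EVENTS; the simulated race injects completions that land while notification is being re-armed.

```c
#include <assert.h>

/* Hypothetical stand-ins for the verbs calls; not the real API. */
#define DEMO_REPORT_MISSED_EVENTS 0x1

struct demo_cq {
	int pending; /* completions currently queued */
	int late;    /* completions that arrive in the race window */
};

/* Returns 1 and consumes a completion if one is queued, else 0. */
static int demo_poll_cq(struct demo_cq *cq)
{
	if (cq->pending == 0)
		return 0;
	cq->pending--;
	return 1;
}

/* Arms notification. With the flag set, returns > 0 if a completion may
 * have been missed (here: the CQ is non-empty after arming). */
static int demo_req_notify_cq(struct demo_cq *cq, int flags)
{
	cq->pending += cq->late; /* simulate completions racing the re-arm */
	cq->late = 0;
	if ((flags & DEMO_REPORT_MISSED_EVENTS) && cq->pending > 0)
		return 1;
	return 0;
}

/* Drain, re-arm, and poll again while the hint reports a possible miss. */
static int demo_drain(struct demo_cq *cq)
{
	int handled = 0;

	do {
		while (demo_poll_cq(cq) > 0)
			handled++;
	} while (demo_req_notify_cq(cq, DEMO_REPORT_MISSED_EVENTS) > 0);

	return handled;
}
```

Without the flag the same race leaves the late completion queued with no event coming for it; with the flag, demo_drain() picks it up on the second pass. A driver that can cheaply tell whether the CQ is non-empty could implement the check this way, while others may pay more for it, which matches the reason given for making the behavior opt-in.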
  15. 04 Feb, 2007 1 commit
    • IB: Return qp pointer as part of ib_wc · 062dbb69
      Michael S. Tsirkin authored
      struct ib_wc currently only includes the local QP number: this matches
      the IB spec, but seems mostly useless. The following patch replaces
      this with the pointer to qp itself, and updates all low level drivers
      and all users.
      This has the following advantages:
      - Ability to get a per-qp context through wc->qp->qp_context
      - Existing drivers already have the qp pointer ready in poll cq, so
        this change actually saves a tiny bit (an extra memory read) on the data path
        (for ehca it would actually be expensive to find the QP pointer when
        polling a CQ, but ehca does not support SRQ so we can leave wc->qp as
        NULL for ehca)
      - Users that need the QP number can still get it through wc->qp->qp_num
      Use case:
      In IPoIB connected mode code, I have a common CQ shared by multiple
      QPs.  To track connection usage, I need a way to get at some per-QP
      context upon the completion, and I would like to avoid allocating
      a context object per work request just to stick a QP pointer into it.
      With this code, I can just use wc->qp->qp_context.
      Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
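      A hypothetical model of the IPoIB use case above: several QPs share one CQ and each completion points at its QP, so per-connection state is reached through wc->qp->qp_context. Only the qp, qp_num and qp_context field names mirror the commit text; struct demo_qp, struct demo_wc, struct conn_stats and handle_wc() are illustrative, and the NULL check covers drivers such as ehca that leave wc->qp unset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; only qp, qp_num and qp_context mirror the commit. */
struct demo_qp {
	unsigned int qp_num;
	void *qp_context; /* consumer-owned, set when the QP is created */
};

struct demo_wc {
	unsigned long wr_id;
	struct demo_qp *qp; /* may be NULL if the driver cannot supply it */
};

struct conn_stats { /* hypothetical per-connection context */
	int completions;
};

/* Account a completion to its connection without any per-WR context. */
static void handle_wc(const struct demo_wc *wc)
{
	if (wc->qp == NULL)
		return;
	struct conn_stats *stats = wc->qp->qp_context;
	stats->completions++;
}
```

Two QPs sharing one CQ each get their own counter, and the QP number remains available through wc->qp->qp_num when it is still needed.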
  16. 22 Sep, 2006 1 commit