  1. 20 May, 2015 2 commits
  2. 05 May, 2015 3 commits
  3. 06 Feb, 2015 1 commit
  4. 16 Dec, 2014 7 commits
  5. 09 Oct, 2014 1 commit
    • IB/mlx5, iser, isert: Add Signature API additions · 78eda2bb
      Sagi Grimberg authored
      
      
      Expose more signature setting parameters. We modify the signature API
      to allow usage of some new execution parameters relevant to data
      integrity feature.
      
      This patch modifies the ib_sig_domain structure as follows:
      
      - Deprecate DIF type in signature API (operation will
        be determined by the parameters alone, no DIF type awareness)
      - Add APPTAG check bitmask (for input domain)
      - Add REFTAG remap (increment) flag for each domain
      - Add APPTAG/REFTAG escape options for each domain
      
      The mlx5 driver is modified to follow the new parameters in HW
      signature setup.
      
      At the moment the callers (iser/isert) hard-code new parameters (by
      DIF type). In the future, callers will retrieve them from the scsi
      command structure.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      78eda2bb
  6. 19 Sep, 2014 1 commit
    • IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get · 87773dd5
      Shawn Bohrer authored
      
      
      In debugging an application that receives -ENOMEM from ib_reg_mr(), I
      found that ib_umem_get() can fail because the pinned_vm count has
      wrapped causing it to always be larger than the lock limit even with
      RLIMIT_MEMLOCK set to RLIM_INFINITY.
      
      The wrapping of pinned_vm occurs because the process that calls
      ib_reg_mr() will have its mm->pinned_vm count incremented.  Later a
      different process with a different mm_struct than the one that
      allocated the ib_umem struct ends up releasing it which results in
      decrementing the new process's mm->pinned_vm count past zero and
      wrapping.
      
      I'm not entirely sure what circumstances cause a different process to
      release the ib_umem than the one that allocated it but the kernel
      stack trace of the freeing process from my situation looks like the
      following:
      
          Call Trace:
           [<ffffffff814d64b1>] dump_stack+0x19/0x1b
           [<ffffffffa0b522a5>] ib_umem_release+0x1f5/0x200 [ib_core]
           [<ffffffffa0b90681>] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib]
           [<ffffffffa0b4d93c>] ib_destroy_qp+0x12c/0x170 [ib_core]
           [<ffffffffa0cc7129>] ib_uverbs_close+0x259/0x4e0 [ib_uverbs]
           [<ffffffff81141cba>] __fput+0xba/0x240
           [<ffffffff81141e4e>] ____fput+0xe/0x10
           [<ffffffff81060894>] task_work_run+0xc4/0xe0
           [<ffffffff810029e5>] do_notify_resume+0x95/0xa0
           [<ffffffff814e3dd0>] int_signal+0x12/0x17
      
      The following patch fixes the issue by storing the pid struct of the
      process that calls ib_umem_get() so that ib_umem_release and/or
      ib_umem_account() can properly decrement the pinned_vm count of the
      correct mm_struct.
      Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Reviewed-by: Shachar Raindel <raindel@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      87773dd5
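The wrap described in this commit can be reproduced with a small user-space model. This is only a sketch: `toy_mm`, `toy_account()` and `toy_pin_allowed()` are hypothetical stand-ins, not kernel code, and the limit check is an assumed simplification of what ib_umem_get() does.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pinned-page counter kept per mm_struct. */
struct toy_mm {
    unsigned long pinned_vm;
};

/* Adjust the counter by a signed page count. The counter is unsigned,
 * so subtracting from zero wraps around to a huge value. */
static void toy_account(struct toy_mm *mm, long delta)
{
    assert(mm != NULL);
    mm->pinned_vm += (unsigned long)delta;
}

/* Assumed shape of the limit check: pinning is refused when the new
 * total would exceed the lock limit (both expressed in pages). */
static bool toy_pin_allowed(const struct toy_mm *mm, unsigned long npages,
                            unsigned long lock_limit)
{
    return mm->pinned_vm + npages <= lock_limit;
}
```

If one `toy_mm` is charged 10 pages and a different `toy_mm` is later credited for them, the second counter wraps to `ULONG_MAX - 9`, and every later `toy_pin_allowed()` call on it fails even with a very large limit. Storing the identity of the original owner, as the patch does with the pid struct, avoids crediting the wrong counter.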
  7. 11 Aug, 2014 2 commits
  8. 01 Aug, 2014 1 commit
  9. 10 Jun, 2014 1 commit
    • RDMA/core: Add support for iWARP Port Mapper user space service · 30dc5e63
      Tatyana Nikolova authored
      This patch adds iWARP Port Mapper (IWPM) Version 2 support.  The iWARP
      Port Mapper implementation is based on the port mapper specification
      section in the Sockets Direct Protocol paper -
      http://www.rdmaconsortium.org/home/draft-pinkerton-iwarp-sdp-v1.0.pdf
      
      Existing iWARP RDMA providers use the same IP address as the native
      TCP/IP stack when creating RDMA connections.  They need a mechanism to
      claim the TCP ports used for RDMA connections to prevent TCP port
      collisions when other host applications use TCP ports.  The iWARP Port
      Mapper provides a standard mechanism to accomplish this.  Without this
      service, an RDMA application can bind/listen on a port that is
      already in use by a native TCP host application.  If that happens,
      incoming TCP connection data can be passed to the RDMA stack in
      error.
      
      The iWARP Port Mapper solution doesn't contain any changes to the
      existing network stack in the kernel space.  All the changes are
      contained within the infiniband tree and also in user space.
      
      The iWARP Port Mapper service is implemented as a user space daemon
      process.  Source for the IWPM service is located at
      http://git.openfabrics.org/git?p=~tnikolova/libiwpm-1.0.0/.git;a=summary
      
      
      
      The iWARP driver (port mapper client) sends to the IWPM service the
      local IP address and TCP port it has received from the RDMA
      application, when starting a connection.  The IWPM service performs a
      socket bind from user space to get an available TCP port, called a
      mapped port, and communicates it back to the client.  In that sense,
      the IWPM service maps the TCP port that the RDMA application uses
      to any port available from the host TCP port space.  The mapped
      ports are used in iWARP RDMA connections to avoid collisions with
      the native TCP stack, which is aware that these ports are
      taken.  When an RDMA connection using a mapped port is terminated, the
      client notifies the IWPM service, which then releases the TCP port.
      
      The message exchange between the IWPM service and the iWARP drivers
      (between user space and kernel space) is implemented using netlink
      sockets.
      
      1) Netlink interface functions are added: ibnl_unicast() and
         ibnl_multicast() for sending netlink messages to user space
      
      2) The signature of the existing ibnl_put_msg() is changed to be more
         generic
      
      3) Two netlink clients are added: RDMA_NL_NES, RDMA_NL_C4IW
         corresponding to the two iWARP drivers - nes and cxgb4 which use
         the IWPM service
      
      4) Enums are added to enumerate the attributes in the netlink
         messages, which are exchanged between the user space IWPM service
         and the iWARP drivers
      Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: PJ Waskiewicz <pj.waskiewicz@solidfire.com>
      
      [ Fold in range checking fixes and nlh_next removal as suggested by Dan
        Carpenter and Steve Wise.  Fix sparse endianness in hash.  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      30dc5e63
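The commit says the IWPM service performs a socket bind from user space to obtain an available TCP port. One common way to do that is to bind to port 0 and read back the port the kernel picked. The sketch below shows that technique with plain POSIX sockets. It is an assumption about how such a daemon could work, not the IWPM source, and `toy_claim_mapped_port()` is a hypothetical name.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to an OS-chosen port on any local address.
 * Returns the socket fd (the port stays claimed while it is open)
 * and stores the port in host byte order; returns -1 on failure. */
static int toy_claim_mapped_port(uint16_t *mapped_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd;

    assert(mapped_port != NULL);
    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;                  /* 0 asks the kernel to pick a port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }

    *mapped_port = ntohs(addr.sin_port);
    return fd;
}
```

Closing the returned descriptor corresponds to the release step described above: once the RDMA connection using the mapped port is gone, the port returns to the host TCP port space.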
  10. 04 Jun, 2014 1 commit
    • IB/core: Fix sparse warnings about redeclared functions · 8385fd84
      Roland Dreier authored
      
      
      Fix a few functions that are declared with __attribute_const__ in the
      ib_verbs.h header file but defined without it in verbs.c.  This gets rid
      of the following sparse warnings:
      
          drivers/infiniband/core/verbs.c:51:5: error: symbol 'ib_rate_to_mult' redeclared with different type (originally declared at include/rdma/ib_verbs.h:469) - different modifiers
          drivers/infiniband/core/verbs.c:68:14: error: symbol 'mult_to_ib_rate' redeclared with different type (originally declared at include/rdma/ib_verbs.h:607) - different modifiers
          drivers/infiniband/core/verbs.c:85:5: error: symbol 'ib_rate_to_mbps' redeclared with different type (originally declared at include/rdma/ib_verbs.h:476) - different modifiers
          drivers/infiniband/core/verbs.c:111:1: error: symbol 'rdma_node_get_transport' redeclared with different type (originally declared at include/rdma/ib_verbs.h:84) - different modifiers
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      8385fd84
  11. 02 Jun, 2014 1 commit
    • IB: Add a QP creation flag to use GFP_NOIO allocations · 09b93088
      Or Gerlitz authored
      
      
      This addresses a problem where NFS client writes over IPoIB connected
      mode may deadlock on memory allocation/writeback.
      
      The problem is not directly memory reclamation.  There is an indirect
      dependency between network filesystems writing back pages and
      ipoib_cm_tx_init() due to how a kworker is used.  Page reclaim cannot
      make forward progress until ipoib_cm_tx_init() succeeds and it is
      stuck in page reclaim itself waiting for network transmission.
      Ordinarily this situation may be avoided by having the caller use
      GFP_NOFS but ipoib_cm_tx_init() does not have that information.
      
      To address this, take a general approach and add a new QP creation
      flag that tells the low-level hardware driver to use GFP_NOIO for the
      memory allocations related to the new QP.
      
      Use the new flag in the ipoib connected mode path, and if the driver
      doesn't support it, re-issue the QP creation without the flag.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      09b93088
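The "try with the flag, then re-issue without it" behaviour described in this commit can be sketched as below. Everything here is hypothetical: the flag value, the two toy drivers, and the assumption that an unsupported creation flag is reported as `-EINVAL`. The real ipoib and driver code is not shown in this log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QP_CREATE_USE_GFP_NOIO (1u << 0)   /* hypothetical flag value */

typedef int (*toy_create_qp_fn)(uint32_t create_flags);

static int toy_driver_calls;   /* counts creation attempts, for illustration */

/* Toy driver that does not know the new flag and rejects it. */
static int toy_driver_legacy(uint32_t create_flags)
{
    toy_driver_calls++;
    return (create_flags & TOY_QP_CREATE_USE_GFP_NOIO) ? -EINVAL : 0;
}

/* Toy driver that accepts the new flag. */
static int toy_driver_noio(uint32_t create_flags)
{
    (void)create_flags;
    toy_driver_calls++;
    return 0;
}

/* Ask for GFP_NOIO allocations first; if the driver rejects the flag
 * (assumed to surface as -EINVAL), retry once without it. */
static int toy_create_qp_with_fallback(toy_create_qp_fn create,
                                       uint32_t base_flags)
{
    int ret;

    assert(create != NULL);
    ret = create(base_flags | TOY_QP_CREATE_USE_GFP_NOIO);
    if (ret == -EINVAL)
        ret = create(base_flags);
    return ret;
}
```

With `toy_driver_legacy` the helper makes two attempts and still succeeds; with `toy_driver_noio` it succeeds on the first attempt.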
  12. 01 Apr, 2014 2 commits
  13. 07 Mar, 2014 2 commits
    • IB/core: Introduce signature verbs API · 1b01d335
      Sagi Grimberg authored
      
      
      Introduce a verbs interface for signature-related operations.  A
      signature handover operation configures the layouts of data and
      protection attributes both in memory and wire domains.
      
      Signature operations are:
      
      - INSERT:
        Generate and insert protection information when handing over
        data from input space to output space.
      - validate and STRIP:
        Validate protection information and remove it when handing over
        data from input space to output space.
      - validate and PASS:
        Validate protection information and pass it when handing over
        data from input space to output space.
      
      Once the signature handover operation is done, the HCA will offload
      data integrity generation/validation while performing the actual data
      transfer.
      
      Additions:
      
      1. HCA signature capabilities in device attributes
          Verbs provider supporting signature handover operations fills
          relevant fields in device attributes structure returned by
          ib_query_device.
      
      2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
          Creating a QP that will carry signature handover operations may
          require some special preparations from the verbs provider.  So we
          add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare that the
          created QP may carry out signature handover operations.  Expose
          signature support to verbs layer (no support for now).
      
      3. New send work request IB_WR_REG_SIG_MR
          Signature handover work request. This WR will define the signature
          handover properties of the memory/wire domains as well as the
          domains layout. The purpose of this work request is to bind all
          the needed information for the signature operation:
      
          - data to be transferred:  wr->sg_list (ib_sge).
            * The raw data, pre-registered to a single MR (normally, before
              signature, this MR would have been used directly for the data
              transfer)
          - data protection guards: sig_handover.prot (ib_sge).
            * The data protection buffer, pre-registered to a single MR, which
              contains the data integrity guards of the raw data blocks.
              Note that it may not always exist, only in cases where the user is
              interested in storing protection guards in memory.
          - signature operation attributes: sig_handover.sig_attrs.
            * Tells the HCA how to validate/generate the protection information.
      
          Once the work request is executed, the memory region that will
          describe the signature transaction will be the sig_mr.  The
          application can now go ahead and send the sig_mr.rkey or use the
          sig_mr.lkey for data transfer.
      
      4. New Verb ib_check_mr_status
          check_mr_status verb checks the status of the memory region post
          transaction.  The first check that may be used is
          IB_MR_CHECK_SIG_STATUS, which will indicate if any signature
          errors are pending for a specific signature-enabled ib_mr.  This
          verb is a lightweight check and is allowed to be taken from
          interrupt context.  An application must call this verb after it is
          known that the actual data transfer has finished.
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      1b01d335
    • IB/core: Introduce protected memory regions · 17cd3a2d
      Sagi Grimberg authored
      
      
      This commit introduces verbs for creating/destroying memory
      regions which will allow new types of memory key operations such
      as protected memory registration.
      
      Indirect memory registration is registering several (one
      or more) pre-registered memory regions in a specific layout.
      The Indirect region may potentially describe several regions
      and some repetition format between them.
      
      Protected Memory registration is registering a memory region
      with various data integrity attributes which will describe protection
      schemes that will be handled by the HCA in an offloaded manner.
      These memory regions will be applicable for a new REG_SIG_MR
      work request introduced later in this patchset.
      
      In the future these routines may replace or implement the memory
      region creation routines that exist today:
      - ib_reg_user_mr
      - ib_alloc_fast_reg_mr
      - ib_get_dma_mr
      - ib_dereg_mr
      Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      17cd3a2d
  14. 04 Mar, 2014 1 commit
  15. 13 Feb, 2014 1 commit
  16. 18 Jan, 2014 2 commits
  17. 14 Jan, 2014 4 commits
    • IB/core: Ethernet L2 attributes in verbs/cm structures · dd5f03be
      Matan Barak authored
      
      
      This patch adds support for Ethernet L2 attributes in the
      verbs/cm/cma structures.
      
      When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
      in the same manner that the IB L2 (and the L4 PKEY) attributes are used.
      
      Thus, those attributes were added to the following structures:
      
      * ib_ah_attr - added dmac
      * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
      * ib_wc - added smac, vlan_id
      * ib_sa_path_rec - added smac, dmac, vlan_id
      * cm_av - added smac and vlan_id
      
      For the path record structure, extra care was taken to avoid the new
      fields when packing it into wire format, so we don't break the IB CM
      and SA wire protocol.
      
      On the active side, the CM fills its internal structures from the
      path provided by the ULP.  At that point we now also take the ETH L2
      attributes and place them into the CM Address Handle (struct cm_av).
      
      On the passive side, the CM fills its internal structures from the WC
      associated with the REQ message.  At that point we now also take the
      ETH L2 attributes from the WC.
      
      When the HW driver provides the required ETH L2 attributes in the WC,
      they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
      code checks for the presence of these flags, and in their absence does
      address resolution from the ib_init_ah_from_wc() helper function.
      
      ib_modify_qp_is_ok is also updated to consider the link layer. Some
      parameters are mandatory for Ethernet link layer, while they are
      irrelevant for IB.  Vendor drivers are modified to support the new
      function signature.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      dd5f03be
    • IB/core: Add support for IB L2 device-managed steering · 240ae00e
      Matan Barak authored
      
      
      This patch adds preliminary support for IB L2 device-managed steering,
      currently exposed only in the kernel.
      
      This flow spec can be used by low-level drivers that need to indicate
      the link layer type when creating device-managed flow rules.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      240ae00e
    • IB/core: Add flow steering support for IPoIB UD traffic · 90f1d1b4
      Matan Barak authored
      
      
      When creating an IPoIB UD QP, provide a hint to the low level driver
      that the QP should support flow-steering.  This means that privileged
      user space applications can steer TCP/IP IPoIB traffic from the
      network stack, in a manner similar to what is done with Ethernet
      RAW_PACKET QPs.
      
      The hint is provided through a new QP creation flag called NETIF_QP.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      90f1d1b4
    • IB/core: Add RDMA_TRANSPORT_USNIC_UDP · 248567f7
      Upinder Malhi authored
      
      
      Add RDMA_TRANSPORT_USNIC_UDP which will be used by usNIC.
      Signed-off-by: Upinder Malhi <umalhi@cisco.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      248567f7
  18. 16 Dec, 2013 1 commit
  19. 17 Nov, 2013 1 commit
    • IB/core: extended command: an improved infrastructure for uverbs commands · f21519b2
      Yann Droneaud authored
      Commit 400dbc96 ("IB/core: Infrastructure for extensible uverbs
      commands") added an infrastructure for extensible uverbs commands
      while later commit 436f2ad0 ("IB/core: Export ib_create/destroy_flow
      through uverbs") exported ib_create_flow()/ib_destroy_flow() functions
      using this new infrastructure.
      
      According to commit 400dbc96, the purpose of this
      infrastructure is to support passing around provider (e.g. hardware)
      specific buffers when userspace issues commands to the kernel, so that
      it would be possible to extend uverbs (e.g. core) buffers independently
      from the provider buffers.
      
      But the new kernel command function prototypes were not modified to
      take advantage of this extension. This issue was exposed by Roland
      Dreier in a previous review[1].
      
      So the following patch is an attempt at a revised extensible command
      infrastructure.
      
      This improved extensible command infrastructure distinguishes the
      core (e.g. legacy) command/response buffers from the provider
      (e.g. hardware) command/response buffers: each function
      implementing an extended command is given one struct ib_udata to
      hold the core (e.g. uverbs) input and output buffers, and another
      struct ib_udata to hold the hw (e.g. provider) input and output
      buffers.
      
      Having those buffers identified separately makes it easier to grow
      one buffer to support an extension without adding code to guess the
      exact size of each command/response part.  This should make the
      extended functions more reliable.
      
      Additionally, instead of relying on the command identifier being
      greater than IB_USER_VERBS_CMD_THRESHOLD, the proposed infrastructure
      relies on unused bits in the command field: of the 32 bits provided
      by the command field, only 6 bits are really needed to encode the
      identifiers of the commands currently supported by the kernel.  (Even
      using only 6 bits leaves room for about 23 new commands.)
      
      So this patch makes use of some high order bits in command field to
      store flags, leaving enough room for more command identifiers than one
      will ever need (eg. 256).
      
      The new flags are used to specify if the command should be processed
      as an extended one or a legacy one. While designing the new command
      format, care was taken to make usage of flags itself extensible.
      
      Using high order bits of the command field ensures that a newer
      libibverbs on an older kernel will properly fail when trying to call
      extended commands.  On the other hand, an older libibverbs on a newer
      kernel will never be able to issue calls to extended commands.
      
      The extended command header includes the optional response pointer so
      that output buffer length and output buffer pointer are located
      together in the command, allowing proper parameter checking.  This
      should make implementing functions easier and safer.
      
      Additionally, the extended header ensures 64-bit alignment, while
      making all sizes multiples of 8 bytes, extending the maximum buffer size:
      
                                   legacy      extended
      
         Maximum command buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
        Maximum response buffer:  256KBytes   1024KBytes (512KBytes + 512KBytes)
      
      For the purpose of doing proper buffer size accounting, the header
      sizes are no longer taken into account in "in_words".
      
      One of the oddities of the current extensible infrastructure, reading
      the "legacy" command header twice, is fixed by removing the "legacy"
      command header from the extended command header: they are processed as
      two different parts of the command, memory is read once, and
      information is not duplicated.  This makes it clear that this is an
      extended command scheme and not a different command scheme.
      
      The proposed scheme will format input (command) and output (response)
      buffers this way:
      
      - command:
      
        legacy header +
        extended header +
        command data (core + hw):
      
          +----------------------------------------+
          | flags     |   00      00    |  command |
          |        in_words    |   out_words       |
          +----------------------------------------+
          |                 response               |
          |                 response               |
          | provider_in_words | provider_out_words |
          |                 padding                |
          +----------------------------------------+
          |                                        |
          .              <uverbs input>            .
          .              (in_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .             <provider input>           .
          .          (provider_in_words * 8)       .
          |                                        |
          +----------------------------------------+
      
      - response, if present:
      
          +----------------------------------------+
          |                                        |
          .          <uverbs output space>         .
          .             (out_words * 8)            .
          |                                        |
          +----------------------------------------+
          |                                        |
          .         <provider output space>        .
          .         (provider_out_words * 8)       .
          |                                        |
          +----------------------------------------+
      
      The overall design is to ensure that the extensible infrastructure is
      itself extensible while being more reliable, with more input and bounds
      checking.
      
      Note:
      
      The unused field in the extended header would be a perfect candidate to
      hold the command "comp_mask" (e.g. a bit field used to handle
      compatibility).  This was suggested by Roland Dreier in a previous
      review[2].  But a "comp_mask" field is likely to be present in the uverbs
      input and/or provider input, likewise for the response, as noted by
      Matan Barak[3], so it doesn't make sense to put "comp_mask" in the
      header.
      
      [1]:
      http://marc.info/?i=CAL1RGDWxmM17W2o_era24A-TTDeKyoL6u3NRu_=t_dhV_ZA9MA@mail.gmail.com
      
      [2]:
      http://marc.info/?i=CAL1RGDXJtrc849M6_XNZT5xO1+ybKtLWGq6yg6LhoSsKpsmkYA@mail.gmail.com
      
      [3]:
      http://marc.info/?i=525C1149.6000701@mellanox.com
      
      Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
      Link: http://marc.info/?i=cover.1383773832.git.ydroneaud@opteya.com
      
      
      
      [ Convert "ret ? ret : 0" to the equivalent "ret".  - Roland ]
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      f21519b2
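The command layout drawn in this commit message can be modelled in a few lines of C. This is a sketch that follows the diagram only: the `toy_` names, the exact mask values (flags in the top byte, identifier in the low byte) and the value of the "extended" flag bit are assumptions made for illustration, not the definitions in the kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, following the "flags | 00 00 | command" row. */
#define TOY_CMD_FLAGS_SHIFT   24
#define TOY_CMD_FLAGS_MASK    0xff000000u
#define TOY_CMD_COMMAND_MASK  0x000000ffu
#define TOY_CMD_FLAG_EXTENDED 0x80u        /* hypothetical flag bit */

struct toy_cmd_hdr {            /* "legacy header" in the diagram */
    uint32_t command;
    uint16_t in_words;
    uint16_t out_words;
};

struct toy_ex_cmd_hdr {         /* "extended header" in the diagram */
    uint64_t response;
    uint16_t provider_in_words;
    uint16_t provider_out_words;
    uint32_t reserved;          /* the "padding" row */
};

/* True when the flags byte marks the command as an extended one. */
static int toy_cmd_is_extended(uint32_t command)
{
    uint32_t flags = (command & TOY_CMD_FLAGS_MASK) >> TOY_CMD_FLAGS_SHIFT;

    return (flags & TOY_CMD_FLAG_EXTENDED) != 0;
}

/* Command identifier, taken from the low byte. */
static uint32_t toy_cmd_id(uint32_t command)
{
    return command & TOY_CMD_COMMAND_MASK;
}

/* The two middle bytes must be zero (the "00 00" in the diagram). */
static int toy_cmd_word_valid(uint32_t command)
{
    return (command & ~(TOY_CMD_FLAGS_MASK | TOY_CMD_COMMAND_MASK)) == 0;
}

/* Total command length in bytes: both headers plus the uverbs and
 * provider input areas, each counted in 8-byte words. Header sizes
 * are not part of in_words, as the commit message states. */
static size_t toy_ex_cmd_len(const struct toy_cmd_hdr *hdr,
                             const struct toy_ex_cmd_hdr *ex)
{
    assert(hdr != NULL && ex != NULL);
    return sizeof(*hdr) + sizeof(*ex) +
           (size_t)hdr->in_words * 8 +
           (size_t)ex->provider_in_words * 8;
}
```

With 16-bit word counts and 8-byte words, each input area tops out at just under 512 KBytes, which is where the "512KBytes + 512KBytes" figure in the table comes from.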
  20. 15 Nov, 2013 1 commit
  21. 09 Nov, 2013 1 commit
  22. 02 Sep, 2013 1 commit
  23. 28 Aug, 2013 2 commits
    • IB/core: Export ib_create/destroy_flow through uverbs · 436f2ad0
      Hadar Hen Zion authored
      
      
      Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to
      support flow steering for user space applications.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      436f2ad0
    • IB/core: Add receive flow steering support · 319a441d
      Hadar Hen Zion authored
      
      
      The RDMA stack allows for applications to create IB_QPT_RAW_PACKET
      QPs, which receive plain Ethernet packets, specifically packets that
      don't carry any QPN to be matched by the receiving side.  Applications
      using these QPs must be provided with a method to program some
      steering rule with the HW so packets arriving at the local port can be
      routed to them.
      
      This patch adds ib_create_flow(), which allows providing a flow
      specification for a QP.  When there's a match between the
      specification and a received packet, the packet is forwarded to that
      QP, in the same way one uses ib_attach_multicast() for IB UD
      multicast handling.
      
      Flow specifications are provided as instances of struct ib_flow_spec_yyy,
      which describe L2, L3 and L4 headers.  Currently specs for Ethernet, IPv4,
      TCP and UDP are defined.  Flow specs are made of values and masks.
      
      The input to ib_create_flow() is a struct ib_flow_attr, which contains
      a few mandatory control elements and optional flow specs.
      
          struct ib_flow_attr {
                  enum ib_flow_attr_type type;
                  u16      size;
                  u16      priority;
                  u32      flags;
                  u8       num_of_specs;
                  u8       port;
                  /* Following are the optional layers according to user request
                   * struct ib_flow_spec_yyy
                   * struct ib_flow_spec_zzz
                   */
          };
      
      As these specs are eventually coming from user space, they are defined and
      used in a way which allows adding new spec types without kernel/user ABI
      change, just with a little API enhancement which defines the newly added spec.
      
      The flow spec structures are defined with TLV (Type-Length-Value)
      entries, which allows calling ib_create_flow() with a list of variable
      length of optional specs.
      
      For the actual processing of ib_flow_attr the driver uses the
      mandatory num_of_specs and size fields along with the TLV nature of
      the specs.
      
      Steering rules are processed in an order determined by the domain over
      which the rule is set and by the rule priority.  All rules set by user
      space applications fall into the IB_FLOW_DOMAIN_USER domain; other
      domains could be used by future IPoIB RFS and Ethtool flow-steering
      interface implementations.  A lower numerical value for the priority
      field means higher priority.
      
      The returned value from ib_create_flow() is a struct ib_flow, which
      contains a database pointer (handle) provided by the HW driver to be
      used when calling ib_destroy_flow().
      
      Applications that offload TCP/IP traffic can also be written over IB
      UD QPs.  The ib_create_flow() / ib_destroy_flow() API is designed to
      support UD QPs too.  A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING
      to denote support for flow steering.
      
      The ib_flow_attr enum type supports usage of flow steering for promiscuous
      and sniffer purposes:
      
          IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification
      
          IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
              all Ethernet traffic which isn't steered to any QP
      
          IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast
      
          IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic
      
      ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      319a441d
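The TLV processing described in this commit (walk `num_of_specs` entries, advancing by each entry's own size) might look roughly like the following. The structures are hypothetical and deliberately simpler than the kernel's `ib_flow_spec_*` definitions, and the type values used with them are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header that starts every spec. */
struct toy_spec_hdr {
    uint32_t type;
    uint16_t size;      /* size of the whole spec, header included */
    uint16_t reserved;
};

/* Append a spec of the given type and total size at offset off.
 * Only the header is written; the payload is left as-is. */
static size_t toy_put_spec(uint8_t *buf, size_t off, uint32_t type,
                           uint16_t size)
{
    struct toy_spec_hdr hdr = { .type = type, .size = size, .reserved = 0 };

    assert(size >= sizeof(hdr));
    memcpy(buf + off, &hdr, sizeof(hdr));
    return off + size;
}

/* Walk num_of_specs back-to-back specs, recording each type.
 * Returns the number of specs visited, or -1 if a spec is truncated,
 * claims an impossible size, or the total length does not match. */
static int toy_walk_specs(const uint8_t *specs, size_t specs_len,
                          unsigned num_of_specs, uint32_t *types_out)
{
    size_t off = 0;
    unsigned i;

    for (i = 0; i < num_of_specs; i++) {
        struct toy_spec_hdr hdr;

        if (specs_len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, specs + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > specs_len - off)
            return -1;
        types_out[i] = hdr.type;
        off += hdr.size;        /* the TLV length tells us where the next spec starts */
    }
    return off == specs_len ? (int)num_of_specs : -1;
}
```

Because the walker only relies on the type and size in each header, a new spec type can be appended to the list without changing how older entries are parsed, which is the ABI property the commit message is after.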