- 05 Sep, 2018 1 commit
-
-
Sagi Grimberg authored
Currently we always repost the recv buffer before we send a response capsule back to the host. Since ordering is not guaranteed for send and recv completions, it is possible that we will receive a new request from the host before we get a send completion for the response capsule. Today, we pre-allocate 2x the queue length in rsps, but in reality, under heavy load there is nothing really preventing the gap from expanding until we exhaust all our rsps. To fix this, if we don't have any pre-allocated rsps left, we dynamically allocate a rsp and make sure to free it when we are done. If under memory pressure we fail to allocate a rsp, we silently drop the command and wait for the host to retry.
Reported-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
[hch: dropped a superfluous assignment]
Signed-off-by: Christoph Hellwig <hch@lst.de>
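A sketch of the described fallback, assuming the driver's queue/rsp structures (free_rsps list, rsps_lock) and an added 'allocated' flag; helper names mirror the driver's style but are illustrative:

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

static struct nvmet_rdma_rsp *
nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
{
        struct nvmet_rdma_rsp *rsp;
        unsigned long flags;

        spin_lock_irqsave(&queue->rsps_lock, flags);
        rsp = list_first_entry_or_null(&queue->free_rsps,
                                       struct nvmet_rdma_rsp, free_list);
        if (rsp)
                list_del(&rsp->free_list);
        spin_unlock_irqrestore(&queue->rsps_lock, flags);

        if (!rsp) {
                /* pre-allocated pool exhausted: fall back to a dynamic rsp */
                rsp = kzalloc(sizeof(*rsp), GFP_KERNEL);
                if (!rsp)
                        return NULL;    /* drop silently, the host will retry */
                rsp->allocated = true;  /* remember to kfree() it on release */
        }
        return rsp;
}

static void nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
{
        unsigned long flags;

        if (rsp->allocated) {
                kfree(rsp);
                return;
        }
        spin_lock_irqsave(&rsp->queue->rsps_lock, flags);
        list_add_tail(&rsp->free_list, &rsp->queue->free_rsps);
        spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
}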
-
- 24 Jul, 2018 1 commit
-
-
Bart Van Assche authored
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL as third argument to ib_post_(send|recv|srq_recv)().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
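A minimal before/after illustration of the new calling convention; 'wr' stands for an already-built receive work request:

#include <rdma/ib_verbs.h>

static int post_one_recv(struct ib_qp *qp, struct ib_recv_wr *wr)
{
        /* before: struct ib_recv_wr *bad_wr; ret = ib_post_recv(qp, wr, &bad_wr); */
        return ib_post_recv(qp, wr, NULL);      /* NULL: caller doesn't need the failing WR */
}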
-
- 23 Jul, 2018 3 commits
-
-
Max Gurtovoy authored
Posting a receive buffer operation can fail, so we should make sure to have an error flow during the initialization phase. While we're here, add a debug print in case of a failure.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
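A sketch of the init-time error flow, assuming the driver's nvmet_rdma_post_recv() helper and cmds array (recv_queue_size and the unwind performed by the caller are illustrative):

#include <linux/printk.h>

static int nvmet_rdma_post_recvs(struct nvmet_rdma_queue *queue)
{
        int i, ret;

        for (i = 0; i < queue->recv_queue_size; i++) {
                ret = nvmet_rdma_post_recv(queue->dev, &queue->cmds[i]);
                if (ret) {
                        pr_debug("failed to post recv buffer %d, error %d\n", i, ret);
                        return ret;     /* caller unwinds the queue setup */
                }
        }
        return 0;
}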
-
Max Gurtovoy authored
The ib_post_send operation should succeed unless something unusual happened to the ib device.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Steve Wise authored
The patch enables inline data sizes using up to 4 recv sges, capping the size at 16KB or at least 1 page size. So on a 4K page system, up to 16KB is supported, and for a 64K page system 1 page of 64KB is supported. We avoid order > 0 page allocations for the inline buffers by using multiple recv sges, one for each page. If the device cannot support the configured inline data size due to lack of enough recv sges, then log a warning and reduce the inline size. Add a new configfs port attribute, called param_inline_data_size, to allow configuring the size of inline data for a given nvmf port. The maximum size allowed is still enforced by nvmet-rdma with NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE). And the default size, if not specified via configfs, is still PAGE_SIZE. This preserves the existing behavior, but allows larger inline sizes for small page systems. If the configured inline data size exceeds NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is reduced. If param_inline_data_size is set to 0, then inline data is disabled for that nvmf port.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
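A sketch of the sizing rules described above; NVMET_RDMA_MAX_INLINE_DATA_SIZE follows the commit message, while the helper, its arguments, and the reservation of one recv SGE for the command capsule are assumptions for illustration:

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sizes.h>

#define NVMET_RDMA_MAX_INLINE_SGE               4
#define NVMET_RDMA_MAX_INLINE_DATA_SIZE         max_t(int, SZ_16K, PAGE_SIZE)

static u32 clamp_inline_data_size(u32 requested, u32 device_max_recv_sge)
{
        /* assume one recv SGE stays reserved for the command capsule itself */
        u32 sges = min_t(u32, NVMET_RDMA_MAX_INLINE_SGE, device_max_recv_sge - 1);
        u32 cap = min_t(u32, NVMET_RDMA_MAX_INLINE_DATA_SIZE, sges * PAGE_SIZE);

        if (requested > cap) {
                pr_warn("inline_data_size %u too large, reducing to %u\n",
                        requested, cap);
                requested = cap;
        }
        return requested;
}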
-
- 18 Jun, 2018 1 commit
-
-
Steve Wise authored
This patch replaces ib_device_attr.max_sge with max_send_sge and max_recv_sge. It allows ULPs to take advantage of devices that have very different send and recv sge depths. For example cxgb4 has a max_recv_sge of 4, yet a max_send_sge of 16. Splitting out these attributes allows much more efficient use of the SQ for cxgb4 with ULPs that use the RDMA_RW API. Consider a large RDMA WRITE that has 16 scatter-gather entries. With max_sge of 4, the ULP would send 4 WRITE WRs, but with max_sge of 16, it can be done with 1 WRITE WR.
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
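A sketch of how a ULP can consume the split attributes when sizing its QP; the caps chosen here (16 send, 4 recv) just echo the cxgb4 example:

#include <linux/kernel.h>
#include <rdma/ib_verbs.h>

static void size_qp_sges(struct ib_device *ibdev, struct ib_qp_init_attr *attr)
{
        /* formerly both came from the single ib_device_attr.max_sge field */
        attr->cap.max_send_sge = min_t(u32, ibdev->attrs.max_send_sge, 16);
        attr->cap.max_recv_sge = min_t(u32, ibdev->attrs.max_recv_sge, 4);
}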
-
- 26 Mar, 2018 5 commits
-
-
Christoph Hellwig authored
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Max Gurtovoy authored
The .remove_one function is called for any ib_device removal. In case the removed device has no reference in our driver, there is no need to flush the system work queue.
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Israel Rukshin authored
We free nvmet rdma queues while handling rdma_cm events. In order to avoid this we destroy the qp and the queue after destroying the cm_id, which guarantees that all rdma_cm events are done.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
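An ordering sketch for the teardown path described above (illustrative helper; the real driver splits this across its release work and queue-free functions):

#include <rdma/ib_verbs.h>
#include <rdma/rdma_cm.h>

static void example_release_queue(struct nvmet_rdma_queue *queue)
{
        ib_drain_qp(queue->qp);                 /* flush outstanding work requests */
        rdma_destroy_id(queue->cm_id);          /* after this, no rdma_cm events can run */
        ib_destroy_qp(queue->qp);               /* now the qp and the queue can go away */
        /* ...free the queue's own resources here... */
}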
-
Israel Rukshin authored
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Sagi Grimberg authored
It's perfectly valid to assign a nvmet port to listen on "any" IP address (traddr 0.0.0.0 for the ipv4 address family) for IP based transport ports. However, we must not return this address in discovery log entries. Instead we need to return the address on which the request was accepted (the req->port address). Since this is nvme transport specific, introduce an optional .disc_traddr interface that is designed to check whether the port in question is bound to "any" IP address and, if so, set the traddr from the port where the request came from.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
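A sketch of the "any address" test behind the optional .disc_traddr callback; only the IPv4 case is shown and the helper name is illustrative:

#include <linux/in.h>
#include <linux/kernel.h>
#include <linux/socket.h>

static bool listener_bound_to_any_ipv4(const struct sockaddr *sa)
{
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;

        return sa->sa_family == AF_INET &&
               sin->sin_addr.s_addr == htonl(INADDR_ANY);
}

/*
 * When this returns true for the listening address, the transport formats the
 * discovery log traddr from the address the request actually arrived on, e.g.
 * snprintf(traddr, size, "%pISc", src_addr) with the accepting cm_id's
 * source address.
 */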
-
- 08 Jan, 2018 2 commits
-
-
Sagi Grimberg authored
It is a bit chatty to report on every deleted queue, so keep it for debug purposes only.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
We already do that when we are notified in device removal, which is triggered when unregistering as an ib_client.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 06 Jan, 2018 1 commit
-
-
Bart Van Assche authored
Use the sgl_alloc() and sgl_free() functions instead of open coding these functions.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
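Usage of the two helpers this commit switches to (a sketch; the wrapper names are illustrative):

#include <linux/gfp.h>
#include <linux/scatterlist.h>

static struct scatterlist *alloc_req_sgl(unsigned long long length,
                                         unsigned int *nents)
{
        /* replaces an open-coded page allocation + sg_set_page() loop */
        return sgl_alloc(length, GFP_KERNEL, nents);
}

static void free_req_sgl(struct scatterlist *sgl)
{
        sgl_free(sgl);
}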
-
- 11 Nov, 2017 2 commits
-
-
Christoph Hellwig authored
Currently the NVMe target stores the expected data length in req->data_len and uses that for data transfer decisions, but that does not take the actual transfer length in the SGLs into account. So this adds a new transfer_len field, into which the transport drivers store the actual transfer length. We then check that the two match before actually executing the command. The FC transport driver already had such a field, which is removed in favour of the common one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
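A sketch of the check the commit message describes: the transport stores the actual SGL length in the new field and the core compares it with the expected length before execution (the field name comes from the message; the helper itself is illustrative):

#include <linux/compiler.h>

static bool transfer_len_matches(struct nvmet_req *req, size_t expected_len)
{
        if (unlikely(req->transfer_len != expected_len)) {
                /* fail the command with an SGL/data length error status */
                return false;
        }
        return true;
}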
-
Israel Rukshin authored
A NULL deref happens when nvmet_rdma_remove_one() is called more than once (e.g. while connected via 2 ports). The first call frees the queues related to the first ib_device but doesn't remove them from the queue list. While calling nvmet_rdma_remove_one() for the second ib_device it goes over the full queue list again and we get the NULL deref.
Fixes: f1d4ef7d ("nvmet-rdma: register ib_client to not deadlock in device removal")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
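A sketch of the fix's idea: unlink each queue from the global list while tearing it down for a removed device, so a second .remove_one call cannot walk over freed entries (the list/mutex globals and field names are modeled on the driver but illustrative):

#include <linux/list.h>
#include <linux/mutex.h>

static void reap_queues_for_device(struct ib_device *ib_device)
{
        struct nvmet_rdma_queue *queue, *tmp;

        mutex_lock(&nvmet_rdma_queue_mutex);
        list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list, queue_list) {
                if (queue->dev->device != ib_device)
                        continue;
                list_del_init(&queue->queue_list);      /* unlink before freeing */
                /* ...disconnect and schedule the queue's release work... */
        }
        mutex_unlock(&nvmet_rdma_queue_mutex);
}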
-
- 18 Aug, 2017 1 commit
-
-
Sagi Grimberg authored
Now that it's not needed, we can simply not assign it.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 28 Jun, 2017 2 commits
-
-
Sagi Grimberg authored
We can deadlock if we get a device removal event on a queue which is already in the process of destroying its cm_id, as this blocks until all events on this cm_id are drained. On the other hand we cannot guarantee that rdma_destroy_id was invoked, as we only have an indication that the queue disconnect flow has been queued (the queue state is updated before the release work is queued). So, we leave all the queue removal to a separate ib_client to avoid this deadlock, as ib_client device removal runs in a different context than the cm_id itself.
Reported-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
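A sketch of the ib_client registration this commit introduces; the callback signature matches kernels of that era (the .add callback later changed to return int), and the removal body is elided:

#include <rdma/ib_verbs.h>

static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
{
        /* disconnect and release every queue using ib_device; blocking is safe here */
}

static struct ib_client nvmet_rdma_ib_client = {
        .name   = "nvmet_rdma",
        .remove = nvmet_rdma_remove_one,
};

/* from module init:  ib_register_client(&nvmet_rdma_ib_client);   */
/* from module exit:  ib_unregister_client(&nvmet_rdma_ib_client); */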
-
Sagi Grimberg authored
No need to differentiate fabrics from pci/loop, also lower it to 32 as we don't really need 256 inflight admin commands.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 May, 2017 1 commit
-
-
Vijay Immanuel authored
On rdma read errors, release the sq ref that was taken when the req was initialized. This avoids a hang in nvmet_sq_destroy() when the queue is being freed.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
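A sketch of the error-path fix: on a failed RDMA READ completion, drop the sq reference taken at request initialization before releasing the response resources (helper names are illustrative; nvmet_req_uninit() is assumed here to be the core helper that puts that reference):

static void example_read_data_error(struct nvmet_rdma_rsp *rsp)
{
        nvmet_req_uninit(&rsp->req);    /* puts the sq ref taken in nvmet_req_init() */
        /* ...then return the rsp/cmd resources to the queue as usual... */
}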
-
- 04 Apr, 2017 3 commits
-
-
Sagi Grimberg authored
Instead of parsing address strings, use a generic helper. This also adds ipv6 (with address scopes) support.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
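A usage sketch of the generic helper, inet_pton_with_scope(); passing AF_UNSPEC lets it autodetect the address family:

#include <linux/inet.h>
#include <linux/socket.h>
#include <net/net_namespace.h>

static int parse_traddr(const char *traddr, const char *trsvcid,
                        struct sockaddr_storage *addr)
{
        /* handles IPv4, IPv6 and IPv6 link-local scope ids (e.g. fe80::1%eth0) */
        return inet_pton_with_scope(&init_net, AF_UNSPEC, traddr, trsvcid, addr);
}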
-
Sagi Grimberg authored
If we are attacked with establishments/teardowns we need to make sure we do not consume too much system memory. Thus let ongoing controller teardowns complete before accepting new controller establishments.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Sagi Grimberg authored
When handling a new recv command, we grab a new rsp resource and check for the queue state being live. In case the queue is not in live state, we simply restore the rsp back to the free list. However in this flow we didn't set rsp->queue yet, so we cannot dereference it. Instead, make sure to initialize rsp->queue (and other rsp members) as soon as possible so we won't reference uninitialized variables.
Reported-by: Yi Zhang <yizhan@redhat.com>
Reported-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
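A sketch of the fix, assuming the driver's rsp free-list helpers and queue state enum; the point is that rsp->queue is set before any early-return path hands the rsp back:

#include <linux/compiler.h>

static void example_handle_recv(struct nvmet_rdma_queue *queue,
                                struct nvmet_rdma_cmd *cmd)
{
        struct nvmet_rdma_rsp *rsp = nvmet_rdma_get_rsp(queue);

        /* initialize back-pointers first ... */
        rsp->queue = queue;
        rsp->cmd = cmd;

        if (unlikely(queue->state != NVMET_RDMA_Q_LIVE)) {
                /* ... so the release path may safely dereference rsp->queue */
                nvmet_rdma_put_rsp(rsp);
                return;
        }

        /* normal command handling continues */
}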
-
- 16 Mar, 2017 1 commit
-
-
Sagi Grimberg authored
When handling a new recv command, we grab a new rsp resource and check for the queue state being live. In case the queue is not in live state, we simply restore the rsp back to the free list. However in this flow we didn't set rsp->queue yet, so we cannot dereference it. Instead, make sure to initialize rsp->queue (and other rsp members) as soon as possible so we won't reference uninitialized variables.
Reported-by: Yi Zhang <yizhan@redhat.com>
Reported-by: Raju Rangoju <rajur@chelsio.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Raju Rangoju <rajur@chelsio.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 22 Feb, 2017 2 commits
-
-
Christophe JAILLET authored
According to the preceding goto, it is likely that 'out_destroy_sq' was expected here.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Max Gurtovoy authored
Also remove redundant debug prints.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 26 Jan, 2017 1 commit
-
-
Parav Pandit authored
This patch performs dma sync operations on nvme_command and nvme_completion. nvme_command is synced (a) on receiving the recv queue completion, for cpu access, and (b) before posting the recv wqe back to the rdma adapter, for device access. nvme_completion is synced (a) on receiving the recv queue completion of the associated nvme_command, for cpu access, and (b) before posting the send wqe to the rdma adapter, for device access. This patch is generated for git://git.infradead.org/nvme-fabrics.git Branch: nvmf-4.10
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
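A sketch of the sync points using the ib_dma_* wrappers; 'sge' stands for the SGE the command buffer was DMA-mapped into, and the helper names are illustrative:

#include <rdma/ib_verbs.h>

/* recv completion arrived: make the nvme_command visible to the CPU */
static void sync_cmd_for_cpu(struct ib_device *ibdev, const struct ib_sge *sge)
{
        ib_dma_sync_single_for_cpu(ibdev, sge->addr, sge->length, DMA_FROM_DEVICE);
}

/* about to repost the recv WQE: hand the buffer back to the device */
static void sync_cmd_for_device(struct ib_device *ibdev, const struct ib_sge *sge)
{
        ib_dma_sync_single_for_device(ibdev, sge->addr, sge->length, DMA_FROM_DEVICE);
}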
-
- 14 Dec, 2016 1 commit
-
-
Steve Wise authored
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 06 Dec, 2016 2 commits
-
-
Max Gurtovoy authored
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Bart Van Assche authored
nvmet_sq_init() returns a value <= 0. nvmet_rdma_cm_reject() expects a second argument that is an NVME_RDMA_CM_* constant. Hence this patch.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 14 Nov, 2016 3 commits
-
-
Sagi Grimberg authored
Draining the qp right after disconnect might not suffice because the nvmet sq is not fully drained (in nvmet_sq_destroy) and we might see completions after the drain. Instead, drain right before the qp destroy, which comes after the sq destruction, so we can be sure that no posts come after the drain.
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
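An ordering sketch for the point being made (illustrative helper; the driver's own teardown is spread across several functions):

static void example_teardown_order(struct nvmet_rdma_queue *queue)
{
        nvmet_sq_destroy(&queue->nvme_sq);      /* waits out inflight nvmet requests */
        ib_drain_qp(queue->qp);                 /* now nothing can post; flush the qp */
        /* ...destroy the qp / cq after the drain... */
}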
-
Sagi Grimberg authored
In case we accepted a queue connection and it failed, we might not remove the queue from the list until we unload and clean it up. We should delete it from the queue list in the relevant handler.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
Bart Van Assche authored
When we initiate the queue teardown sequence we call rdma_destroy_qp, which clears cm_id->qp; afterwards we call rdma_destroy_id. We might see a rdma_cm event in between with a cleared cm_id->qp, so watch out for that and silently ignore the event, because it means the queue teardown sequence is in progress.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
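A sketch of the guard in the rdma_cm event handler; the dispatch on qp_context mirrors common practice but is illustrative:

#include <rdma/rdma_cm.h>

static int example_cm_handler(struct rdma_cm_id *cm_id,
                              struct rdma_cm_event *event)
{
        /* between rdma_destroy_qp() and rdma_destroy_id(), cm_id->qp is NULL */
        if (!cm_id->qp)
                return 0;       /* queue teardown already in progress, ignore */

        /* normal event dispatch on cm_id->qp->qp_context follows */
        return 0;
}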
-
- 23 Sep, 2016 1 commit
-
-
Christoph Hellwig authored
Instead of exposing ib_get_dma_mr to ULPs and letting them use it more or less unchecked, this moves the capability of creating a global rkey into the RDMA core, where it can be easily audited. It also prints a warning every time this feature is used.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
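A sketch of the replacement interface: a ULP that still needs a global rkey requests it at PD allocation time instead of calling ib_get_dma_mr() itself:

#include <rdma/ib_verbs.h>

static struct ib_pd *alloc_pd_with_global_rkey(struct ib_device *ibdev)
{
        /* the core warns whenever this unsafe all-of-memory rkey is requested */
        struct ib_pd *pd = ib_alloc_pd(ibdev, IB_PD_UNSAFE_GLOBAL_RKEY);

        if (!IS_ERR(pd))
                pr_debug("global rkey 0x%x\n", pd->unsafe_global_rkey);
        return pd;
}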
-
- 18 Aug, 2016 1 commit
-
-
Jay Freyensee authored
The host will be sending a 0-based hsqsize value, so the target needs to be adjusted as well.
Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
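A one-line sketch of the adjustment (hsqsize in the RDMA CM private data is little-endian and 0-based; the helper name is illustrative):

#include <linux/kernel.h>
#include <linux/types.h>

static u16 queue_size_from_hsqsize(__le16 hsqsize)
{
        return le16_to_cpu(hsqsize) + 1;        /* hsqsize is 0-based on the wire */
}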
-
- 16 Aug, 2016 1 commit
-
-
Vincent Stehlé authored
Avoid dereferencing the queue pointer in nvmet_rdma_release_queue_work() after it has been freed by nvmet_rdma_free_queue().
Fixes: d8f7750a ("nvmet-rdma: Correctly handle RDMA device hot removal")
Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
-
- 04 Aug, 2016 2 commits
-
-
Sagi Grimberg authored
Under extreme conditions this might cause data corruptions. By doing that, we repost the buffer and then post this same buffer for the device to send. If we happen to use shared receive queues the device might write to the buffer before it sends it (there is no ordering between send and recv queues). Without SRQs we probably won't get that if the host doesn't misbehave and send more than we allowed it, but relying on that is not really a good idea.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Sagi Grimberg authored
When configuring a device attached listener, we may see device removal events. In this case we return a non-zero return code from the cm event handler, which implicitly destroys the cm_id. It is possible that in the future the user will remove this listener and by that trigger a second call to rdma_destroy_id on an already destroyed cm_id -> BUG. In addition, when a queue bound (active session) cm_id generates a DEVICE_REMOVAL event we must guarantee all resources are cleaned up by the time we return from the event handler. Introduce nvmet_rdma_device_removal, which addresses (or at least attempts to address) both scenarios.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 08 Jul, 2016 1 commit
-
-
Christoph Hellwig authored
This patch implements the RDMA transport for the NVMe over Fabrics target, which allows exporting NVMe over Fabrics functionality over RDMA fabrics (Infiniband, RoCE, iWARP). All NVMe logic is in the generic target and this module just provides a small glue layer between it and the generic code in the RDMA subsystem.
Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>
Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-