- 21 Mar, 2018 2 commits
-
-
Kalderon, Michal authored
QPs that were configured with ack timeout value lower than 1 msec will not implement re-transmission timeout. This means that if a packet / ACK were dropped, the QP will not retransmit this packet. This can lead to an application hang. Fixes: cecbcddf ("qedr: Add support for QP verbs") Signed-off-by:
Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by:
Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Mark Bloch authored
In case we failed to create UMR resources, mark them as invalid so we won't try to destroy them on the unwind path. Add the relevant checks to destroy_umrc_res(), this is done for the unlikely event ib_register_device() or create_umr_res() err out and we try to destroy invalid objects. Fixes: 42cea83f ("IB/mlx5: Fix cleanup order on unload") Signed-off-by:
Mark Bloch <markb@mellanox.com> Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
- 14 Mar, 2018 4 commits
-
-
Arnd Bergmann authored
On 32-bit targets, we otherwise get a warning about an impossible constant integer expression: In file included from include/linux/kernel.h:11, from include/linux/interrupt.h:6, from drivers/infiniband/hw/bnxt_re/ib_verbs.c:39: drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function 'bnxt_re_query_device': include/linux/bitops.h:7:24: error: left shift count >= width of type [-Werror=shift-count-overflow] #define BIT(nr) (1UL << (nr)) ^~ drivers/infiniband/hw/bnxt_re/bnxt_re.h:61:34: note: in expansion of macro 'BIT' #define BNXT_RE_MAX_MR_SIZE_HIGH BIT(39) ^~~ drivers/infiniband/hw/bnxt_re/bnxt_re.h:62:30: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE_HIGH' #define BNXT_RE_MAX_MR_SIZE BNXT_RE_MAX_MR_SIZE_HIGH ^~~~~~~~~~~~~~~~~~~~~~~~ drivers/infiniband/hw/bnxt_re/ib_verbs.c:149:25: note: in expansion of macro 'BNXT_RE_MAX_MR_SIZE' ib_attr->max_mr_size = BNXT_RE_MAX_MR_SIZE; ^~~~~~~~~~~~~~~~~~~ Fixes: 872f3578 ("RDMA/bnxt_re: Add support for MRs with Huge pages") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Arnd Bergmann authored
Building for a 32-bit target results in a couple of warnings from casting between a 32-bit pointer and a 64-bit integer: drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_service_nq': drivers/infiniband/hw/bnxt_re/qplib_fp.c:333:23: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] bnxt_qplib_arm_srq((struct bnxt_qplib_srq *)q_handle, ^ drivers/infiniband/hw/bnxt_re/qplib_fp.c:336:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast] (struct bnxt_qplib_srq *)q_handle, ^ In file included from include/linux/byteorder/little_endian.h:5, from arch/arm/include/uapi/asm/byteorder.h:22, from include/asm-generic/bitops/le.h:6, from arch/arm/include/asm/bitops.h:342, from include/linux/bitops.h:38, from include/linux/kernel.h:11, from include/linux/interrupt.h:6, from drivers/infiniband/hw/bnxt_re/qplib_fp.c:39: drivers/infiniband/hw/bnxt_re/qplib_fp.c: In function 'bnxt_qplib_create_srq': include/uapi/linux/byteorder/little_endian.h:31:43: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast] #define __cpu_to_le64(x) ((__force __le64)(__u64)(x)) ^ include/linux/byteorder/generic.h:86:21: note: in expansion of macro '__cpu_to_le64' #define cpu_to_le64 __cpu_to_le64 ^~~~~~~~~~~~~ drivers/infiniband/hw/bnxt_re/qplib_fp.c:569:19: note: in expansion of macro 'cpu_to_le64' req.srq_handle = cpu_to_le64(srq); Using a uintptr_t as an intermediate works on all architectures. Fixes: 37cb11ac ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Mark Bloch authored
On load we create private CQ/QP/PD in order to be used by UMR, we create those resources after we register ourself as an IB device, and we destroy them after we unregister as an IB device. This was changed by commit 16c1975f ("IB/mlx5: Create profile infrastructure to add and remove stages") which moved the destruction before we unregistration. This allowed to trigger an invalid memory access when unloading mlx5_ib while there are open resources: BUG: unable to handle kernel paging request at 00000001002c012c ... Call Trace: mlx5_ib_post_send_wait+0x75/0x110 [mlx5_ib] __slab_free+0x9a/0x2d0 delay_time_func+0x10/0x10 [mlx5_ib] unreg_umr.isra.15+0x4b/0x50 [mlx5_ib] mlx5_mr_cache_free+0x46/0x150 [mlx5_ib] clean_mr+0xc9/0x190 [mlx5_ib] dereg_mr+0xba/0xf0 [mlx5_ib] ib_dereg_mr+0x13/0x20 [ib_core] remove_commit_idr_uobject+0x16/0x70 [ib_uverbs] uverbs_cleanup_ucontext+0xe8/0x1a0 [ib_uverbs] ib_uverbs_cleanup_ucontext.isra.9+0x19/0x40 [ib_uverbs] ib_uverbs_remove_one+0x162/0x2e0 [ib_uverbs] ib_unregister_device+0xd4/0x190 [ib_core] __mlx5_ib_remove+0x2e/0x40 [mlx5_ib] mlx5_remove_device+0xf5/0x120 [mlx5_core] mlx5_unregister_interface+0x37/0x90 [mlx5_core] mlx5_ib_cleanup+0xc/0x225 [mlx5_ib] SyS_delete_module+0x153/0x230 do_syscall_64+0x62/0x110 entry_SYSCALL_64_after_hwframe+0x21/0x86 ... We restore the original behavior by breaking the UMR stage into two parts, pre and post IB registration stages, this way we can restore the original functionality and maintain clean separation of logic between stages. Fixes: 16c1975f ("IB/mlx5: Create profile infrastructure to add and remove stages") Signed-off-by:
Mark Bloch <markb@mellanox.com> Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Leon Romanovsky authored
The failure in rereg_mr flow caused to set garbage value (error value) into mr->umem pointer. This pointer is accessed at the release stage and it causes to the following crash. There is not enough to simply change umem to point to NULL, because the MR struct is needed to be accessed during MR deregistration phase, so delay kfree too. [ 6.237617] BUG: unable to handle kernel NULL pointer dereference a 0000000000000228 [ 6.238756] IP: ib_dereg_mr+0xd/0x30 [ 6.239264] PGD 80000000167eb067 P4D 80000000167eb067 PUD 167f9067 PMD 0 [ 6.240320] Oops: 0000 [#1] SMP PTI [ 6.240782] CPU: 0 PID: 367 Comm: dereg Not tainted 4.16.0-rc1-00029-gc198fafe0453 #183 [ 6.242120] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 6.244504] RIP: 0010:ib_dereg_mr+0xd/0x30 [ 6.245253] RSP: 0018:ffffaf5d001d7d68 EFLAGS: 00010246 [ 6.246100] RAX: 0000000000000000 RBX: ffff95d4172daf00 RCX: 0000000000000000 [ 6.247414] RDX: 00000000ffffffff RSI: 0000000000000001 RDI: ffff95d41a317600 [ 6.248591] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 [ 6.249810] R10: ffff95d417033c10 R11: 0000000000000000 R12: ffff95d4172c3a80 [ 6.251121] R13: ffff95d4172c3720 R14: ffff95d4172c3a98 R15: 00000000ffffffff [ 6.252437] FS: 0000000000000000(0000) GS:ffff95d41fc00000(0000) knlGS:0000000000000000 [ 6.253887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 6.254814] CR2: 0000000000000228 CR3: 00000000172b4000 CR4: 00000000000006b0 [ 6.255943] Call Trace: [ 6.256368] remove_commit_idr_uobject+0x1b/0x80 [ 6.257118] uverbs_cleanup_ucontext+0xe4/0x190 [ 6.257855] ib_uverbs_cleanup_ucontext.constprop.14+0x19/0x40 [ 6.258857] ib_uverbs_close+0x2a/0x100 [ 6.259494] __fput+0xca/0x1c0 [ 6.259938] task_work_run+0x84/0xa0 [ 6.260519] do_exit+0x312/0xb40 [ 6.261023] ? __do_page_fault+0x24d/0x490 [ 6.261707] do_group_exit+0x3a/0xa0 [ 6.262267] SyS_exit_group+0x10/0x10 [ 6.262802] do_syscall_64+0x75/0x180 [ 6.263391] entry_SYSCALL_64_after_hwframe+0x21/0x86 [ 6.264253] RIP: 0033:0x7f1b39c49488 [ 6.264827] RSP: 002b:00007ffe2de05b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 6.266049] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1b39c49488 [ 6.267187] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000 [ 6.268377] RBP: 00007f1b39f258e0 R08: 00000000000000e7 R09: ffffffffffffff98 [ 6.269640] R10: 00007f1b3a147260 R11: 0000000000000246 R12: 00007f1b39f258e0 [ 6.270783] R13: 00007f1b39f2ac20 R14: 0000000000000000 R15: 0000000000000000 [ 6.271943] Code: 74 07 31 d2 e9 25 d8 6c 00 b8 da ff ff ff c3 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 07 53 48 8b 5f 08 <48> 8b 80 28 02 00 00 e8 f7 d7 6c 00 85 c0 75 04 3e ff 4b 18 5b [ 6.274927] RIP: ib_dereg_mr+0xd/0x30 RSP: ffffaf5d001d7d68 [ 6.275760] CR2: 0000000000000228 [ 6.276200] ---[ end trace a35641f1c474bd20 ]--- Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters") Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> Reported-by:
Noa Osherovich <noaos@mellanox.com> Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
- 13 Mar, 2018 2 commits
-
-
Boris Pismenny authored
This patch validates user provided input to prevent integer overflow due to integer manipulation in the mlx5_ib_create_srq function. Cc: syzkaller <syzkaller@googlegroups.com> Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by:
Boris Pismenny <borisp@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Boris Pismenny authored
Add a check for the length of the qpin structure to prevent out-of-bounds reads BUG: KASAN: slab-out-of-bounds in create_raw_packet_qp+0x114c/0x15e2 Read of size 8192 at addr ffff880066b99290 by task syz-executor3/549 CPU: 3 PID: 549 Comm: syz-executor3 Not tainted 4.15.0-rc2+ #27 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0x8d/0xd4 print_address_description+0x73/0x290 kasan_report+0x25c/0x370 ? create_raw_packet_qp+0x114c/0x15e2 memcpy+0x1f/0x50 create_raw_packet_qp+0x114c/0x15e2 ? create_raw_packet_qp_tis.isra.28+0x13d/0x13d ? lock_acquire+0x370/0x370 create_qp_common+0x2245/0x3b50 ? destroy_qp_user.isra.47+0x100/0x100 ? kasan_kmalloc+0x13d/0x170 ? sched_clock_cpu+0x18/0x180 ? fs_reclaim_acquire.part.15+0x5/0x30 ? __lock_acquire+0xa11/0x1da0 ? sched_clock_cpu+0x18/0x180 ? kmem_cache_alloc_trace+0x17e/0x310 ? mlx5_ib_create_qp+0x30e/0x17b0 mlx5_ib_create_qp+0x33d/0x17b0 ? sched_clock_cpu+0x18/0x180 ? create_qp_common+0x3b50/0x3b50 ? lock_acquire+0x370/0x370 ? __radix_tree_lookup+0x180/0x220 ? uverbs_try_lock_object+0x68/0xc0 ? rdma_lookup_get_uobject+0x114/0x240 create_qp.isra.5+0xce4/0x1e20 ? ib_uverbs_ex_create_cq_cb+0xa0/0xa0 ? copy_ah_attr_from_uverbs.isra.2+0xa00/0xa00 ? ib_uverbs_cq_event_handler+0x160/0x160 ? __might_fault+0x17c/0x1c0 ib_uverbs_create_qp+0x21b/0x2a0 ? ib_uverbs_destroy_cq+0x2e0/0x2e0 ib_uverbs_write+0x55a/0xad0 ? ib_uverbs_destroy_cq+0x2e0/0x2e0 ? ib_uverbs_destroy_cq+0x2e0/0x2e0 ? ib_uverbs_open+0x760/0x760 ? futex_wake+0x147/0x410 ? check_prev_add+0x1680/0x1680 ? do_futex+0x3d3/0xa60 ? sched_clock_cpu+0x18/0x180 __vfs_write+0xf7/0x5c0 ? ib_uverbs_open+0x760/0x760 ? kernel_read+0x110/0x110 ? lock_acquire+0x370/0x370 ? __fget+0x264/0x3b0 vfs_write+0x18a/0x460 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x18/0x85 RIP: 0033:0x4477b9 RSP: 002b:00007f1822cadc18 EFLAGS: 00000292 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00000000004477b9 RDX: 0000000000000070 RSI: 000000002000a000 RDI: 0000000000000005 RBP: 0000000000708000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000292 R12: 00000000ffffffff R13: 0000000000005d70 R14: 00000000006e6e30 R15: 0000000020010ff0 Allocated by task 549: __kmalloc+0x15e/0x340 kvmalloc_node+0xa1/0xd0 create_user_qp.isra.46+0xd42/0x1610 create_qp_common+0x2e63/0x3b50 mlx5_ib_create_qp+0x33d/0x17b0 create_qp.isra.5+0xce4/0x1e20 ib_uverbs_create_qp+0x21b/0x2a0 ib_uverbs_write+0x55a/0xad0 __vfs_write+0xf7/0x5c0 vfs_write+0x18a/0x460 SyS_write+0xc7/0x1a0 entry_SYSCALL_64_fastpath+0x18/0x85 Freed by task 368: kfree+0xeb/0x2f0 kernfs_fop_release+0x140/0x180 __fput+0x266/0x700 task_work_run+0x104/0x180 exit_to_usermode_loop+0xf7/0x110 syscall_return_slowpath+0x298/0x370 entry_SYSCALL_64_fastpath+0x83/0x85 The buggy address belongs to the object at ffff880066b99180 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 272 bytes inside of 512-byte region [ffff880066b99180, ffff880066b99380) The buggy address belongs to the page: page:000000006040eedd count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x4000000000008100(slab|head) raw: 4000000000008100 0000000000000000 0000000000000000 0000000180190019 raw: ffffea00019a7500 0000000b0000000b ffff88006c403080 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880066b99180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff880066b99200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff880066b99280: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff880066b99300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff880066b99380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Cc: syzkaller <syzkaller@googlegroups.com> Fixes: 0fb2ed66 ("IB/mlx5: Add create and destroy functionality for Raw Packet QP") Signed-off-by:
Boris Pismenny <borisp@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
- 09 Mar, 2018 2 commits
-
-
Leon Romanovsky authored
The user can provide very large cqe_size which will cause to integer overflow as it can be seen in the following UBSAN warning: ======================================================================= UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/cq.c:1192:53 signed integer overflow: 64870 * 65536 cannot be represented in type 'int' CPU: 0 PID: 267 Comm: syzkaller605279 Not tainted 4.15.0+ #90 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 Call Trace: dump_stack+0xde/0x164 ? dma_virt_map_sg+0x22c/0x22c ubsan_epilogue+0xe/0x81 handle_overflow+0x1f3/0x251 ? __ubsan_handle_negate_overflow+0x19b/0x19b ? lock_acquire+0x440/0x440 mlx5_ib_resize_cq+0x17e7/0x1e40 ? cyc2ns_read_end+0x10/0x10 ? native_read_msr_safe+0x6c/0x9b ? cyc2ns_read_end+0x10/0x10 ? mlx5_ib_modify_cq+0x220/0x220 ? sched_clock_cpu+0x18/0x200 ? lookup_get_idr_uobject+0x200/0x200 ? rdma_lookup_get_uobject+0x145/0x2f0 ib_uverbs_resize_cq+0x207/0x3e0 ? ib_uverbs_ex_create_cq+0x250/0x250 ib_uverbs_write+0x7f9/0xef0 ? cyc2ns_read_end+0x10/0x10 ? print_irqtrace_events+0x280/0x280 ? ib_uverbs_ex_create_cq+0x250/0x250 ? uverbs_devnode+0x110/0x110 ? sched_clock_cpu+0x18/0x200 ? do_raw_spin_trylock+0x100/0x100 ? __lru_cache_add+0x16e/0x290 __vfs_write+0x10d/0x700 ? uverbs_devnode+0x110/0x110 ? kernel_read+0x170/0x170 ? sched_clock_cpu+0x18/0x200 ? security_file_permission+0x93/0x260 vfs_write+0x1b0/0x550 SyS_write+0xc7/0x1a0 ? SyS_read+0x1a0/0x1a0 ? trace_hardirqs_on_thunk+0x1a/0x1c entry_SYSCALL_64_fastpath+0x1e/0x8b RIP: 0033:0x433549 RSP: 002b:00007ffe63bd1ea8 EFLAGS: 00000217 ======================================================================= Cc: syzkaller <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> # 3.13 Fixes: bde51583 ("IB/mlx5: Add support for resize CQ") Reported-by:
Noa Osherovich <noaos@mellanox.com> Reviewed-by:
Yishai Hadas <yishaih@mellanox.com> Signed-off-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Doug Ledford authored
The original commit of this patch has a munged log message that is missing several of the tags the original author intended to be on the patch. This was due to patchworks misinterpreting a cut-n-paste separator line as an end of message line and munging the mbox that was used to import the patch: https://patchwork.kernel.org/patch/10264089/ The original patch will be reapplied with a fixed commit message so the proper tags are applied. This reverts commit aa0de36a . Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
- 07 Mar, 2018 10 commits
-
-
Leon Romanovsky authored
The user can provide very large cqe_size which will cause to integer overflow as it can be seen in the following UBSAN warning: Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Selvin Xavier authored
Hitting the following hardlockup due to a race condition in error CQE processing. [26146.879798] bnxt_en 0000:04:00.0: QPLIB: FP: CQ Processed Req [26146.886346] bnxt_en 0000:04:00.0: QPLIB: wr_id[1251] = 0x0 with status 0xa [26156.350935] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4 [26156.357470] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace [26156.447957] CPU: 4 PID: 3413 Comm: kworker/4:1H Kdump: loaded [26156.457994] Hardware name: Dell Inc. PowerEdge R430/0CN7X8, [26156.466390] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [26156.472639] Call Trace: [26156.475379] <NMI> [<ffffffff98d0d722>] dump_stack+0x19/0x1b [26156.481833] [<ffffffff9873f775>] watchdog_overflow_callback+0x135/0x140 [26156.489341] [<ffffffff9877f237>] __perf_event_overflow+0x57/0x100 [26156.496256] [<ffffffff98787c24>] perf_event_overflow+0x14/0x20 [26156.502887] [<ffffffff9860a580>] intel_pmu_handle_irq+0x220/0x510 [26156.509813] [<ffffffff98d16031>] perf_event_nmi_handler+0x31/0x50 [26156.516738] [<ffffffff98d1790c>] nmi_handle.isra.0+0x8c/0x150 [26156.523273] [<ffffffff98d17be8>] do_nmi+0x218/0x460 [26156.528834] [<ffffffff98d16d79>] end_repeat_nmi+0x1e/0x7e [26156.534980] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.543268] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.551556] [<ffffffff987089c0>] ? native_queued_spin_lock_slowpath+0x1d0/0x200 [26156.559842] <EOE> [<ffffffff98d083e4>] queued_spin_lock_slowpath+0xb/0xf [26156.567555] [<ffffffff98d15690>] _raw_spin_lock+0x20/0x30 [26156.573696] [<ffffffffc08381a1>] bnxt_qplib_lock_buddy_cq+0x31/0x40 [bnxt_re] [26156.581789] [<ffffffffc083bbaa>] bnxt_qplib_poll_cq+0x43a/0xf10 [bnxt_re] [26156.589493] [<ffffffffc083239b>] bnxt_re_poll_cq+0x9b/0x760 [bnxt_re] The issue happens if RQ poll_cq or SQ poll_cq or Async error event tries to put the error QP in flush list. Since SQ and RQ of each error qp are added to two different flush list, we need to protect it using locks of corresponding CQs. Difference in order of acquiring the lock in SQ poll_cq and RQ poll_cq can cause a hard lockup. Revisits the locking strategy and removes the usage of qplib_cq.hwq.lock. Instead of this lock, introduces qplib_cq.flush_lock to handle addition/deletion of QPs in flush list. Also, always invoke the flush_lock in order (SQ CQ lock first and then RQ CQ lock) to avoid any potential deadlock. Other than the poll_cq context, the movement of QP to/from flush list can be done in modify_qp context or from an async error event from HW. Synchronize these operations using the bnxt_re verbs layer CQ locks. To achieve this, adds a call back to the HW abstraction layer(qplib) to bnxt_re ib_verbs layer in case of async error event. Also, removes the buddy cq functions as it is no longer required. Signed-off-by:
Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by:
Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by:
Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Dan Carpenter authored
"err" is either zero or possibly uninitialized here. It should be -EINVAL. Fixes: 427c1e7b ("{IB, net}/mlx5: Move the modify QP operation table to mlx5_ib") Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Acked-by:
Leon Romanovsky <leonro@mellanox.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Mark Bloch authored
The series that introduced dual port RoCE mode assumed that we don't have a dual port HCA that use the mlx5 driver, this is not the case for Connect-IB HCAs. This reasoning led to assigning 1 as the native port index which causes issue when the second port is used. For example query_pkey() when called on the second port will return values of the first port. Make sure that we assign the right port index as the native port index. Fixes: 32f69e4b ("{net, IB}/mlx5: Manage port association for multiport RoCE") Reviewed-by:
Daniel Jurgens <danielj@mellanox.com> Signed-off-by:
Mark Bloch <markb@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Jack M authored
The commit cited below added a gid_type field (RoCEv1 or RoCEv2) to GID properties. When adding GIDs, this gid_type field was copied over to the hardware gid table. However, when deleting GIDs, the gid_type field was not copied over to the hardware gid table. As a result, when running RoCEv2, all RoCEv2 gids in the hardware gid table were set to type RoCEv1 when any gid was deleted. This problem would persist until the next gid was added (which would again restore the gid_type field for all the gids in the hardware gid table). Fix this by copying over the gid_type field to the hardware gid table when deleting gids, so that the gid_type of all remaining gids is preserved when a gid is deleted. Fixes: b699a859 ("IB/mlx4: Add gid_type to GID properties") Reviewed-by:
Moni Shoua <monis@mellanox.com> Signed-off-by:
Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Jack Morgenstein authored
When using IPv4 addresses in RoCEv2, the GID format for the mapped IPv4 address should be: ::ffff:<4-byte IPv4 address>. In the cited commit, IPv4 mapped IPV6 addresses had the 3 upper dwords zeroed out by memset, which resulted in deleting the ffff field. However, since procedure ipv6_addr_v4mapped() already verifies that the gid has format ::ffff:<ipv4 address>, no change is needed for the gid, and the memset can simply be removed. Fixes: 7e57b85c ("IB/mlx4: Add support for setting RoCEv2 gids in hardware") Reviewed-by:
Moni Shoua <monis@mellanox.com> Signed-off-by:
Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Kalderon, Michal authored
iWARP does not support RDMA WRITE or SEND with immediate data. Driver should check this before submitting to FW and return an immediate error Signed-off-by:
Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by:
Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Kalderon, Michal authored
Race in qedr_poll_cq, lastest_cqe wasn't protected by lock, leading to a case where two context's accessing poll_cq at the same time lead to one of them having a pointer to an old latest_cqe and reading an invalid cqe element Signed-off-by:
Amit Radzi <Amit.Radzi@cavium.com> Signed-off-by:
Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by:
Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Kalderon, Michal authored
Fix iWARP connect and listen to use the mapped port for ipv4 and ipv6. Without this fixed, running on a server that has iwpmd enabled will not use the correct port Signed-off-by:
Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by:
Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Kalderon, Michal authored
The wrong parameter was passed to dst_neigh_lookup Signed-off-by:
Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by:
Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
- 28 Feb, 2018 7 commits
-
-
Selvin Xavier authored
Release the netdev references in the cleanup path. Invokes the cleanup routines if bnxt_re_ib_reg fails. Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Devesh Sharma authored
To support host systems with non 4K page size, l2_db_size shall be calculated with 4096 instead of PAGE_SIZE. Also, supply the host page size to FW during initialization. Signed-off-by:
Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Devesh Sharma authored
HW requires an unconditonal fence for all non-wire memory operations through SQ. This guarantees the completions of these memory operations. Signed-off-by:
Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Moni Shoua authored
IB spec says that a lid should be ignored when link layer is Ethernet, for example when building or parsing a CM request message (CA17-34). However, since ib_lid_be16() and ib_lid_cpu16() validates the slid, not only when link layer is IB, we set the slid to zero to prevent false warnings in the kernel log. Fixes: 62ede777 ("Add OPA extended LID support") Reviewed-by:
Majd Dibbiny <majd@mellanox.com> Signed-off-by:
Moni Shoua <monis@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Daniel Jurgens authored
All other mlx5_events report the port number as 1 based, which is how FW reports it in the port event EQE. Reporting 0 for this event causes mlx5_ib to not raise a fatal event notification to registered clients due to a seemingly invalid port. All switch cases in mlx5_ib_event that go through the port check are supposed to set the port now, so just do it once at variable declaration. Fixes: 89d44f0a ("net/mlx5_core: Add pci error handlers to mlx5_core driver") Reviewed-by:
Majd Dibbiny <majd@mellanox.com> Signed-off-by:
Daniel Jurgens <danielj@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Noa Osherovich authored
During QP creation, the mlx5 driver translates the QP type to an internal value which is passed on to FW. There was no check to make sure that the translated value is valid, and -EINVAL was coerced into the mailbox command. Current firmware refuses this as an invalid QP type, but future/past firmware may do something else. Fixes: 09a7d9ec ('{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc') Reviewed-by:
Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by:
Noa Osherovich <noaos@mellanox.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Sergey Gorenko authored
The value of mr->ndescs greater than mr->max_descs is set in the function mlx5_ib_sg_to_klms() if sg_nents is greater than mr->max_descs. This is an invalid value and it causes the following error when registering mr: mlx5_0:dump_cqe:276:(pid 193): dump error cqe 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000030: 00 00 00 00 0f 00 78 06 25 00 00 8b 08 1e 8f d3 Cc: <stable@vger.kernel.org> # 4.5 Fixes: b005d316 ("mlx5: Add arbitrary sg list support") Signed-off-by:
Sergey Gorenko <sergeygo@mellanox.com> Tested-by:
Laurence Oberman <loberman@redhat.com> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
- 20 Feb, 2018 5 commits
-
-
Selvin Xavier authored
BNXT_RE_FLAG_TASK_IN_PROG doesn't handle multiple work requests posted together. Track schedule of multiple workqueue items by maintaining a per device counter and proceed with IB dereg only if this counter is zero. flush_workqueue is no longer required from NETDEV_UNREGISTER path. Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Selvin Xavier authored
During driver unload, the driver proceeds with cleanup without waiting for the scheduled events. So the device pointers get freed up and driver crashes when the events are scheduled later. Flush the bnxt_re_task work queue before starting device removal. Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Selvin Xavier authored
Avoid system crash when destroy_qp is invoked while the driver is processing the poll_cq. Synchronize these functions using the cq_lock. Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Devesh Sharma authored
Driver leaves the QP memory pinned if QP create command fails from the FW. Avoids this scenario by adding a proper exit path if the FW command fails. Signed-off-by:
Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
Devesh Sharma authored
More testing needs to be done before enabling this feature. Disabling the feature temporarily Signed-off-by:
Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
- 15 Feb, 2018 1 commit
-
-
Adit Ranadive authored
This ensures that we return the right structures back to userspace. Otherwise, it looks like the reserved fields in the response structures in userspace might have uninitialized data in them. Fixes: 8b10ba78 ("RDMA/vmw_pvrdma: Add shared receive queue support") Fixes: 29c8d9eb ("IB: Add vmw_pvrdma driver") Suggested-by:
Jason Gunthorpe <jgg@mellanox.com> Reviewed-by:
Bryan Tan <bryantan@vmware.com> Reviewed-by:
Aditya Sarwade <asarwade@vmware.com> Reviewed-by:
Jorgen Hansen <jhansen@vmware.com> Signed-off-by:
Adit Ranadive <aditr@vmware.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
- 11 Feb, 2018 1 commit
-
-
Linus Torvalds authored
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 Feb, 2018 1 commit
-
-
oulijun authored
The hip06 and hip08 run on a little endian ARM, it needs to revise the annotations to indicate that the HW uses little endian data in the various DMA buffers, and flow the necessary swaps throughout. The imm_data use big endian mode. The cpu_to_le32/le32_to_cpu swaps are no-op for this, which makes the only substantive change the handling of imm_data which is now mandatory swapped. This also keep match with the userspace hns driver and resolve the warning by sparse. Signed-off-by:
Lijun Ou <oulijun@huawei.com> Signed-off-by:
Doug Ledford <dledford@redhat.com>
-
- 01 Feb, 2018 5 commits
-
-
Don Hiatt authored
Add trace_hfi1_rcvhdr support for bypass packets. While here, remove the etype argument as it is available in struct hfi1_packet. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Don Hiatt <don.hiatt@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Kamenee Arumugam authored
Kzalloc_node API doesn't check for overflows in size multiplication. While kcalloc API check for overflows in size multiplication but these implementations are not NUMA-aware. This conversion allowed for correcting an allocation used in the hot path to be on the local NUMA and ensure us overflow free multiplication for the size of a memory allocation. Reviewed-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Mitko Haralanov authored
The routine which shows the fault stats checks the counters to determine whether to show any stats based on the number of transmitted pkts/bytes for a particular opcode. Unfortunately, it only checked the receive counters. As a result, if any packet faults have happened for packets egressing the HFI, those stats would not be shown. In order to fix this, the routine is amended to also check the TX counters. With this change the pkt/byte counts are the sum of both TX and RX counts for the opcode. Fixes: 1b311f89 ("IB/hfi1: Add tx_opcode_stats like the opcode_stats") Reviewed-by:
Don Hiatt <don.hiatt@intel.com> Reviewed-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Mike Marciniszyn authored
These values were introduced as part of the 16B code to account for the varying size of the LRH between the differing packet formats. Replace the blind constants with defines based on FIELD_SIZEOF() calls. Fixes: 5b6cabb0 ("IB/hfi1: Add 16B RC/UC support") Reviewed-by:
Don Hiatt <don.hiatt@intel.com> Signed-off-by:
Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-
Kamenee Arumugam authored
HFI's counters SendWaitCnt and SendWaitVlCnt are in units of TXE cycle time (at 805MHz). OPA counters PortXmitWait and PortVLXmtWait are in units of flit times. Convert the counter values to flit units using following conversion formula: PortXmitWait = SendWaitCnt * 2 * (4 /link_width) * (25 Gbps /link_speed) PortVLXmitWait = SendWaitVLCnt * 2 * (4 /link_width) * (25 Gbps /link_speed) At link up or downgrade events, the link width can change. To ensure accurate counter calculations, sample the counters after the events, during counter requests, and then aggregate the OPA counters. Reviewed-by:
Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by:
Kamenee Arumugam <kamenee.arumugam@intel.com> Signed-off-by:
Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by:
Jason Gunthorpe <jgg@mellanox.com>
-