1. 28 Mar, 2018 1 commit
    • RDMA/ucma: Introduce safer rdma_addr_size() variants · 84652aef
      Roland Dreier authored
      There are several places in the ucma ABI where userspace can pass in a
      sockaddr but set the address family to AF_IB.  When that happens,
      rdma_addr_size() will return a size bigger than sizeof struct sockaddr_in6,
      and the ucma kernel code might end up copying past the end of a buffer
      not sized for a struct sockaddr_ib.
      Fix this by introducing new variants
          int rdma_addr_size_in6(struct sockaddr_in6 *addr);
          int rdma_addr_size_kss(struct __kernel_sockaddr_storage *addr);
      that are type-safe for the types used in the ucma ABI and return 0 if the
      size computed is bigger than the size of the type passed in.  We can use
      these new variants to check what size userspace has passed in before
      copying any addresses.
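A user-space sketch of the check these variants perform. It assumes they compare the size that rdma_addr_size() derives from the family field against the size of the buffer type passed in. `SKETCH_AF_IB` (27, the Linux value of AF_IB) and `struct sketch_sockaddr_ib` are local stand-ins so the example builds without RDMA headers. `struct sockaddr_storage` stands in for `struct __kernel_sockaddr_storage`. All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Stand-in for AF_IB (27 on Linux); assumed, not taken from kernel headers. */
#define SKETCH_AF_IB 27

/* Hypothetical layout mirror of the kernel's struct sockaddr_ib (48 bytes). */
struct sketch_sockaddr_ib {
	uint16_t sib_family;
	uint16_t sib_pkey;
	uint32_t sib_flowinfo;
	uint8_t  sib_addr[16];
	uint64_t sib_sid;
	uint64_t sib_sid_mask;
	uint64_t sib_scope_id;
};

/* Like rdma_addr_size(): the size implied by the family field alone. */
static int sketch_addr_size(const struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	case SKETCH_AF_IB:
		return sizeof(struct sketch_sockaddr_ib);
	default:
		return 0;
	}
}

/* Type-safe variant: 0 if the family claims more bytes than the buffer holds. */
static int sketch_addr_size_in6(struct sockaddr_in6 *addr)
{
	int ret = sketch_addr_size((struct sockaddr *)addr);

	return ret <= (int)sizeof(*addr) ? ret : 0;
}

/* Same check against the larger storage type. */
static int sketch_addr_size_kss(struct sockaddr_storage *addr)
{
	int ret = sketch_addr_size((struct sockaddr *)addr);

	return ret <= (int)sizeof(*addr) ? ret : 0;
}
```

A caller on the ucma ABI path would reject the request when the helper returns 0, before copying any address bytes.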
      Reported-by: <syzbot+6800425d54ed3ed8135d@syzkaller.appspotmail.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  2. 27 Mar, 2018 1 commit
    • RDMA/rdma_cm: Fix use after free race with process_one_req · 9137108c
      Jason Gunthorpe authored
      process_one_req() can race with rdma_addr_cancel():
                 CPU0                                 CPU1
                 ====                                 ====
         // ODEBUG explodes since the work is still queued.
      This causes ODEBUG to detect the use after free:
      ODEBUG: free active (active state 0) object type: work_struct hint: process_one_req+0x0/0x6c0 include/net/dst.h:165
      WARNING: CPU: 0 PID: 79 at lib/debugobjects.c:291 debug_print_object+0x166/0x220 lib/debugobjects.c:288
      kvm: emulating exchange as write
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 79 Comm: kworker/u4:3 Not tainted 4.16.0-rc6+ #361
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: ib_addr process_one_req
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       panic+0x1e4/0x41c kernel/panic.c:183
       __warn+0x1dc/0x200 kernel/panic.c:547
       report_bug+0x1f4/0x2b0 lib/bug.c:186
       fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
       fixup_bug arch/x86/kernel/traps.c:247 [inline]
       do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
       invalid_op+0x1b/0x40 arch/x86/entry/entry_64.S:986
      RIP: 0010:debug_print_object+0x166/0x220 lib/debugobjects.c:288
      RSP: 0000:ffff8801d966f210 EFLAGS: 00010086
      RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff815acd6e
      RDX: 0000000000000000 RSI: 1ffff1003b2cddf2 RDI: 0000000000000000
      RBP: ffff8801d966f250 R08: 0000000000000000 R09: 1ffff1003b2cddc8
      R10: ffffed003b2cde71 R11: ffffffff86f39a98 R12: 0000000000000001
      R13: ffffffff86f15540 R14: ffffffff86408700 R15: ffffffff8147c0a0
       __debug_check_no_obj_freed lib/debugobjects.c:745 [inline]
       debug_check_no_obj_freed+0x662/0xf1f lib/debugobjects.c:774
       kfree+0xc7/0x260 mm/slab.c:3799
       process_one_req+0x2e7/0x6c0 drivers/infiniband/core/addr.c:592
       process_one_work+0xc47/0x1bb0 kernel/workqueue.c:2113
       worker_thread+0x223/0x1990 kernel/workqueue.c:2247
       kthread+0x33c/0x400 kernel/kthread.c:238
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:406
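This is a hedged illustration of one common way to rule out this class of use after free. It is not necessarily the approach this patch takes. Whichever path unlinks a request from the shared list becomes its sole owner, so only that path may go on to free it. `struct sketch_req` and `sketch_take()` are hypothetical. The example is single-threaded, and the list lock is only indicated in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request object kept on a shared pending list. */
struct sketch_req {
	struct sketch_req *next;
	int id;
};

/*
 * Unlink the request with the given id and return it, or NULL if it is
 * no longer on the list. A caller that gets a non-NULL result owns the
 * request and is the only one allowed to free it. In the kernel this
 * would run under the list lock, and the owner would also have to make
 * sure the request's work item is neither queued nor running (e.g. with
 * cancel_delayed_work_sync()) before freeing it.
 */
static struct sketch_req *sketch_take(struct sketch_req **head, int id)
{
	for (struct sketch_req **pp = head; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct sketch_req *req = *pp;

			*pp = req->next;
			req->next = NULL;
			return req;
		}
	}
	return NULL;
}
```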
      Fixes: 5fff41e1 ("IB/core: Fix race condition in resolving IP to MAC")
      Reported-by: <syzbot+3b4acab09b6463472d0a@syzkaller.appspotmail.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  3. 28 Feb, 2018 1 commit
    • IB/core : Add null pointer check in addr_resolve · 4cd482c1
      Muneendra Kumar M authored
      addr_resolve() calls dev_get_by_index(), which can return NULL;
      dereferencing that NULL pointer crashes the kernel.
      The following call trace is observed while running the
      rdma_lat test application:
      [  146.173149] BUG: unable to handle kernel NULL pointer dereference at 00000000000004a0
      [  146.173198] IP: addr_resolve+0x9e/0x3e0 [ib_core]
      [  146.173221] PGD 0 P4D 0
      [  146.173869] Oops: 0000 [#1] SMP PTI
      [  146.182859] CPU: 8 PID: 127 Comm: kworker/8:1 Tainted: G  O 4.15.0-rc6+ #18
      [  146.183758] Hardware name: LENOVO System x3650 M5: -[8871AC1]-/01KN179, BIOS-[TCE132H-2.50]- 10/11/2017
      [  146.184691] Workqueue: ib_cm cm_work_handler [ib_cm]
      [  146.185632] RIP: 0010:addr_resolve+0x9e/0x3e0 [ib_core]
      [  146.186584] RSP: 0018:ffffc9000362faa0 EFLAGS: 00010246
      [  146.187521] RAX: 000000000000001b RBX: ffffc9000362fc08 RCX:
      [  146.188472] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffff88087fc16990
      [  146.189427] RBP: ffffc9000362fb18 R08: 00000000ffffff9d R09:
      [  146.190392] R10: 00000000000001e7 R11: 0000000000000001 R12:
      [  146.191361] R13: 0000000000000000 R14: 0000000000000001 R15:
      [  146.192327] FS:  0000000000000000(0000) GS:ffff88087fc00000(0000)
      [  146.193301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  146.194274] CR2: 00000000000004a0 CR3: 000000000220a002 CR4:
      [  146.195258] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      [  146.196256] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
      [  146.197231] Call Trace:
      [  146.198209]  ? rdma_addr_register_client+0x30/0x30 [ib_core]
      [  146.199199]  rdma_resolve_ip+0x1af/0x280 [ib_core]
      [  146.200196]  rdma_addr_find_l2_eth_by_grh+0x154/0x2b0 [ib_core]
      This patch adds the missing NULL check on the pointer returned
      by dev_get_by_index() before the netdev is accessed, which
      avoids the crash.
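A user-space sketch of the missing check: a NULL result from the device lookup becomes an error return instead of a dereference. `sketch_dev_get_by_index()` and `struct sketch_netdev` are hypothetical stand-ins for dev_get_by_index() and struct net_device, and -ENODEV is an assumed error code. Unlike this sketch, the real dev_get_by_index() also takes a reference that the caller must later drop with dev_put().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device. */
struct sketch_netdev {
	int ifindex;
	unsigned int flags;
};

static struct sketch_netdev sketch_devs[] = { { 1, 0x1 }, { 2, 0x0 } };

/* Stand-in for dev_get_by_index(): NULL when no device has that index. */
static struct sketch_netdev *sketch_dev_get_by_index(int ifindex)
{
	for (size_t i = 0; i < sizeof(sketch_devs) / sizeof(sketch_devs[0]); i++)
		if (sketch_devs[i].ifindex == ifindex)
			return &sketch_devs[i];
	return NULL;
}

/* Read the device flags, failing cleanly if the device is gone. */
static int sketch_get_flags(int ifindex, unsigned int *flags)
{
	struct sketch_netdev *ndev = sketch_dev_get_by_index(ifindex);

	if (!ndev)
		return -ENODEV;	/* the missing check: never dereference NULL */
	*flags = ndev->flags;
	return 0;
}
```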
      We observed the crash above while running the following test:
      On server: rdma_lat -c -n 2 -s 1024
      On client: rdma_lat -c -n 2 -s 1024
      Fixes: 20029832 ("IB/core: Validate route when we init ah")
      Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  4. 18 Dec, 2017 5 commits
  5. 13 Nov, 2017 1 commit
  6. 18 Oct, 2017 1 commit
  7. 10 Aug, 2017 4 commits
  8. 04 Aug, 2017 1 commit
    • IB/core: Fix race condition in resolving IP to MAC · 5fff41e1
      Parav Pandit authored
      Currently, when resolving IP addresses to MAC addresses, a single
      delayed work item is used for all outstanding resolve requests. This
      single work item essentially performs two tasks:
      (a) any retries needed to complete a resolution, and
      (b) executing the callback function for all completed requests.
      While the work item is executing callbacks, any new work scheduled on
      this workqueue is lost, because the work item has already finished
      looking at all pending requests and is now running callbacks while
      still executing. Rescanning the pending requests in process_req()
      after executing the callbacks would lead to a similar race condition
      (it might reduce the probability further, but would not eliminate it).
      Retrying to enqueue the work from queue_req() context is not a
      pattern the rest of the kernel follows.
      Therefore, this patch uses the kernel's ability to enqueue multiple
      work items on a workqueue, which ensures that no request gets lost
      this way. The request list is still maintained so that
      rdma_cancel_addr() can unlink a request and complete it with an
      error sooner. Neighbour update events continue to be handled the
      same way as before.
      Additionally, the process_req() work entry cancels any pending work
      for a request that is completed while those requests are processed.
      Originally ib_addr was a single-threaded (ST) workqueue, but it became
      a multi-threaded (MT) workqueue with the patch in [1]. This patch makes
      it behave like an ST workqueue again, so that the neighbour update
      event handler's work item does not race with other work items.
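A minimal user-space sketch of the per-request design. It assumes each request embeds its own work item and the handler recovers the request with container_of(), the usual kernel idiom. `struct sketch_work`, `struct sketch_addr_req` and the `sketch_` functions are hypothetical. A kernel implementation would embed a struct delayed_work and schedule it through the workqueue API rather than call the function pointer directly.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal work item: just the function to run. */
struct sketch_work {
	void (*func)(struct sketch_work *work);
};

/* Each request carries its own work item instead of sharing one. */
struct sketch_addr_req {
	int status;		/* -1 while unresolved in this sketch */
	int callback_ran;
	struct sketch_work work;
};

/* Handler for one request: finds its request via the embedded member. */
static void sketch_process_one_req(struct sketch_work *work)
{
	struct sketch_addr_req *req =
		container_of(work, struct sketch_addr_req, work);

	req->status = 0;	/* pretend resolution finished */
	req->callback_ran = 1;	/* run this request's callback only */
}

static void sketch_init_req(struct sketch_addr_req *req)
{
	req->status = -1;
	req->callback_ran = 0;
	req->work.func = sketch_process_one_req;
}
```

Because each request is scheduled through its own item, queueing one request cannot be absorbed by a shared item that is already busy running other requests' callbacks.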
      The trace below (from a 4.5-based kernel) shows one such case:
      process_req() never executed the callback, most likely for an event
      that was scheduled by queue_req() while the previous callback was
      being executed by the workqueue.
       [<ffffffff816b0dde>] schedule+0x3e/0x90
       [<ffffffff816b3c45>] schedule_timeout+0x1b5/0x210
       [<ffffffff81618c37>] ? ip_route_output_flow+0x27/0x70
       [<ffffffffa027f9c9>] ? addr_resolve+0x149/0x1b0 [ib_addr]
       [<ffffffff816b228f>] wait_for_completion+0x10f/0x170
       [<ffffffff810b6140>] ? try_to_wake_up+0x210/0x210
       [<ffffffffa027f220>] ? rdma_copy_addr+0xa0/0xa0 [ib_addr]
       [<ffffffffa0280120>] rdma_addr_find_l2_eth_by_grh+0x1d0/0x278 [ib_addr]
       [<ffffffff81321297>] ? sub_alloc+0x77/0x1c0
       [<ffffffffa02943b7>] ib_init_ah_from_wc+0x3a7/0x5a0 [ib_core]
       [<ffffffffa0457aba>] cm_req_handler+0xea/0x580 [ib_cm]
       [<ffffffff81015982>] ? __switch_to+0x212/0x5e0
       [<ffffffffa04582fd>] cm_work_handler+0x6d/0x150 [ib_cm]
       [<ffffffff810a14c1>] process_one_work+0x151/0x4b0
       [<ffffffff810a1940>] worker_thread+0x120/0x480
       [<ffffffff816b074b>] ? __schedule+0x30b/0x890
       [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
       [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
       [<ffffffff810a6b1e>] kthread+0xce/0xf0
       [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff816b53a2>] ret_from_fork+0x42/0x70
       [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
      INFO: task kworker/u144:1:156520 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
      kworker/u144:1  D ffff883ffe1d7600     0 156520      2 0x00000080
      Workqueue: ib_addr process_req [ib_addr]
       ffff883f446fbbd8 0000000000000046 ffff881f95280000 ffff881ff24de200
       ffff883f66120000 ffff883f446f8008 ffff881f95280000 ffff883f6f9208c4
       ffff883f6f9208c8 00000000ffffffff ffff883f446fbbf8 ffffffff816b0dde
      [1] http://lkml.iu.edu/hypermail/linux/kernel/1608.1/05834.html
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  9. 17 Jul, 2017 2 commits
  10. 16 Jun, 2017 1 commit
    • networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg authored
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      which actually doesn't cover pskb_put since there are only three
      users overall.
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
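A small user-space illustration of why a void * return removes the casts. C converts void * to any object pointer type implicitly, whereas an unsigned char * result needs an explicit cast. `struct sketch_buf`, `sketch_put()` and `struct sketch_hdr` are hypothetical stand-ins, not the sk_buff API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical linear buffer standing in for an sk_buff's data area. */
struct sketch_buf {
	unsigned char data[64];
	size_t len;
};

/* Like skb_put(): extend the used area by len and return its start. */
static void *sketch_put(struct sketch_buf *b, size_t len)
{
	void *tail = b->data + b->len;

	assert(b->len + len <= sizeof(b->data)); /* sketch-only overrun check */
	b->len += len;
	return tail;
}

struct sketch_hdr {
	uint8_t type;
	uint8_t flags;
};

/* Append a header and one trailer byte; returns the new length. */
static size_t sketch_build(struct sketch_buf *b)
{
	struct sketch_hdr *h = sketch_put(b, sizeof(*h)); /* no cast needed */

	h->type = 1;
	h->flags = 0;
	*(uint8_t *)sketch_put(b, 1) = 0xff; /* direct byte store keeps a cast */
	return b->len;
}
```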
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 07 Jun, 2017 1 commit
    • IB/addr: Fix setting source address in addr6_resolve() · 79e25959
      Roland Dreier authored
      Commit eea40b8f ("infiniband: call ipv6 route lookup via the stub
      interface") introduced a regression in address resolution when connecting
      to IPv6 destination addresses.  The old code called ip6_route_output(),
      while the new code calls ipv6_stub->ipv6_dst_lookup().  The two are almost
      the same, except that ipv6_dst_lookup() also calls ip6_route_get_saddr()
      if the source address is in6addr_any.
      This means that the test of ipv6_addr_any(&fl6.saddr) now never succeeds,
      and so we never copy the source address out.  This ends up causing
      rdma_resolve_addr() to fail, because without a resolved source address,
      cma_acquire_dev() will fail to find an RDMA device to use.  For me, this
      causes connecting to an NVMe over Fabrics target via RoCE / IPv6 to fail.
      Fix this by copying out fl6.saddr if ipv6_addr_any() is true for the original
      source address passed into addr6_resolve().  We can drop our call to
      ipv6_dev_get_saddr() because ipv6_dst_lookup() already does that work.
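A user-space sketch of the corrected test. It assumes the route lookup has already written its chosen source address (fl6.saddr in the commit) into `looked_up`. The decision is made on the caller's original source address, since the post-lookup address is never unspecified. `sketch_fill_saddr()` is hypothetical. IN6_IS_ADDR_UNSPECIFIED() from <netinet/in.h> plays the role of ipv6_addr_any().

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/*
 * src:       source address passed in by the caller (may be ::).
 * looked_up: source address chosen by the route lookup.
 * Copy the lookup's choice out only when the caller left src unspecified.
 */
static void sketch_fill_saddr(struct in6_addr *src,
			      const struct in6_addr *looked_up)
{
	if (IN6_IS_ADDR_UNSPECIFIED(src))
		*src = *looked_up;
}
```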
      Fixes: eea40b8f ("infiniband: call ipv6 route lookup via the stub interface")
      Cc: <stable@vger.kernel.org> # 3.12+
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  12. 02 May, 2017 1 commit
  13. 28 Apr, 2017 1 commit
    • infiniband: call ipv6 route lookup via the stub interface · eea40b8f
      Paolo Abeni authored
      The infiniband address handle can be triggered to resolve an ipv6
      address in response to MAD packets, even if the ipv6 module has
      been disabled via the kernel command-line argument.
      That causes a call into the ipv6 routing code, which is not
      initialized, and a consequent oops.
      This commit addresses the above issue by replacing the direct lookup
      call with an indirect one via the ipv6 stub, which is properly
      initialized according to the ipv6 status (e.g. if ipv6 is
      disabled, the routing lookup fails gracefully).
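A user-space sketch of the stub indirection. It assumes a global ops pointer that initially points at a fallback whose lookup fails with -EAFNOSUPPORT, and is switched to the real implementation only when IPv6 initializes. `struct sketch_ipv6_stub`, `sketch_stub`, `sketch_ipv6_init()` and the integer lookup signature are hypothetical simplifications of the kernel's ipv6_stub.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified ops table in the spirit of ipv6_stub. */
struct sketch_ipv6_stub {
	int (*dst_lookup)(int dst_id);
};

/* Fallback while IPv6 is disabled: fail instead of touching routing state. */
static int sketch_lookup_unavailable(int dst_id)
{
	(void)dst_id;
	return -EAFNOSUPPORT;
}

/* Stand-in for the real route lookup once IPv6 is initialized. */
static int sketch_lookup_real(int dst_id)
{
	return dst_id > 0 ? 0 : -ENETUNREACH;
}

static const struct sketch_ipv6_stub sketch_default_stub = {
	.dst_lookup = sketch_lookup_unavailable,
};

static const struct sketch_ipv6_stub sketch_real_stub = {
	.dst_lookup = sketch_lookup_real,
};

/* Callers always go through this pointer, never the real function. */
static const struct sketch_ipv6_stub *sketch_stub = &sketch_default_stub;

/* Called only when the IPv6 stack actually comes up. */
static void sketch_ipv6_init(void)
{
	sketch_stub = &sketch_real_stub;
}
```

Callers never reference the IPv6 routing functions directly, so a disabled IPv6 stack yields an error code rather than an oops.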
      Cc: stable@vger.kernel.org # 3.12+
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  14. 13 Apr, 2017 1 commit
  15. 17 Nov, 2016 1 commit
  16. 07 Oct, 2016 1 commit
  17. 24 May, 2016 2 commits
  18. 19 Jan, 2016 3 commits
  19. 23 Dec, 2015 2 commits
  20. 28 Oct, 2015 1 commit
  21. 22 Oct, 2015 1 commit
    • IB/core: Use GID table in AH creation and dmac resolution · dbf727de
      Matan Barak authored
      Previously, the VLAN ID and source MAC were taken from the QP
      attributes. Since the net device is now stored in the GID attributes,
      it can be used to obtain this information instead.
      IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed
      because no known libibverbs uses them.
      This commit also modifies the vendor drivers (mlx4, ocrdma) to use
      the new approach.
      ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  22. 02 Jun, 2015 1 commit
  23. 05 May, 2015 1 commit
  24. 16 Dec, 2014 1 commit
    • IB/addr: Improve address resolution callback scheduling · 346f98b4
      Or Kehati authored
      Address resolution always does a context switch to a work-queue to
      deliver the address resolution event.  When the IP address is already
      cached in the system ARP table, we're going through the following:
          rdma_resolve_ip --> addr_resolve (cache hit) -->
      which ends up with:
          queue_req --> set_timeout (now) --> mod_delayed_work(,, delay=1)
      We do realize that the timeout should be zero, but the code
      forces it to a minimum of one jiffy.
      Using one jiffy as the minimum delay results in sub-optimal
      scheduling of this work item by the workqueue, which on the
      testbed below costs about 3-4 ms out of 12 ms total time.
      To fix that, we allow the minimum delay to be zero.  Note that the
      connect step times change too, as the connect flow also makes
      address resolution calls.
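A user-space sketch of the delay computation. It assumes set_timeout() turns an absolute deadline into a relative delay for mod_delayed_work(). The old clamp forces a minimum of one jiffy, while the new one only clamps negative values to zero. The `sketch_` functions and the plain `unsigned long` tick counter are hypothetical.

```c
#include <assert.h>

/*
 * Unsigned subtraction followed by a signed cast keeps the "already
 * expired" test correct even if the tick counter has wrapped around.
 */

/* Old behaviour: a deadline that is now or past still waits one tick. */
static unsigned long sketch_delay_old(unsigned long deadline, unsigned long now)
{
	unsigned long delay = deadline - now;

	if ((long)delay <= 0)
		delay = 1;
	return delay;
}

/* New behaviour: run immediately once the deadline has arrived. */
static unsigned long sketch_delay_new(unsigned long deadline, unsigned long now)
{
	unsigned long delay = deadline - now;

	if ((long)delay < 0)
		delay = 0;
	return delay;
}
```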
      The results were taken from running both client and server on the
      same node, over mlx4 RoCE port.
      before -->
      step              total ms     max ms     min us  us / conn
      create id    :        0.01       0.01       6.00       6.00
      resolve addr :        4.02       4.01    4013.00    4016.00
      resolve route:        0.18       0.18     182.00     183.00
      create qp    :        1.15       1.15    1150.00    1150.00
      connect      :        6.73       6.73    6730.00    6731.00
      disconnect   :        0.55       0.55     549.00     550.00
      destroy      :        0.01       0.01       9.00       9.00
      after -->
      step              total ms     max ms     min us  us / conn
      create id    :        0.01       0.01       6.00       6.00
      resolve addr :        0.05       0.05      49.00      52.00
      resolve route:        0.21       0.21     207.00     208.00
      create qp    :        1.10       1.10    1104.00    1104.00
      connect      :        1.22       1.22    1220.00    1221.00
      disconnect   :        0.71       0.71     713.00     713.00
      destroy      :        0.01       0.01       9.00       9.00
      Signed-off-by: Or Kehati <ork@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Acked-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  25. 14 Jan, 2014 1 commit
    • IB/core: Ethernet L2 attributes in verbs/cm structures · dd5f03be
      Matan Barak authored
      This patch adds support for Ethernet L2 attributes in the
      verbs/cm/cma structures.
      When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
      in a similar manner to how the IB L2 (and the L4 PKEY) attributes are used.
      Thus, those attributes were added to the following structures:
      * ib_ah_attr - added dmac
      * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
      * ib_wc - added smac, vlan_id
      * ib_sa_path_rec - added smac, dmac, vlan_id
      * cm_av - added smac and vlan_id
      For the path record structure, extra care was taken to avoid the new
      fields when packing it into wire format, so we don't break the IB CM
      and SA wire protocol.
      On the active side, the CM fills its internal structures from the
      path provided by the ULP.  We extend this to take the ETH L2 attributes
      and place them into the CM Address Handle (struct cm_av).
      On the passive side, the CM fills its internal structures from the WC
      associated with the REQ message.  We extend this to take the ETH L2
      attributes from the WC.
      When the HW driver provides the required ETH L2 attributes in the WC,
      they set the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
      code checks for the presence of these flags, and in their absence does
      address resolution from the ib_init_ah_from_wc() helper function.
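A sketch of the fallback decision described above. It assumes resolution is required whenever either flag is missing. The flag names echo IB_WC_WITH_SMAC and IB_WC_WITH_VLAN, but the `SKETCH_` values and `sketch_needs_l2_resolution()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel defines its own in enum ib_wc_flags. */
enum {
	SKETCH_WC_WITH_SMAC = 1 << 0,
	SKETCH_WC_WITH_VLAN = 1 << 1,
};

/* True when the WC lacks the L2 fields, so the core must resolve them. */
static bool sketch_needs_l2_resolution(unsigned int wc_flags)
{
	const unsigned int both = SKETCH_WC_WITH_SMAC | SKETCH_WC_WITH_VLAN;

	return (wc_flags & both) != both;
}
```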
      ib_modify_qp_is_ok is also updated to consider the link layer. Some
      parameters are mandatory for Ethernet link layer, while they are
      irrelevant for IB.  Vendor drivers are modified to support the new
      function signature.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  26. 20 Jun, 2013 1 commit
  27. 13 Aug, 2012 1 commit
    • workqueue: use mod_delayed_work() instead of cancel + queue · 41f63c53
      Tejun Heo authored
      Convert delayed_work users doing cancel_delayed_work() followed by
      queue_delayed_work() to mod_delayed_work().
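A single-threaded user-space model of the two idioms. It assumes a delayed work item is just a pending flag plus an expiry time. The return values follow the documented conventions:

- queue_delayed_work() returns false if the item was already pending.
- mod_delayed_work() returns false if the item was idle and has been queued, and true if it was pending and its timer was modified.

The `sketch_` names are hypothetical. The model ignores the concurrency that makes a single call preferable in the kernel, where another context can act on the item between a separate cancel and queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a delayed work item. */
struct sketch_dwork {
	bool pending;
	unsigned long expires;
};

/* Like cancel_delayed_work(): returns true if the item was pending. */
static bool sketch_cancel(struct sketch_dwork *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Like queue_delayed_work(): leaves an already pending item untouched. */
static bool sketch_queue(struct sketch_dwork *w, unsigned long now,
			 unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/* Like mod_delayed_work(): (re)arms the timer whether or not it was pending. */
static bool sketch_mod(struct sketch_dwork *w, unsigned long now,
		       unsigned long delay)
{
	bool was_pending = w->pending;

	w->pending = true;
	w->expires = now + delay;
	return was_pending;
}
```

The old pattern needed sketch_cancel() followed by sketch_queue(), because queueing alone leaves a pending timer unchanged. sketch_mod() reaches the same end state in one call.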
      Most conversions are straightforward.  The ones worth mentioning are:
      * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
        use mod_delayed_work() and cancel loop in
        edac_mc_reset_delay_period() is dropped.
      * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
        watchdog is active or not.  @fan_watchdog_active and related code
      * drivers/power/charger-manager.c: Seemingly a lot of
        delayed_work_pending() abuse going on here.
        [delayed_]work_pending() are unsynchronized and racy when used like
        this.  I converted one instance in fullbatt_handler().  Please
        convert the rest so that it invokes workqueue APIs for the intended
        target state rather than trying to game work item pending state
        transitions.  e.g. if timer should be modified - call
        mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().
      * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
        simplified.  Note that the round_jiffies() calls in this function are
        meaningless: round_jiffies() works on absolute jiffies, not on the
        relative delay used by delayed_work.
      v2: Tomi pointed out that __cancel_delayed_work() users can't be
          safely converted to mod_delayed_work().  They could be calling it
          from irq context and if that happens while delayed_work_timer_fn()
          is running, it could deadlock.  __cancel_delayed_work() users are
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
  28. 09 Jul, 2012 1 commit