Skip to content
  • Mark Bloch's avatar
    IB/cm: Mark stale CM id's whenever the mad agent was unregistered · 9db0ff53
    Mark Bloch authored
    When there is a CM id object that has port assigned to it, it means that
    the cm-id asked for the specific port that it should go by it, but if
    that port was removed (hot-unplug event) the cm-id was not updated.
    In order to fix that the port keeps a list of all the cm-id's that are
    planning to go by it, whenever the port is removed it marks all of them
    as invalid.
    
    This commit fixes a kernel panic which happens when running traffic between
    guests and we force reboot a guest mid traffic, it triggers a kernel panic:
    
     Call Trace:
      [<ffffffff815271fa>] ? panic+0xa7/0x16f
      [<ffffffff8152b534>] ? oops_end+0xe4/0x100
      [<ffffffff8104a00b>] ? no_context+0xfb/0x260
      [<ffffffff81084db2>] ? del_timer_sync+0x22/0x30
      [<ffffffff8104a295>] ? __bad_area_nosemaphore+0x125/0x1e0
      [<ffffffff81084240>] ? process_timeout+0x0/0x10
      [<ffffffff8104a363>] ? bad_area_nosemaphore+0x13/0x20
      [<ffffffff8104aabf>] ? __do_page_fault+0x31f/0x480
      [<ffffffff81065df0>] ? default_wake_function+0x0/0x20
      [<ffffffffa0752675>] ? free_msg+0x55/0x70 [mlx5_core]
      [<ffffffffa0753434>] ? cmd_exec+0x124/0x840 [mlx5_core]
      [<ffffffff8105a924>] ? find_busiest_group+0x244/0x9f0
      [<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
      [<ffffffff8152a815>] ? page_fault+0x25/0x30
      [<ffffffffa024da25>] ? cm_alloc_msg+0x35/0xc0 [ib_cm]
      [<ffffffffa024e821>] ? ib_send_cm_dreq+0xb1/0x1e0 [ib_cm]
      [<ffffffffa024f836>] ? cm_destroy_id+0x176/0x320 [ib_cm]
      [<ffffffffa024fb00>] ? ib_destroy_cm_id+0x10/0x20 [ib_cm]
      [<ffffffffa034f527>] ? ipoib_cm_free_rx_reap_list+0xa7/0x110 [ib_ipoib]
      [<ffffffffa034f590>] ? ipoib_cm_rx_reap+0x0/0x20 [ib_ipoib]
      [<ffffffffa034f5a5>] ? ipoib_cm_rx_reap+0x15/0x20 [ib_ipoib]
      [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
      [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
      [<ffffffff81094bb0>] ? worker_thread+0x0/0x2a0
      [<ffffffff8109aef6>] ? kthread+0x96/0xa0
      [<ffffffff8100c20a>] ? child_rip+0xa/0x20
      [<ffffffff8109ae60>] ? kthread+0x0/0xa0
      [<ffffffff8100c200>] ? child_rip+0x0/0x20
    
    Fixes: a977049d
    
     ("[PATCH] IB: Add the kernel CM implementation")
    Signed-off-by: default avatarMark Bloch <markb@mellanox.com>
    Signed-off-by: default avatarErez Shitrit <erezsh@mellanox.com>
    Reviewed-by: default avatarMaor Gottlieb <maorg@mellanox.com>
    Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
    Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
    9db0ff53