Skip to content
  • James Smart's avatar
    nvme: validate controller state before rescheduling keep alive · 86880d64
    James Smart authored
    
    
    Delete operations are seeing NULL pointer references in call_timer_fn.
    Tracking these back, the timer appears to be the keep alive timer.
    
    nvme_keep_alive_work() which is tied to the timer that is cancelled
    by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't
    wait for it's completion. So nvme_stop_keep_alive() only stops a timer
    when it's pending. When a keep alive is in flight, there is no timer
    running and the nvme_stop_keep_alive() will have no affect on the keep
    alive io. Thus, if the io completes successfully, the keep alive timer
    will be rescheduled.   In the failure case, delete is called, the
    controller state is changed, the nvme_stop_keep_alive() is called while
    the io is outstanding, and the delete path continues on. The keep
    alive happens to successfully complete before the delete paths mark it
    as aborted as part of the queue termination, so the timer is restarted.
    The delete paths then tear down the controller, and later on the timer
    code fires and the timer entry is now corrupt.
    
    Fix by validating the controller state before rescheduling the keep
    alive. Testing with the fix has confirmed the condition above was hit.
    
    Signed-off-by: default avatarJames Smart <jsmart2021@gmail.com>
    Reviewed-by: default avatarSagi Grimberg <sagi@grimberg.me>
    Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
    86880d64