Skip to content
  • Vallish Vaidyeshwara's avatar
    dm thin: do not queue freed thin mapping for next stage processing · 00a0ea33
    Vallish Vaidyeshwara authored
    
    
    process_prepared_discard_passdown_pt1() should cleanup
    dm_thin_new_mapping in cases of error.
    
    dm_pool_inc_data_range() can fail trying to get a block reference:
    
    metadata operation 'dm_pool_inc_data_range' failed: error = -61
    
    When dm_pool_inc_data_range() fails, dm thin aborts current metadata
    transaction and marks pool as PM_READ_ONLY. Memory for thin mapping
    is released as well. However, current thin mapping will be queued
    onto next stage as part of queue_passdown_pt2() or passdown_endio().
    This dangling thin mapping memory when processed and accessed in
    next stage will lead to device mapper crashing.
    
    Code flow without fix:
    -> process_prepared_discard_passdown_pt1(m)
       -> dm_thin_remove_range()
       -> discard passdown
          --> passdown_endio(m) queues m onto next stage
       -> dm_pool_inc_data_range() fails, frees memory m
                but does not remove it from next stage queue
    
    -> process_prepared_discard_passdown_pt2(m)
       -> processes freed memory m and crashes
    
    One such stack:
    
    Call Trace:
    [<ffffffffa037a46f>] dm_cell_release_no_holder+0x2f/0x70 [dm_bio_prison]
    [<ffffffffa039b6dc>] cell_defer_no_holder+0x3c/0x80 [dm_thin_pool]
    [<ffffffffa039b88b>] process_prepared_discard_passdown_pt2+0x4b/0x90 [dm_thin_pool]
    [<ffffffffa0399611>] process_prepared+0x81/0xa0 [dm_thin_pool]
    [<ffffffffa039e735>] do_worker+0xc5/0x820 [dm_thin_pool]
    [<ffffffff8152bf54>] ? __schedule+0x244/0x680
    [<ffffffff81087e72>] ? pwq_activate_delayed_work+0x42/0xb0
    [<ffffffff81089f53>] process_one_work+0x153/0x3f0
    [<ffffffff8108a71b>] worker_thread+0x12b/0x4b0
    [<ffffffff8108a5f0>] ? rescuer_thread+0x350/0x350
    [<ffffffff8108fd6a>] kthread+0xca/0xe0
    [<ffffffff8108fca0>] ? kthread_park+0x60/0x60
    [<ffffffff81530b45>] ret_from_fork+0x25/0x30
    
    The fix is to first take the block ref count for discarded block and
    then do a passdown discard of this block. If block ref count fails,
    then bail out aborting current metadata transaction, mark pool as
    PM_READ_ONLY and also free current thin mapping memory (existing error
    handling code) without queueing this thin mapping onto next stage of
    processing. If block ref count succeeds, then passdown discard of this
    block. Discard callback of passdown_endio() will queue this thin mapping
    onto next stage of processing.
    
    Code flow with fix:
    -> process_prepared_discard_passdown_pt1(m)
       -> dm_thin_remove_range()
       -> dm_pool_inc_data_range()
          --> if fails, free memory m and bail out
       -> discard passdown
          --> passdown_endio(m) queues m onto next stage
    
    Cc: stable <stable@vger.kernel.org> # v4.9+
    Reviewed-by: default avatarEduardo Valentin <eduval@amazon.com>
    Reviewed-by: default avatarCristian Gafton <gafton@amazon.com>
    Reviewed-by: default avatarAnchal Agarwal <anchalag@amazon.com>
    Signed-off-by: default avatarVallish Vaidyeshwara <vallish@amazon.com>
    Reviewed-by: default avatarJoe Thornber <ejt@redhat.com>
    Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    00a0ea33