• Mike Snitzer's avatar
    dm thin: handle running out of data space vs concurrent discard · 0b19825f
    Mike Snitzer authored
    commit a685557f upstream.
    
    Discards issued to a DM thin device can complete to userspace (via
    fstrim) _before_ the metadata changes associated with the discards is
    reflected in the thinp superblock (e.g. free blocks).  As such, if a
    user constructs a test that loops repeatedly over these steps, block
    allocation can fail due to discards not having completed yet:
    1) fill thin device via filesystem file
    2) remove file
    3) fstrim
    
    From initial report, here:
    https://www.redhat.com/archives/dm-devel/2018-April/msg00022.html
    
    
    
    "The root cause of this issue is that dm-thin will first remove
    mapping and increase corresponding blocks' reference count to prevent
    them from being reused before DISCARD bios get processed by the
    underlying layers. However. increasing blocks' reference count could
    also increase the nr_allocated_this_transaction in struct sm_disk
    which makes smd->old_ll.nr_allocated +
    smd->nr_allocated_this_transaction bigger than smd->old_ll.nr_blocks.
    In this case, alloc_data_block() will never commit metadata to reset
    the begin pointer of struct sm_disk, because sm_disk_get_nr_free()
    always return an underflow value."
    
    While there is room for improvement to the space-map accounting that
    thinp is making use of: the reality is this test is inherently racey and
    will result in the previous iteration's fstrim's discard(s) completing
    vs concurrent block allocation, via dd, in the next iteration of the
    loop.
    
    No amount of space map accounting improvements will be able to allow
    user's to use a block before a discard of that block has completed.
    
    So the best we can really do is allow DM thinp to gracefully handle such
    aggressive use of all the pool's data by degrading the pool into
    out-of-data-space (OODS) mode.  We _should_ get that behaviour already
    (if space map accounting didn't falsely cause alloc_data_block() to
    believe free space was available).. but short of that we handle the
    current reality that dm_pool_alloc_data_block() can return -ENOSPC.
    Reported-by: default avatarDennis Yang <dennisyang@qnap.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    0b19825f
dm-thin.c 110 KB