Skip to content
  • Liu Bo's avatar
    Btrfs: fix leaf corruption after __btrfs_drop_extents · 0b43e04f
    Liu Bo authored
    
    
    Several reports about leaf corruption has been floating on the list, one of them
    points to __btrfs_drop_extents(), and we find that the leaf becomes corrupted
    after __btrfs_drop_extents(), it's really a rare case but it does exist.
    
    The problem turns out to be btrfs_next_leaf() called in __btrfs_drop_extents().
    
    So in btrfs_next_leaf(), we release the current path to re-search the last key of
    the leaf for locating next leaf, and we've taken it into account that there might
    be balance operations between leafs during this 'unlock and re-lock' dance, so
    we check the path again and advance it if there are now more items available.
    But things are a bit different if that last key happens to be removed and balance
    gets a bigger key as the last one, and btrfs_search_slot will return it with
    ret > 0, IOW, nothing change in this leaf except the new last key, then we think
    we're okay because there is no more item balanced in, fine, we thinks we can
    go to the next leaf.
    
    However, we should return that bigger key, otherwise we deserve leaf corruption,
    for example, in endio, skipping that key means that __btrfs_drop_extents() thinks
    it has dropped all extent matched the required range and finish_ordered_io can
    safely insert a new extent, but it actually doesn't and ends up a leaf
    corruption.
    
    One may be asking that why our locking on extent io tree doesn't work as
    expected, ie. it should avoid this kind of race situation.  But in
    __btrfs_drop_extents(), we don't always find extents which are included within
    our locking range, IOW, extents can start before our searching start, in this
    case locking on extent io tree doesn't protect us from the race.
    
    This takes the special case into account.
    
    Reviewed-by: default avatarFilipe Manana <fdmanana@gmail.com>
    Signed-off-by: default avatarLiu Bo <bo.li.liu@oracle.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    0b43e04f