Skip to content
  • Filipe Manana's avatar
    Btrfs: fix chunk allocation regression leading to transaction abort · c152b63e
    Filipe Manana authored
    With commit 1b984508 ("Btrfs: fix find_free_dev_extent() malfunction
    in case device tree has hole") introduced in the kernel 4.1 merge window,
    we end up using part of a device hole for which there are already pending
    chunks or pinned chunks. Before that commit we didn't use the hole and
    would just move on to the next hole in the device.
    
    However when we adjust the start offset for the chunk allocation and we
    have pinned chunks, we set it blindly to the end offset of the pinned
    chunk we are currently processing, which is dangerous because we can
    have a pending chunk that has a start offset that matches the end offset
    of our pinned chunk - leading us to a case where we end up getting two
    pending chunks that start at the same physical device offset, which makes
    us later abort the current transaction with -EEXIST when finishing the
    chunk allocation at btrfs_create_pending_block_groups():
    
    [194737.659017] ------------[ cut here ]------------
    [194737.660192] WARNING: CPU: 15 PID: 31111 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x106 [btrfs]()
    [194737.662209] BTRFS: Transaction aborted (error -17)
    [194737.663175] Modules linked in: btrfs dm_snapshot dm_bufio dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse
    [194737.674015] CPU: 15 PID: 31111 Comm: xfs_io Tainted: G        W       4.0.0-rc5-btrfs-next-9+ #2
    [194737.675986] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
    [194737.682999]  0000000000000009 ffff8800564c7a98 ffffffff8142fa46 ffffffff8108b6a2
    [194737.684540]  ffff8800564c7ae8 ffff8800564c7ad8 ffffffff81045ea5 ffff8800564c7b78
    [194737.686017]  ffffffffa0383aa7 00000000ffffffef ffff88000c7ba000 ffff8801a1f66f40
    [194737.687509] Call Trace:
    [194737.688068]  [<ffffffff8142fa46>] dump_stack+0x4f/0x7b
    [194737.689027]  [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad
    [194737.690095]  [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb
    [194737.691198]  [<ffffffffa0383aa7>] ? __btrfs_abort_transaction+0x52/0x106 [btrfs]
    [194737.693789]  [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48
    [194737.695065]  [<ffffffffa0383aa7>] __btrfs_abort_transaction+0x52/0x106 [btrfs]
    [194737.696806]  [<ffffffffa039a3bd>] btrfs_create_pending_block_groups+0x101/0x130 [btrfs]
    [194737.698683]  [<ffffffffa03aa433>] __btrfs_end_transaction+0x84/0x366 [btrfs]
    [194737.700329]  [<ffffffffa03aa725>] btrfs_end_transaction+0x10/0x12 [btrfs]
    [194737.701924]  [<ffffffffa0394b51>] btrfs_check_data_free_space+0x11f/0x27c [btrfs]
    [194737.703675]  [<ffffffffa03b8ba4>] __btrfs_buffered_write+0x16a/0x4c8 [btrfs]
    [194737.705417]  [<ffffffffa03bb502>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs]
    [194737.707058]  [<ffffffffa03bb511>] ? btrfs_file_write_iter+0x1a9/0x431 [btrfs]
    [194737.708560]  [<ffffffffa03bb68d>] btrfs_file_write_iter+0x325/0x431 [btrfs]
    [194737.710673]  [<ffffffff81067d85>] ? get_parent_ip+0xe/0x3e
    [194737.712076]  [<ffffffff811534c3>] new_sync_write+0x7c/0xa0
    [194737.713293]  [<ffffffff81153b58>] vfs_write+0xb2/0x117
    [194737.714443]  [<ffffffff81154424>] SyS_pwrite64+0x64/0x82
    [194737.715646]  [<ffffffff81435b32>] system_call_fastpath+0x12/0x17
    [194737.717175] ---[ end trace f2d5dc04e56d7e48 ]---
    [194737.718170] BTRFS: error (device sdc) in btrfs_create_pending_block_groups:9524: errno=-17 Object already exists
    
    The -EEXIST failure comes from btrfs_finish_chunk_alloc(), called by
    btrfs_create_pending_block_groups(), when it attempts to insert a
    duplicated device extent item via btrfs_alloc_dev_extent().
    
    This issue was reproducible with fstests generic/038 running in a loop for
    several hours (it's very hard to hit) and using MOUNT_OPTIONS="-o discard".
    Applying Jeff's recent patch titled "btrfs: add missing discards when
    unpinning extents with -o discard" makes the issue much easier to reproduce
    (usually within 4 to 5 hours), since it pins chunks for longer periods of
    time when an unused block group is deleted by the cleaner kthread.
    
    Fix this by making sure that we never adjust the start offset to a lower
    value than it currently has.
    
    Fixes: 1b984508
    
     ("Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole"
    Signed-off-by: default avatarFilipe Manana <fdmanana@suse.com>
    Signed-off-by: default avatarChris Mason <clm@fb.com>
    c152b63e