1. 05 Mar, 2014 3 commits
    • Joe Thornber's avatar
      dm thin: fix deadlock in __requeue_bio_list · 18adc577
      Joe Thornber authored
      
      
      The spin lock in requeue_io() was held for too long, allowing deadlock.
      Don't worry, due to other issues addressed in the following "dm thin:
      fix noflush suspend IO queueing" commit, this code was never called.
      
      Fix this by taking the spin lock for a much shorter period of time.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      18adc577
    • Joe Thornber's avatar
      dm thin: fix out of data space handling · 3e1a0699
      Joe Thornber authored
      
      
      Ideally a thin pool would never run out of data space; the low water
      mark would trigger userland to extend the pool before we completely run
      out of space.  However, many small random IOs to unprovisioned space can
      consume data space at an alarming rate.  Adjust your low water mark if
      you're frequently seeing "out-of-data-space" mode.
      
      Before this fix, if data space ran out the pool would be put in
      PM_READ_ONLY mode which also aborted the pool's current metadata
      transaction (data loss for any changes in the transaction).  This had a
      side-effect of needlessly compromising data consistency.  And retry of
      queued unserviceable bios, once the data pool was resized, could
      initiate changes to potentially inconsistent pool metadata.
      
      Now when the pool's data space is exhausted transition to a new pool
      mode (PM_OUT_OF_DATA_SPACE) that allows metadata to be changed but data
      may not be allocated.  This allows users to remove thin volumes or
      discard data to recover data space.
      
      The pool is no longer put in PM_READ_ONLY mode in response to the pool
      running out of data space.  And PM_READ_ONLY mode no longer aborts the
      pool's current metadata transaction.  Also, set_pool_mode() will now
      notify userspace when the pool mode is changed.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      3e1a0699
    • Mike Snitzer's avatar
      dm thin: ensure user takes action to validate data and metadata consistency · 07f2b6e0
      Mike Snitzer authored
      
      
      If a thin metadata operation fails the current transaction will abort,
      whereby causing potential for IO layers up the stack (e.g. filesystems)
      to have data loss.  As such, set THIN_METADATA_NEEDS_CHECK_FLAG in the
      thin metadata's superblock which:
      1) requires the user verify the thin metadata is consistent (e.g. use
         thin_check, etc)
      2) suggests the user verify the thin data is consistent (e.g. use fsck)
      
      The only way to clear the superblock's THIN_METADATA_NEEDS_CHECK_FLAG is
      to run thin_repair.
      
      On metadata operation failure: abort current metadata transaction, set
      pool in read-only mode, and now set the needs_check flag.
      
      As part of this change, constraints are introduced or relaxed:
      * don't allow a pool to transition to write mode if needs_check is set
      * don't allow data or metadata space to be resized if needs_check is set
      * if a thin pool's metadata space is exhausted: the kernel will now
        force the user to take the pool offline for repair before the kernel
        will allow the metadata space to be extended.
      
      Also, update Documentation to include information about when the thin
      provisioning target commits metadata, how it handles metadata failures
      and running out of space.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      07f2b6e0
  2. 04 Mar, 2014 1 commit
    • Mike Snitzer's avatar
      dm thin: synchronize the pool mode during suspend · cdc2b415
      Mike Snitzer authored
      Commit b5330655
      
       ("dm thin: handle metadata failures more consistently")
      increased potential for the pool's mode to be changed in response to
      metadata operation failures.
      
      When the pool mode is changed it isn't synchronized with the mode in
      pool_features stored in the target's context (ti->private) that is used
      as the basis for (re)establishing the pool mode during resume via
      bind_control_target.
      
      It is important that we synchronize the pool mode when it is changed
      otherwise the pool may experience and unexpected mode transition on the
      next resume (especially if there was no new table load).
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      cdc2b415
  3. 27 Feb, 2014 1 commit
    • Mike Snitzer's avatar
      dm thin: allow metadata space larger than supported to go unused · 7d48935e
      Mike Snitzer authored
      
      
      It was always intended that a user could provide a thin metadata device
      that is larger than the max supported by the on-disk format.  The extra
      space would just go unused.
      
      Unfortunately that never worked.  If the user attempted to use a larger
      metadata device on creation they would get an error like the following:
      
       device-mapper: space map common: space map too large
       device-mapper: transaction manager: couldn't create metadata space map
       device-mapper: thin metadata: tm_create_with_sm failed
       device-mapper: table: 252:17: thin-pool: Error creating metadata object
       device-mapper: ioctl: error adding target to table
      
      Fix this by allowing the initial metadata space map creation to cap its
      size at the max number of blocks supported (DM_SM_METADATA_MAX_BLOCKS).
      get_metadata_dev_size() must also impose DM_SM_METADATA_MAX_BLOCKS (via
      THIN_METADATA_MAX_SECTORS), otherwise extending metadata would cap at
      THIN_METADATA_MAX_SECTORS_WARNING (which is larger than supported).
      
      Also, the calculation for THIN_METADATA_MAX_SECTORS didn't account for
      the sizeof the disk_bitmap_header.  So the supported maximum metadata
      size is a bit smaller (reduced from 33423360 to 33292800 sectors).
      
      Lastly, remove the "excess space will not be used" warning message from
      get_metadata_dev_size(); it resulted in printing the warning multiple
      times.  Factor out warn_if_metadata_device_too_big(), call it from
      pool_ctr() and maybe_resize_metadata_dev().
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      7d48935e
  4. 24 Feb, 2014 1 commit
    • Mike Snitzer's avatar
      dm thin: fix the error path for the thin device constructor · 1acacc07
      Mike Snitzer authored
      
      
      dm_pool_close_thin_device() must be called if dm_set_target_max_io_len()
      fails in thin_ctr().  Otherwise __pool_destroy() will fail because the
      pool will still have an open thin device:
      
       device-mapper: thin metadata: attempt to close pmd when 1 device(s) are still open
       device-mapper: thin: __pool_destroy: dm_pool_metadata_close() failed.
      
      Also, must establish error code if failing thin_ctr() because the pool
      is in fail_io mode.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      1acacc07
  5. 17 Feb, 2014 1 commit
  6. 16 Jan, 2014 1 commit
  7. 07 Jan, 2014 13 commits
    • Mike Snitzer's avatar
      dm thin: fix set_pool_mode exposed pool operation races · 8b64e881
      Mike Snitzer authored
      
      
      The pool mode must not be switched until after the corresponding pool
      process_* methods have been established.  Otherwise, because
      set_pool_mode() isn't interlocked with the IO path for performance
      reasons, the IO path can end up executing process_* operations that
      don't match the mode.  This patch eliminates problems like the following
      (as seen on really fast PCIe SSD storage when transitioning the pool's
      mode from PM_READ_ONLY to PM_WRITE):
      
      kernel: device-mapper: thin: 253:2: reached low water mark for data device: sending event.
      kernel: device-mapper: thin: 253:2: no free data space available.
      kernel: device-mapper: thin: 253:2: switching pool to read-only mode
      kernel: device-mapper: thin: 253:2: switching pool to write mode
      kernel: ------------[ cut here ]------------
      kernel: WARNING: CPU: 11 PID: 7564 at drivers/md/dm-thin.c:995 handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]()
      ...
      kernel: Workqueue: dm-thin do_worker [dm_thin_pool]
      kernel: 00000000000003e3 ffff880308831cc8 ffffffff8152ebcb 00000000000003e3
      kernel: 0000000000000000 ffff880308831d08 ffffffff8104c46c ffff88032502a800
      kernel: ffff880036409000 ffff88030ec7ce00 0000000000000001 00000000ffffffc3
      kernel: Call Trace:
      kernel: [<ffffffff8152ebcb>] dump_stack+0x49/0x5e
      kernel: [<ffffffff8104c46c>] warn_slowpath_common+0x8c/0xc0
      kernel: [<ffffffff8104c4ba>] warn_slowpath_null+0x1a/0x20
      kernel: [<ffffffffa001e2c6>] handle_unserviceable_bio+0x146/0x160 [dm_thin_pool]
      kernel: [<ffffffffa001f276>] process_bio_read_only+0x136/0x180 [dm_thin_pool]
      kernel: [<ffffffffa0020b75>] process_deferred_bios+0xc5/0x230 [dm_thin_pool]
      kernel: [<ffffffffa0020d31>] do_worker+0x51/0x60 [dm_thin_pool]
      kernel: [<ffffffff81067823>] process_one_work+0x183/0x490
      kernel: [<ffffffff81068c70>] worker_thread+0x120/0x3a0
      kernel: [<ffffffff81068b50>] ? manage_workers+0x160/0x160
      kernel: [<ffffffff8106e86e>] kthread+0xce/0xf0
      kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
      kernel: [<ffffffff8153b3ec>] ret_from_fork+0x7c/0xb0
      kernel: [<ffffffff8106e7a0>] ? kthread_freezable_should_stop+0x70/0x70
      kernel: ---[ end trace 3f00528e08ffa55c ]---
      kernel: device-mapper: thin: pool mode is PM_WRITE not PM_READ_ONLY like expected!?
      
      dm-thin.c:995 was the WARN_ON_ONCE(get_pool_mode(pool) != PM_READ_ONLY);
      at the top of handle_unserviceable_bio().  And as the additional
      debugging I had conveys: the pool mode was _not_ PM_READ_ONLY like
      expected, it was already PM_WRITE, yet pool->process_bio was still set
      to process_bio_read_only().
      
      Also, while fixing this up, reduce logging of redundant pool mode
      transitions by checking new_mode is different from old_mode.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      8b64e881
    • Mike Snitzer's avatar
      dm thin: eliminate the no_free_space flag · 6d16202b
      Mike Snitzer authored
      
      
      The pool's error_if_no_space flag can easily serve the same purpose that
      no_free_space did, namely: control whether handle_unserviceable_bio()
      will error a bio or requeue it.
      
      This is cleaner since error_if_no_space is established when the pool's
      features are processed during table load.  So it avoids managing the
      no_free_space flag by taking the pool's spinlock.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      6d16202b
    • Mike Snitzer's avatar
      dm thin: add error_if_no_space feature · 787a996c
      Mike Snitzer authored
      
      
      If the pool runs out of data or metadata space, the pool can either
      queue or error the IO destined to the data device.  The default is to
      queue the IO until more space is added.
      
      An admin may now configure the pool to error IO when no space is
      available by setting the 'error_if_no_space' feature when loading the
      thin-pool table.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      787a996c
    • Mike Snitzer's avatar
      dm thin: requeue bios to DM core if no_free_space and in read-only mode · 8c0f0e8c
      Mike Snitzer authored
      
      
      Now that we switch the pool to read-only mode when the data device runs
      out of space it causes active writers to get IO errors once we resume
      after resizing the data device.
      
      If no_free_space is set, save bios to the 'retry_on_resume_list' and
      requeue them on resume (once the data or metadata device may have been
      resized).
      
      With this patch the resize_io test passes again (on slower storage):
       dmtest run --suite thin-provisioning -n /resize_io/
      
      Later patches fix some subtle races associated with the pool mode
      transitions done as part of the pool's -ENOSPC handling.  These races
      are exposed on fast storage (e.g. PCIe SSD).
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      8c0f0e8c
    • Mike Snitzer's avatar
      dm thin: cleanup and improve no space handling · 399caddf
      Mike Snitzer authored
      
      
      Factor out_of_data_space() out of alloc_data_block().  Eliminate the use
      of 'no_free_space' as a latch in alloc_data_block() -- this is no longer
      needed now that we switch to read-only mode when we run out of data or
      metadata space.  In a later patch, the 'no_free_space' flag will be
      eliminated entirely (in favor of checking metadata rather than relying
      on a transient flag).
      
      Move no metdata space handling into metdata_operation_failed().  Set
      no_free_space when metadata space is exhausted too.  This is useful,
      because it offers consistency, for the following patch that will requeue
      data IOs if no_free_space.
      
      Also, rename no_space() to retry_bios_on_resume().
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      399caddf
    • Mike Snitzer's avatar
    • Joe Thornber's avatar
      dm thin: handle metadata failures more consistently · b5330655
      Joe Thornber authored
      
      
      Introduce metadata_operation_failed() wrappers, around set_pool_mode(),
      to assist with improving the consistency of how metadata failures are
      handled.  Logging is improved and metadata operation failures trigger
      read-only mode immediately.
      
      Also, eliminate redundant set_pool_mode() calls in the two
      alloc_data_block() caller's error paths.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      b5330655
    • Joe Thornber's avatar
      dm thin: factor out check_low_water_mark and use bools · 88a6621b
      Joe Thornber authored
      
      
      Factor check_low_water_mark() out of alloc_data_block().
      Change a couple unsigned flags in the pool structure to bool.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      88a6621b
    • Mike Snitzer's avatar
      dm thin: add mappings to end of prepared_* lists · daec338b
      Mike Snitzer authored
      
      
      Mappings could be processed in descending logical block order,
      particularly if buffered IO is used.  This could adversely affect the
      latency of IO processing.  Fix this by adding mappings to the end of the
      'prepared_mappings' and 'prepared_discards' lists.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      daec338b
    • Joe Thornber's avatar
    • Mike Snitzer's avatar
      dm thin: use bool rather than unsigned for flags in structures · 7f214665
      Mike Snitzer authored
      
      
      Also, move 'err' member in dm_thin_new_mapping structure to eliminate 4
      byte hole (reduces size from 88 bytes to 80).
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      7f214665
    • Joe Thornber's avatar
      dm thin: fix discard support to a previously shared block · 19fa1a67
      Joe Thornber authored
      If a snapshot is created and later deleted the origin dm_thin_device's
      snapshotted_time will have been updated to reflect the snapshot's
      creation time.  The 'shared' flag in the dm_thin_lookup_result struct
      returned from dm_thin_find_block() is an approximation based on
      snapshotted_time -- this is done to avoid 0(n), or worse, time
      complexity.  In this case, the shared flag would be true.
      
      But because the 'shared' flag reflects an approximation a block can be
      incorrectly assumed to be shared (e.g. false positive for 'shared'
      because the snapshot no longer exists).  This could result in discards
      issued to a thin device not being passed down to the pool's underlying
      data device.
      
      To fix this we double check that a thin block is really still in-use
      after a mapping is removed using dm_pool_block_is_used().  If the
      reference count for a block is now zero the discard is allowed to be
      passed down.
      
      Also add a 'definitely_not_shared' member to the dm_thin_new_mapping
      structure -- reflects that the 'shared' flag in the response from
      dm_thin_find_block() can only be held as definitive if false is
      returned.
      
      Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1043527
      
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      19fa1a67
    • Mike Snitzer's avatar
      dm thin: initialize dm_thin_new_mapping returned by get_next_mapping · 16961b04
      Mike Snitzer authored
      
      
      As additional members are added to the dm_thin_new_mapping structure
      care should be taken to make sure they get initialized before use.
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      16961b04
  8. 10 Dec, 2013 5 commits
    • Joe Thornber's avatar
      dm thin: allow pool in read-only mode to transition to read-write mode · 9b7aaa64
      Joe Thornber authored
      
      
      A thin-pool may be in read-only mode because the pool's data or metadata
      space was exhausted.  To allow for recovery, by adding more space to the
      pool, we must allow a pool to transition from PM_READ_ONLY to PM_WRITE
      mode.  Otherwise, running out of space will render the pool permanently
      read-only.
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      9b7aaa64
    • Joe Thornber's avatar
      dm thin: re-establish read-only state when switching to fail mode · 5383ef3a
      Joe Thornber authored
      
      
      If the thin-pool transitioned to fail mode and the thin-pool's table
      were reloaded for some reason: the new table's default pool mode would
      be read-write, though it will transition to fail mode during resume.
      
      When the pool mode transitions directly from PM_WRITE to PM_FAIL we need
      to re-establish the intermediate read-only state in both the metadata
      and persistent-data block manager (as is usually done with the normal
      pool mode transition sequence: PM_WRITE -> PM_READ_ONLY -> PM_FAIL).
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      5383ef3a
    • Joe Thornber's avatar
      dm thin: always fallback the pool mode if commit fails · 020cc3b5
      Joe Thornber authored
      
      
      Rename commit_or_fallback() to commit().  Now all previous calls to
      commit() will trigger the pool mode to fallback if the commit fails.
      
      Also, check the error returned from commit() in alloc_data_block().
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      020cc3b5
    • Mike Snitzer's avatar
      dm thin: switch to read-only mode if metadata space is exhausted · 4a02b34e
      Mike Snitzer authored
      
      
      Switch the thin pool to read-only mode in alloc_data_block() if
      dm_pool_alloc_data_block() fails because the pool's metadata space is
      exhausted.
      
      Differentiate between data and metadata space in messages about no
      free space available.
      
      This issue was noticed with the device-mapper-test-suite using:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      The quantity of errors logged in this case must be reduced.
      
      before patch:
      
      device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      <snip ... these repeat for a _very_ long while ... >
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: commit failed: error = -28
      device-mapper: thin: 253:4: switching pool to read-only mode
      
      after patch:
      
      device-mapper: thin: 253:4: reached low water mark for metadata device: sending event.
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: no free metadata space available.
      device-mapper: thin: 253:4: switching pool to read-only mode
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
      4a02b34e
    • Joe Thornber's avatar
      dm thin: switch to read only mode if a mapping insert fails · fafc7a81
      Joe Thornber authored
      
      
      Switch the thin pool to read-only mode when dm_thin_insert_block() fails
      since there is little reason to expect the cause of the failure to be
      resolved without further action by user space.
      
      This issue was noticed with the device-mapper-test-suite using:
      dmtest run --suite thin-provisioning -n /exhausting_metadata_space_causes_fail_mode/
      
      The quantity of errors logged in this case must be reduced.
      
      before patch:
      
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: dm_thin_insert_block() failed
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map metadata: unable to allocate new metadata block
      <snip ... these repeat for a long while ... >
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: space map common: dm_tm_shadow_block() failed
      device-mapper: thin: 253:4: no free metadata space available.
      device-mapper: thin: 253:4: switching pool to read-only mode
      
      after patch:
      
      device-mapper: space map metadata: unable to allocate new metadata block
      device-mapper: thin: 253:4: dm_thin_insert_block() failed: error = -28
      device-mapper: thin: 253:4: switching pool to read-only mode
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      fafc7a81
  9. 24 Nov, 2013 2 commits
    • Kent Overstreet's avatar
      block: Generic bio chaining · 196d38bc
      Kent Overstreet authored
      
      
      This adds a generic mechanism for chaining bio completions. This is
      going to be used for a bio_split() replacement, and it turns out to be
      very useful in a fair amount of driver code - a fair number of drivers
      were implementing this in their own roundabout ways, often painfully.
      
      Note that this means it's no longer to call bio_endio() more than once
      on the same bio! This can cause problems for drivers that save/restore
      bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all
      - in all but the simplest cases they'd be better off just cloning the
      bio, and immutable biovecs is making bio cloning cheaper. But for now,
      we add a bio_endio_nodec() for these cases.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      196d38bc
    • Kent Overstreet's avatar
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet authored
      
      
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  10. 23 Sep, 2013 1 commit
    • Mike Snitzer's avatar
      dm thin: do not expose non-zero discard limits if discards disabled · b60ab990
      Mike Snitzer authored
      
      
      Fix issue where the block layer would stack the discard limits of the
      pool's data device even if the "ignore_discard" pool feature was
      specified.
      
      The pool and thin device(s) still had discards disabled because the
      QUEUE_FLAG_DISCARD request_queue flag wasn't set.  But to avoid user
      confusion when "ignore_discard" is used: both the pool device and the
      thin device(s) have zeroes for all discard limits.
      
      Also, always set discard_zeroes_data_unsupported in targets because they
      should never advertise the 'discard_zeroes_data' capability (even if the
      pool's data device supports it).
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      b60ab990
  11. 06 Sep, 2013 3 commits
  12. 23 Aug, 2013 1 commit
  13. 19 May, 2013 1 commit
    • Alasdair G Kergon's avatar
      dm thin: fix metadata dev resize detection · 610bba8b
      Alasdair G Kergon authored
      Fix detection of the need to resize the dm thin metadata device.
      
      The code incorrectly tried to extend the metadata device when it
      didn't need to due to a merging error with patch 24347e95
      
       ("dm thin:
      detect metadata device resizing").
      
        device-mapper: transaction manager: couldn't open metadata space map
        device-mapper: thin metadata: tm_open_with_sm failed
        device-mapper: thin: aborting transaction failed
        device-mapper: thin: switching pool to failure mode
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      610bba8b
  14. 10 May, 2013 4 commits
  15. 20 Mar, 2013 2 commits
    • Joe Thornber's avatar
      dm thin: fix non power of two discard granularity calc · 58051b94
      Joe Thornber authored
      Fix a discard granularity calculation to work for non power of 2 block sizes.
      
      In order for thinp to passdown discard bios to the underlying data
      device, the data device must have a discard granularity that is a
      factor of the thinp block size.  Originally this check was done by
      using bitops since the block_size was known to be a power of two.
      
      Introduced by commit f13945d7
      
      
      ("dm thin: support a non power of 2 discard_granularity").
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      58051b94
    • Joe Thornber's avatar
      dm thin: fix discard corruption · f046f89a
      Joe Thornber authored
      
      
      Fix a bug in dm_btree_remove that could leave leaf values with incorrect
      reference counts.  The effect of this was that removal of a shared block
      could result in the space maps thinking the block was no longer used.
      More concretely, if you have a thin device and a snapshot of it, sending
      a discard to a shared region of the thin could corrupt the snapshot.
      
      Thinp uses a 2-level nested btree to store it's mappings.  This first
      level is indexed by thin device, and the second level by logical
      block.
      
      Often when we're removing an entry in this mapping tree we need to
      rebalance nodes, which can involve shadowing them, possibly creating a
      copy if the block is shared.  If we do create a copy then children of
      that node need to have their reference counts incremented.  In this
      way reference counts percolate down the tree as shared trees diverge.
      
      The rebalance functions were incrementing the children at the
      appropriate time, but they were always assuming the children were
      internal nodes.  This meant the leaf values (in our case packed
      block/flags entries) were not being incremented.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      f046f89a