1. 13 Nov, 2014 2 commits
  2. 10 Nov, 2014 14 commits
  3. 04 Nov, 2014 1 commit
  4. 01 Aug, 2014 3 commits
  5. 11 Jun, 2014 1 commit
    • dm thin: update discard_granularity to reflect the thin-pool blocksize · 09869de5
      Lukas Czerner authored
      DM thinp already checks whether the discard_granularity of the data
      device is a factor of the thin-pool block size.  But when using the
      dm-thin-pool's discard passdown support, DM thinp was not selecting the
      max of the underlying data device's discard_granularity and the
      thin-pool's block size.
      Update set_discard_limits() to set discard_granularity to the max of
      these values.  This enables blkdev_issue_discard() to properly align the
      discards that are sent to the DM thin device on a full block boundary.
      As such each discard will now cover an entire DM thin-pool block and the
      block will be reclaimed.
      Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
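The arithmetic described in this commit can be sketched in Python as follows. This is an illustration only, not the kernel code: it assumes byte units and a 512-byte sector, and the helper names `effective_discard_granularity` and `align_discard` are hypothetical. It shows why advertising the larger of the two values lets a caller trim a discard to whole thin-pool blocks.

```python
SECTOR_SIZE = 512  # bytes per sector (assumption for this sketch)


def effective_discard_granularity(data_dev_granularity, pool_block_sectors):
    """Return the larger of the data device's discard granularity and the
    thin-pool block size, both expressed in bytes."""
    return max(data_dev_granularity, pool_block_sectors * SECTOR_SIZE)


def align_discard(start, length, granularity):
    """Trim a byte range to whole granularity units.

    Returns (start, length) of the aligned range, or None when the range
    does not cover a single whole unit.
    """
    aligned_start = -(-start // granularity) * granularity       # round up
    aligned_end = (start + length) // granularity * granularity  # round down
    if aligned_end <= aligned_start:
        return None
    return aligned_start, aligned_end - aligned_start
```

With a 64 KiB pool block (128 sectors) and a 4 KiB device granularity, the effective granularity is 64 KiB, so only ranges that cover at least one full pool block survive the trimming.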
  6. 03 Jun, 2014 2 commits
  7. 20 May, 2014 1 commit
    • dm thin: add 'no_space_timeout' dm-thin-pool module param · 80c57893
      Mike Snitzer authored
      Commit 85ad643b ("dm thin: add timeout to stop out-of-data-space mode
      holding IO forever") introduced a fixed 60 second timeout.  Users may
      want to either disable or modify this timeout.
      Allow the out-of-data-space timeout to be configured using the
      'no_space_timeout' dm-thin-pool module param.  Setting it to 0 will
      disable the timeout, resulting in IO being queued until more data space
      is added to the thin-pool.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
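The timeout semantics described above can be modelled with a small Python sketch. The function name `queued_io_should_fail` and the use of whole seconds are assumptions made for illustration; the default of 60 mirrors the fixed timeout mentioned in the commit message.

```python
def queued_io_should_fail(seconds_out_of_space, no_space_timeout=60):
    """Decide whether IO queued in out-of-data-space mode should stop waiting.

    A timeout of 0 disables the timer, so IO stays queued until more data
    space is added to the pool.
    """
    if no_space_timeout == 0:
        return False
    return seconds_out_of_space >= no_space_timeout
```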
  8. 14 May, 2014 2 commits
  9. 29 Apr, 2014 1 commit
  10. 08 Apr, 2014 2 commits
    • dm thin: fix rcu_read_lock being held in code that can sleep · b10ebd34
      Joe Thornber authored
      Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
      introduced the use of an rculist for all active thin devices.  The use
      of rcu_read_lock() in process_deferred_bios() can result in a BUG if a
      dm_bio_prison_cell must be allocated as a side-effect of bio_detain():
       BUG: sleeping function called from invalid context at mm/mempool.c:203
       in_atomic(): 1, irqs_disabled(): 0, pid: 6, name: kworker/u8:0
       3 locks held by kworker/u8:0/6:
         #0:  ("dm-" "thin"){.+.+..}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
         #1:  ((&pool->worker)){+.+...}, at: [<ffffffff8106be42>] process_one_work+0x192/0x550
         #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff816360b5>] do_worker+0x5/0x4d0
      We can't process deferred bios with the rcu lock held, since
      dm_bio_prison_cell allocation may block if the bio-prison's cell mempool
      is exhausted.
      To fix:
      - Introduce a refcount and completion field to each thin_c
      - Add thin_get/put methods for adjusting the refcount.  If the refcount
        hits zero then the completion is triggered.
      - Initialise refcount to 1 when creating thin_c
      - When iterating the active_thins list we thin_get() whilst the rcu
        lock is held.
      - After the rcu lock is dropped we process the deferred bios for that
        thin.
      - When destroying a thin_c we thin_put() and then wait for the
        completion -- to avoid a race between the worker thread iterating
        from that thin_c and destroying the thin_c.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
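The refcount-plus-completion scheme listed in this commit can be sketched in user-space Python. This is a simplified model, not the kernel implementation: a `threading.Lock` stands in for the rcu read-side critical section, a `threading.Event` stands in for the completion, and the class and function names are hypothetical. The sketch also pins all thins in one pass, which may differ from how the kernel walks the list.

```python
import threading


class Thin:
    """Toy stand-in for a thin_c with a refcount and a completion."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refcount = 1                      # initialised to 1 on creation
        self._can_destroy = threading.Event()   # models the completion
        self.deferred = []
        self.processed = []

    def get(self):
        with self._lock:
            self._refcount += 1

    def put(self):
        with self._lock:
            self._refcount -= 1
            if self._refcount == 0:
                self._can_destroy.set()         # trigger the completion

    def destroy(self, timeout=None):
        """Drop the creation reference, then wait until nobody holds one."""
        self.put()
        return self._can_destroy.wait(timeout)


def process_deferred_bios(active_thins, list_lock):
    # Take references while the list is protected ...
    with list_lock:
        pinned = list(active_thins)
        for thin in pinned:
            thin.get()
    # ... and do the work that may block only after the lock is dropped.
    for thin in pinned:
        while thin.deferred:
            thin.processed.append(thin.deferred.pop(0))
        thin.put()
```

Because `destroy()` waits for the event, a thin that is still pinned by the worker cannot be torn down underneath it.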
    • dm thin: irqsave must always be used with the pool->lock spinlock · 5e3283e2
      Joe Thornber authored
      Commit c140e1c4 ("dm thin: use per thin device deferred bio lists")
      incorrectly stopped disabling irqs when taking the pool's spinlock.
      Irqs must be disabled when taking the pool's spinlock otherwise a thread
      could spin_lock(), then get interrupted to service thin_endio() in
      interrupt context, which would then deadlock in spin_lock_irqsave().
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  11. 04 Apr, 2014 1 commit
    • dm thin: sort the per thin deferred bios using an rb_tree · 67324ea1
      Mike Snitzer authored
      A thin-pool will allocate blocks using FIFO order for all thin devices
      which share the thin-pool.  Because of this simplistic allocation the
      thin-pool's space can become fragmented quite easily; especially when
      multiple threads are requesting blocks in parallel.
      Sort each thin device's deferred_bio_list based on logical sector to
      help reduce fragmentation of the thin-pool's ondisk layout.
      The following tables illustrate the realized gains/potential offered by
      sorting each thin device's deferred_bio_list.  An "io size"-sized random
      read of the device would result in "seeks/io" fragments being read, with
      an average "distance/seek" between each fragment.
      Data was written to a single thin device using multiple threads via
      iozone (8 threads, 64K for both the block_size and io_size).
      Unsorted:
           io size   seeks/io   distance/seek
                4k      0.000   0b
               16k      0.013   11m
               64k      0.065   11m
              256k      0.274   10m
                1m      1.109   10m
                4m      4.411   10m
               16m     17.097   11m
               64m     60.055   13m
              256m    148.798   25m
                1g    809.929   21m
      Sorted:
           io size   seeks/io   distance/seek
                4k      0.000   0b
               16k      0.000   1g
               64k      0.001   1g
              256k      0.003   1g
                1m      0.011   1g
                4m      0.045   1g
               16m      0.181   1g
               64m      0.747   1011m
              256m      3.299   1g
                1g     14.373   1g
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
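The idea of ordering deferred bios by logical sector can be illustrated with a short Python sketch. The commit uses an rb_tree inside the kernel; this sketch swaps in Python's built-in `sorted()` purely for illustration, and the `Bio` tuple and `sort_deferred_bios` name are hypothetical.

```python
from collections import namedtuple

# Minimal stand-in for a bio: only the logical start sector matters here.
Bio = namedtuple("Bio", ["sector", "payload"])


def sort_deferred_bios(deferred):
    """Return the deferred bios ordered by logical sector.

    sorted() is stable, so bios that target the same sector keep their
    arrival order.
    """
    return sorted(deferred, key=lambda bio: bio.sector)
```

When several writer threads interleave their requests, processing them in sector order means blocks that are logically adjacent are more likely to be allocated next to each other.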
  12. 31 Mar, 2014 2 commits
  13. 28 Mar, 2014 1 commit
  14. 05 Mar, 2014 4 commits
    • dm thin: fix noflush suspend IO queueing · 738211f7
      Joe Thornber authored
      i) By the time DM core calls the postsuspend hook the dm_noflush flag
      has been cleared.  So the old thin_postsuspend did nothing.  We need to
      use the presuspend hook instead.
      ii) There was a race between bios leaving DM core and arriving in the
      deferred queue.
      thin_presuspend now sets a 'requeue' flag causing all bios destined for
      that thin to be requeued back to DM core.  Then it requeues all held IO,
      and all IO on the deferred queue (destined for that thin).  Finally
      postsuspend clears the 'requeue' flag.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
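The presuspend/postsuspend sequence described in this commit can be modelled as a small Python state machine. The class, attribute names, and return strings are hypothetical, and the `noflush` argument is an assumption about how the hook learns that a noflush suspend is in progress.

```python
class ThinDevice:
    """Toy model of the 'requeue' flag used during a noflush suspend."""

    def __init__(self):
        self.requeue_mode = False
        self.held = []        # IO held by the target
        self.deferred = []    # IO waiting on the deferred queue
        self.requeued = []    # IO handed back to DM core

    def map_bio(self, bio):
        # While the flag is set, every incoming bio goes straight back.
        if self.requeue_mode:
            self.requeued.append(bio)
            return "requeue"
        self.deferred.append(bio)
        return "submitted"

    def presuspend(self, noflush):
        if not noflush:
            return
        self.requeue_mode = True              # 1. set the flag first
        self.requeued.extend(self.held)       # 2. requeue all held IO
        self.held.clear()
        self.requeued.extend(self.deferred)   # 3. then the deferred queue
        self.deferred.clear()

    def postsuspend(self):
        self.requeue_mode = False             # 4. finally clear the flag
```

Setting the flag before draining the queues is what closes the window in which a bio could leave DM core and land on the deferred queue after it had been emptied.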
    • dm thin: fix deadlock in __requeue_bio_list · 18adc577
      Joe Thornber authored
      The spin lock in requeue_io() was held for too long, allowing deadlock.
      Don't worry, due to other issues addressed in the following "dm thin:
      fix noflush suspend IO queueing" commit, this code was never called.
      Fix this by taking the spin lock for a much shorter period of time.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
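One common way to shorten a lock hold time, sketched here in Python, is to move the shared list into a local one while holding the lock and to do the per-item work afterwards. Whether the kernel fix takes exactly this shape is an assumption; the function and parameter names are hypothetical, and `threading.Lock` stands in for the spin lock.

```python
import threading


def requeue_bio_list(master, lock, requeue):
    """Drain `master` under the lock, then requeue each bio without it."""
    with lock:
        local = list(master)   # grab everything in one short critical section
        master.clear()
    for bio in local:
        requeue(bio)           # runs with the lock released
```

Because the callback runs outside the critical section, it is free to take the same lock itself without deadlocking.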
    • dm thin: fix out of data space handling · 3e1a0699
      Joe Thornber authored
      Ideally a thin pool would never run out of data space; the low water
      mark would trigger userland to extend the pool before we completely run
      out of space.  However, many small random IOs to unprovisioned space can
      consume data space at an alarming rate.  Adjust your low water mark if
      you're frequently seeing "out-of-data-space" mode.
      Before this fix, if data space ran out the pool would be put in
      PM_READ_ONLY mode which also aborted the pool's current metadata
      transaction (data loss for any changes in the transaction).  This had a
      side-effect of needlessly compromising data consistency.  And retry of
      queued unserviceable bios, once the data pool was resized, could
      initiate changes to potentially inconsistent pool metadata.
      Now when the pool's data space is exhausted transition to a new pool
      mode (PM_OUT_OF_DATA_SPACE) that allows metadata to be changed but data
      may not be allocated.  This allows users to remove thin volumes or
      discard data to recover data space.
      The pool is no longer put in PM_READ_ONLY mode in response to the pool
      running out of data space.  And PM_READ_ONLY mode no longer aborts the
      pool's current metadata transaction.  Also, set_pool_mode() will now
      notify userspace when the pool mode is changed.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
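The pool modes described in this commit can be summarised with a small Python model. It is a sketch under stated assumptions: the operation names are hypothetical, only the three modes named in the message are modelled, and the rule that read-only mode permits neither operation is my reading rather than something the message spells out.

```python
from enum import Enum


class PoolMode(Enum):
    WRITE = "write"
    OUT_OF_DATA_SPACE = "out_of_data_space"
    READ_ONLY = "read_only"


# Which operations each mode permits in this model.
ALLOWED = {
    PoolMode.WRITE: {"allocate_data", "change_metadata"},
    PoolMode.OUT_OF_DATA_SPACE: {"change_metadata"},  # e.g. delete or discard
    PoolMode.READ_ONLY: set(),
}


class Pool:
    def __init__(self, notify):
        self.mode = PoolMode.WRITE
        self._notify = notify          # callback standing in for the userspace event

    def set_mode(self, new_mode):
        if new_mode is not self.mode:
            self.mode = new_mode
            self._notify(new_mode)     # userspace is told about every change

    def allows(self, operation):
        return operation in ALLOWED[self.mode]

    def on_data_space_exhausted(self):
        # After the fix: no drop to read-only, no aborted transaction.
        self.set_mode(PoolMode.OUT_OF_DATA_SPACE)
```

In out-of-data-space mode the pool still accepts metadata changes, which is what lets a user remove thin volumes or discard data to recover space.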
    • dm thin: ensure user takes action to validate data and metadata consistency · 07f2b6e0
      Mike Snitzer authored
      If a thin metadata operation fails the current transaction will abort,
      thereby creating the potential for IO layers up the stack (e.g.
      filesystems) to lose data.  As such, set THIN_METADATA_NEEDS_CHECK_FLAG
      in the thin metadata's superblock which:
      1) requires the user verify the thin metadata is consistent (e.g. use
         thin_check, etc)
      2) suggests the user verify the thin data is consistent (e.g. use fsck)
      The only way to clear the superblock's THIN_METADATA_NEEDS_CHECK_FLAG is
      to run thin_repair.
      On metadata operation failure: abort current metadata transaction, set
      pool in read-only mode, and now set the needs_check flag.
      As part of this change, constraints are introduced or relaxed:
      * don't allow a pool to transition to write mode if needs_check is set
      * don't allow data or metadata space to be resized if needs_check is set
      * if a thin pool's metadata space is exhausted: the kernel will now
        force the user to take the pool offline for repair before the kernel
        will allow the metadata space to be extended.
      Also, update Documentation to include information about when the thin
      provisioning target commits metadata, how it handles metadata failures
      and running out of space.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
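The needs_check rules listed in this commit can be captured in a few lines of Python. This is a toy model with hypothetical names; it only encodes the constraints stated in the message and treats an offline thin_repair run as the single way to clear the flag.

```python
class PoolState:
    """Toy model of the needs_check constraints."""

    def __init__(self):
        self.mode = "write"
        self.needs_check = False
        self.aborted_transactions = 0

    def metadata_operation_failed(self):
        self.aborted_transactions += 1   # abort the current transaction
        self.mode = "read_only"          # put the pool in read-only mode
        self.needs_check = True          # and now also set the flag

    def request_write_mode(self):
        if self.needs_check:
            return False                 # refused until the metadata is repaired
        self.mode = "write"
        return True

    def request_resize(self):
        # Applies to both data and metadata space.
        return not self.needs_check

    def repaired_offline(self):
        # Stands in for running thin_repair with the pool offline.
        self.needs_check = False
```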
  15. 04 Mar, 2014 1 commit
    • dm thin: synchronize the pool mode during suspend · cdc2b415
      Mike Snitzer authored
      Commit b5330655 ("dm thin: handle metadata failures more consistently")
      increased potential for the pool's mode to be changed in response to
      metadata operation failures.
      When the pool mode is changed it isn't synchronized with the mode in
      pool_features stored in the target's context (ti->private) that is used
      as the basis for (re)establishing the pool mode during resume via
      It is important that we synchronize the pool mode when it is changed
      otherwise the pool may experience an unexpected mode transition on the
      next resume (especially if there was no new table load).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
  16. 27 Feb, 2014 1 commit
    • dm thin: allow metadata space larger than supported to go unused · 7d48935e
      Mike Snitzer authored
      It was always intended that a user could provide a thin metadata device
      that is larger than the max supported by the on-disk format.  The extra
      space would just go unused.
      Unfortunately that never worked.  If the user attempted to use a larger
      metadata device on creation they would get an error like the following:
       device-mapper: space map common: space map too large
       device-mapper: transaction manager: couldn't create metadata space map
       device-mapper: thin metadata: tm_create_with_sm failed
       device-mapper: table: 252:17: thin-pool: Error creating metadata object
       device-mapper: ioctl: error adding target to table
      Fix this by allowing the initial metadata space map creation to cap its
      size at the max number of blocks supported (DM_SM_METADATA_MAX_BLOCKS).
      get_metadata_dev_size() must also impose DM_SM_METADATA_MAX_BLOCKS (via
      THIN_METADATA_MAX_SECTORS), otherwise extending metadata would cap at
      THIN_METADATA_MAX_SECTORS_WARNING (which is larger than supported).
      Also, the calculation for THIN_METADATA_MAX_SECTORS didn't account for
      the sizeof the disk_bitmap_header.  So the supported maximum metadata
      size is a bit smaller (reduced from 33423360 to 33292800 sectors).
      Lastly, remove the "excess space will not be used" warning message from
      get_metadata_dev_size(); it resulted in printing the warning multiple
      times.  Factor out warn_if_metadata_device_too_big(), call it from
      pool_ctr() and maybe_resize_metadata_dev().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
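The two sector counts quoted in this commit can be reproduced with a short calculation. The commit message gives only the totals; the decomposition below (255 index entries, 2^14 blocks per entry, 4096-byte metadata blocks, a 16-byte bitmap header, and 4 two-bit entries per byte) is an assumption that happens to match both figures, and the function names are hypothetical.

```python
SECTOR_SHIFT = 9                      # 512-byte sectors
METADATA_BLOCK_BYTES = 4096           # assumed metadata block size
SECTORS_PER_BLOCK = METADATA_BLOCK_BYTES >> SECTOR_SHIFT
INDEX_ENTRIES = 255                   # assumed entries in the index block
NOMINAL_BLOCKS_PER_ENTRY = 1 << 14    # assumed blocks tracked per bitmap
BITMAP_HEADER_BYTES = 16              # assumed size of disk_bitmap_header
ENTRIES_PER_BYTE = 4                  # assumed 2 bits per block entry


def max_metadata_sectors(account_for_header):
    """Maximum metadata size in sectors, with or without the header cost."""
    per_entry = NOMINAL_BLOCKS_PER_ENTRY
    if account_for_header:
        # The header occupies space that would otherwise hold block entries.
        per_entry -= BITMAP_HEADER_BYTES * ENTRIES_PER_BYTE
    return INDEX_ENTRIES * per_entry * SECTORS_PER_BLOCK


def usable_metadata_sectors(device_sectors):
    """Cap a metadata device at the supported maximum; the excess goes unused."""
    return min(device_sectors, max_metadata_sectors(account_for_header=True))
```

Under these assumptions the header costs 64 entries per bitmap, which accounts for the drop from 33423360 to 33292800 sectors.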
  17. 24 Feb, 2014 1 commit
    • dm thin: fix the error path for the thin device constructor · 1acacc07
      Mike Snitzer authored
      dm_pool_close_thin_device() must be called if dm_set_target_max_io_len()
      fails in thin_ctr().  Otherwise __pool_destroy() will fail because the
      pool will still have an open thin device:
       device-mapper: thin metadata: attempt to close pmd when 1 device(s) are still open
       device-mapper: thin: __pool_destroy: dm_pool_metadata_close() failed.
      Also, an error code must be set when thin_ctr() fails because the pool
      is in fail_io mode.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org
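The constructor error path described in this commit follows a familiar pattern: anything opened before a failing step must be closed before returning. A minimal Python model, with hypothetical class and function names and a callable standing in for dm_set_target_max_io_len(), looks like this:

```python
class PoolMetadata:
    """Toy model that tracks how many thin devices are open."""

    def __init__(self):
        self.open_thin_devices = 0

    def open_thin_device(self):
        self.open_thin_devices += 1

    def close_thin_device(self):
        self.open_thin_devices -= 1

    def close(self):
        if self.open_thin_devices:
            raise RuntimeError(
                "attempt to close pmd when %d device(s) are still open"
                % self.open_thin_devices
            )


def thin_ctr(pmd, set_max_io_len):
    """Return 0 on success, or the error code after undoing the open."""
    pmd.open_thin_device()
    err = set_max_io_len()
    if err:
        pmd.close_thin_device()   # the step that was missing before the fix
        return err
    return 0
```

Without the `close_thin_device()` call on the failure path, a later `close()` of the pool metadata would be rejected because one device would still be counted as open.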