1. 10 Oct, 2008 15 commits
  2. 01 Oct, 2008 3 commits
    • Chandra Seetharaman's avatar
      dm mpath: add missing path switching locking · 7253a334
      Chandra Seetharaman authored
      Moving the path activation to workqueue along with scsi_dh patches introduced
      a race. It is due to the fact that the current_pgpath (in the multipath data
      structure) can be modified if changes happen in any of the paths leading to
      the lun. If the changes lead to current_pgpath being set to NULL, then it
      leads to the invalid access which results in the panic below.
      This patch fixes that by storing the pgpath to activate in the multipath data
      structure and properly protecting it.
      Note that if activate_path is called twice in succession with different pgpath,
      with the second one being called before the first one is done, then activate
      path will be called twice for the second pgpath, which is fine.
      Unable to handle kernel paging request for data at address 0x00000020
      Faulting instruction address: 0xd000000000aa1844
      cpu 0x1: Vector: 300 (Data Access) at [c00000006b987a80]
          pc: d000000000aa1844: .activate_path+0x30/0x218 [dm_multipath]
          lr: c000000000087a2c: .run_workqueue+0x114/0x204
          sp: c00000006b987d00
         msr: 8000000000009032
         dar: 20
       dsisr: 40000000
        current = 0xc0000000676bb3f0
        paca    = 0xc0000000006f3680
          pid   = 2528, comm = kmpath_handlerd
      enter ? for help
      [c00000006b987da0] c000000000087a2c .run_workqueue+0x114/0x204
      [c00000006b987e40] c000000000088b58 .worker_thread+0x120/0x144
      [c00000006b987f00] c00000000008ca70 .kthread+0x78/0xc4
      [c00000006b987f90] c000000000027cc8 .kernel_thread+0x4c/0x68
      Signed-off-by: default avatarChandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
    • Mikulas Patocka's avatar
      dm: cope with access beyond end of device in dm_merge_bvec · b01cd5ac
      Mikulas Patocka authored
      If for any reason dm_merge_bvec() is given an offset beyond the end of the
      device, avoid an oops and always allow one page to be added to an empty bio.
      We'll reject the I/O later after the bio is submitted.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
    • Mikulas Patocka's avatar
      dm: always allow one page in dm_merge_bvec · 5037108a
      Mikulas Patocka authored
      Some callers assume they can always add at least one page to an empty bio,
      so dm_merge_bvec should not return 0 in this case: we'll reject the I/O
      later after the bio is submitted.
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
  3. 19 Sep, 2008 1 commit
    • NeilBrown's avatar
      md: Don't wait UNINTERRUPTIBLE for other resync to finish · 9744197c
      NeilBrown authored
      When two md arrays share some block device (e.g each uses different
      partitions on the one device), a resync of one array will wait for
      the resync on the other to finish.
      This can be a long time and as it currently waits TASK_UNINTERRUPTIBLE,
      the softlockup code notices and complains.
      So use TASK_INTERRUPTIBLE instead and make sure to flush signals
      before calling schedule.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
  4. 01 Sep, 2008 2 commits
    • NeilBrown's avatar
      Fix problem with waiting while holding rcu read lock in md/bitmap.c · b2d2c4ce
      NeilBrown authored
      A recent patch to protect the rdev list with rcu locking leaves us
      with a problem because we can sleep on memalloc while holding the
      rcu lock.
      The rcu lock is only needed while walking the linked list as
      uninteresting devices (failed or spares) can be removed at any time.
      So only take the rcu lock while actually walking the linked list.
      Take a refcount on the rdev during the time when we drop the lock
      and do the memalloc to start IO.
      When we return to the locked code, all the interesting devices
      on the list will not have moved, so we can simply use
      list_for_each_continue_rcu to pick up where we left off.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
    • NeilBrown's avatar
      Remove invalidate_partition call from do_md_stop. · 271f5a9b
      NeilBrown authored
      When stopping an md array, or just switching to read-only, we
      currently call invalidate_partition while holding the mddev lock.
      The main reason for this is probably to ensure all dirty buffers
      are flushed (invalidate_partition calls fsync_bdev).
      However if any dirty buffers are found, it will almost certainly cause
      a deadlock as starting writeout will require an update to the
      superblock, and performing that updates requires taking the mddev
      lock - which is already held.
      This deadlock can be demonstrated by running "reboot -f -n" with
      a root filesystem on md/raid, and some dirty buffers in memory.
      All other calls to stop an array should already happen after a flush.
      The normal sequence is to stop using the array (e.g. umount) which
      will cause __blkdev_put to call sync_blockdev.  Then open the
      array and issue the STOP_ARRAY ioctl while the buffers are all still
      So this invalidate_partition is normally a no-op, except for one case
      where it will cause a deadlock.
      So remove it.
      This patch possibly addresses the regression recored in
      though it isn't yet clear how it ever worked.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
  5. 07 Aug, 2008 1 commit
  6. 05 Aug, 2008 6 commits
    • NeilBrown's avatar
      Allow raid10 resync to happening in larger chunks. · 0310fa21
      NeilBrown authored
      The raid10 resync/recovery code currently limits the amount of
      in-flight resync IO to 2Meg.  This was copied from raid1 where
      it seems quite adequate.  However for raid10, some layouts require
      a bit of seeking to perform a resync, and allowing a larger buffer
      size means that the seeking can be significantly reduced.
      There is probably no real need to limit the amount of in-flight
      IO at all.  Any shortage of memory will naturally reduce the
      amount of buffer space available down to a set minimum, and any
      concurrent normal IO will quickly cause resync IO to back off.
      The only problem would be that normal IO has to wait for all resync IO
      to finish, so a very large amount of resync IO could cause unpleasant
      latency when normal IO starts up.
      So: increase RESYNC_DEPTH to allow 32Meg of buffer (if memory is
      available) which seems to be a good amount.  Also reduce the amount
      of memory reserved as there is no need to keep 2Meg just for resync if
      memory is tight.
      Thanks to Keld for the suggestion.
      Cc: Keld Jørn Simonsen <keld@dkuug.dk>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
    • NeilBrown's avatar
      Allow faulty devices to be removed from a readonly array. · c89a8eee
      NeilBrown authored
      Removing faulty devices from an array is a two stage process.
      First the device is moved from being a part of the active array
      to being similar to a spare device.  Then it can be removed
      by a request from user space.
      The first step is currently not performed for read-only arrays,
      so the second step can never succeed.
      So allow readonly arrays to remove failed devices (which aren't
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
    • NeilBrown's avatar
      Don't let a blocked_rdev interfere with read request in raid5/6 · ac4090d2
      NeilBrown authored
      When we have externally managed metadata, we need to mark a failed
      device as 'Blocked' and not allow any writes until that device
      have been marked as faulty in the metadata and the Blocked flag has
      been removed.
      However it is perfectly OK to allow read requests when there is a
      Blocked device, and with a readonly array, there may not be any
      metadata-handler watching for blocked devices.
      So in raid5/raid6 only allow a Blocked device to interfere with
      Write request or resync.  Read requests go through untouched.
      raid1 and raid10 already differentiate between read and write
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
    • NeilBrown's avatar
      Fail safely when trying to grow an array with a write-intent bitmap. · dba034ee
      NeilBrown authored
      We cannot currently change the size of a write-intent bitmap.
      So if we change the size of an array which has such a bitmap, it
      tries to set bits beyond the end of the bitmap.
      For now, simply reject any request to change the size of an array
      which has a bitmap.  mdadm can remove the bitmap and add a new one
      after the array has changed size.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
    • NeilBrown's avatar
      Restore force switch of md array to readonly at reboot time. · 2b25000b
      NeilBrown authored
      A recent patch allowed do_md_stop to know whether it was being called
      via an ioctl or not, and thus where to allow for an extra open file
      descriptor when checking if it is in use.
      This broke then switch to readonly performed by the shutdown notifier,
      which needs to work even when the array is still (apparently) active
      (as md doesn't get told when the filesystem becomes readonly).
      So restore this feature by pretending that there can be lots of
      file descriptors open, but we still want do_md_stop to switch to
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
    • NeilBrown's avatar
      Make writes to md/safe_mode_delay immediately effective. · 19052c0e
      NeilBrown authored
      If we reduce the 'safe_mode_delay', it could still wait for the old
      delay to completely expire before doing anything about safe_mode.
      Thus the effect if the change is delayed.
      To make the effect more immediate, run the timeout function
      immediately if the delay was reduced.  This may cause it to run
      slightly earlier that required, but that is the safer option.
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
  7. 01 Aug, 2008 3 commits
  8. 29 Jul, 2008 2 commits
  9. 26 Jul, 2008 1 commit
  10. 23 Jul, 2008 3 commits
  11. 21 Jul, 2008 3 commits