1. 17 Sep, 2012 4 commits
  2. 20 Jul, 2012 3 commits
    • Mikulas Patocka's avatar
      dm raid1: set discard_zeroes_data_unsupported · 7c8d3a42
      Mikulas Patocka authored
      We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
      the underlying disks support zero on discard.  So this patch sets
      ti->discard_zeroes_data_unsupported.
      
      For example, if the mirror is in the process of resynchronizing, it may
      happen that kcopyd reads a piece of data, then discard is sent on the
      same area and then kcopyd writes the piece of data to another leg.
      Consequently, the data is not zeroed.
      
      The flag was made available by commit 983c7db3
      
      
      (dm crypt: always disable discard_zeroes_data).
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      7c8d3a42
    • Mikulas Patocka's avatar
      dm thin: do not send discards to shared blocks · 650d2a06
      Mikulas Patocka authored
      When process_discard receives a partial discard that doesn't cover a
      full block, it sends this discard down to that block. Unfortunately, the
      block can be shared and the discard would corrupt the other snapshots
      sharing this block.
      
      This patch detects block sharing and ends the discard with success when
      sending it to the shared block.
      
      The above change means that if the device supports discard it can't be
      guaranteed that a discard request zeroes data. Therefore, we set
      ti->discard_zeroes_data_unsupported.
      
      Thin target discard support with this bug arrived in commit
      104655fd
      
       (dm thin: support discards).
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      650d2a06
    • Mikulas Patocka's avatar
      dm raid1: fix crash with mirror recovery and discard · 751f188d
      Mikulas Patocka authored
      This patch fixes a crash when a discard request is sent during mirror
      recovery.
      
      Firstly, some background.  Generally, the following sequence happens during
      mirror synchronization:
      - function do_recovery is called
      - do_recovery calls dm_rh_recovery_prepare
      - dm_rh_recovery_prepare uses a semaphore to limit the number
        simultaneously recovered regions (by default the semaphore value is 1,
        so only one region at a time is recovered)
      - dm_rh_recovery_prepare calls __rh_recovery_prepare,
        __rh_recovery_prepare asks the log driver for the next region to
        recover. Then, it sets the region state to DM_RH_RECOVERING. If there
        are no pending I/Os on this region, the region is added to
        quiesced_regions list. If there are pending I/Os, the region is not
        added to any list. It is added to the quiesced_regions list later (by
        dm_rh_dec function) when all I/Os finish.
      - when the region is on quiesced_regions list, there are no I/Os in
        flight on this region. The region is popped from the list in
        dm_rh_recovery_start function. Then, a kcopyd job is started in the
        recover function.
      - when the kcopyd job finishes, recovery_complete is called. It calls
        dm_rh_recovery_end. dm_rh_recovery_end adds the region to
        recovered_regions or failed_recovered_regions list (depending on
        whether the copy operation was successful or not).
      
      The above mechanism assumes that if the region is in DM_RH_RECOVERING
      state, no new I/Os are started on this region. When I/O is started,
      dm_rh_inc_pending is called, which increases reg->pending count. When
      I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
      If the count is zero and the region was in DM_RH_RECOVERING state,
      dm_rh_dec adds it to the quiesced_regions list.
      
      Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
      in DM_RH_RECOVERING state, it could be added to quiesced_regions list
      multiple times or it could be added to this list when kcopyd is copying
      data (it is assumed that the region is not on any list while kcopyd does
      its jobs). This results in memory corruption and crash.
      
      There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
      do not belong to any region, so they are always added to the sync list
      in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
      requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
      requests. These bypasses avoid the crash possibility described above.
      
      These bypasses were improperly implemented for REQ_DISCARD when
      the mirror target gained discard support in commit
      5fc2ffea (dm raid1: support discard).
      
      In do_writes, REQ_DISCARD requests is always added to the sync queue and
      immediately dispatched (even if the region is in DM_RH_RECOVERING).  However,
      dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts.  So it violates the
      rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
      corruption described above.
      
      This patch changes it so that REQ_DISCARD requests follow the same path
      as REQ_FLUSH. This avoids the crash.
      
      Reference: https://bugzilla.redhat.com/837607
      
      Signed-off-by: default avatarMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      751f188d
  3. 19 Jul, 2012 6 commits
    • Ezequiel Garcia's avatar
      cx25821: Remove bad strcpy to read-only char* · 380e99fc
      Ezequiel Garcia authored
      
      
      The strcpy was being used to set the name of the board.  Since the
      destination char* was read-only and the name is set statically at
      compile time; this was both wrong and redundant.
      
      The type of char* is changed to const char* to prevent future errors.
      Reported-by: default avatarRadek Masin <radek@masin.eu>
      Signed-off-by: default avatarEzequiel Garcia <elezegarcia@gmail.com>
      [ Taking directly due to vacations   - Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      380e99fc
    • Benjamin Tissoires's avatar
    • NeilBrown's avatar
      md/raid1: close some possible races on write errors during resync · 58e94ae1
      NeilBrown authored
      commit 4367af55
         md/raid1: clear bad-block record when write succeeds.
      
      Added a 'reschedule_retry' call possibility at the end of
      end_sync_write, but didn't add matching code at the end of
      sync_request_write.  So if the writes complete very quickly, or
      scheduling makes it seem that way, then we can miss rescheduling
      the request and the resync could hang.
      
      Also commit 73d5c38a
      
      
          md: avoid races when stopping resync.
      
      Fix a race condition in this same code in end_sync_write but didn't
      make the change in sync_request_write.
      
      This patch updates sync_request_write to fix both of those.
      Patch is suitable for 3.1 and later kernels.
      Reported-by: default avatarAlexander Lyakas <alex.bolshoy@gmail.com>
      Original-version-by: default avatarAlexander Lyakas <alex.bolshoy@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      58e94ae1
    • NeilBrown's avatar
      md: avoid crash when stopping md array races with closing other open fds. · a05b7ea0
      NeilBrown authored
      
      
      md will refuse to stop an array if any other fd (or mounted fs) is
      using it.
      When any fs is unmounted of when the last open fd is closed all
      pending IO will be flushed (e.g. sync_blockdev call in __blkdev_put)
      so there will be no pending IO to worry about when the array is
      stopped.
      
      However in order to send the STOP_ARRAY ioctl to stop the array one
      must first get and open fd on the block device.
      If some fd is being used to write to the block device and it is closed
      after mdadm open the block device, but before mdadm issues the
      STOP_ARRAY ioctl, then there will be no last-close on the md device so
      __blkdev_put will not call sync_blockdev.
      
      If this happens, then IO can still be in-flight while md tears down
      the array and bad things can happen (use-after-free and subsequent
      havoc).
      
      So in the case where do_md_stop is being called from an open file
      descriptor, call sync_block after taking the mutex to ensure there
      will be no new openers.
      
      This is needed when setting a read-write device to read-only too.
      
      Cc: stable@vger.kernel.org
      Reported-by: default avatarmajianpeng <majianpeng@gmail.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      a05b7ea0
    • NeilBrown's avatar
      md: fix bug in handling of new_data_offset · 25f7fd47
      NeilBrown authored
      commit c6563a8c
      
      
          md: add possibility to change data-offset for devices.
      
      introduced a 'new_data_offset' attribute which should normally
      be the same as 'data_offset', but can be explicitly set to a different
      value to allow a reshape operation to move the data.
      
      Unfortunately when the 'data_offset' is explicitly set through
      sysfs, the new_data_offset is not also set, so the two would become
      out-of-sync incorrectly.
      
      One result of this is that trying to set the 'size' after the
      'data_offset' would fail because it is not permitted to set the size
      when the 'data_offset' and 'new_data_offset' are different - as that
      can be confusing.
      Consequently when mdadm tried to do this while assembling an IMSM
      array it would fail.
      
      This bug was introduced in 3.5-rc1.
      Reported-by: default avatarBrian Downing <bdowning@lavos.net>
      Bisected-by: default avatarBrian Downing <bdowning@lavos.net>
      Tested-by: default avatarBrian Downing <bdowning@lavos.net>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      25f7fd47
    • Linus Torvalds's avatar
      Make wait_for_device_probe() also do scsi_complete_async_scans() · eea03c20
      Linus Torvalds authored
      Commit a7a20d10 ("sd: limit the scope of the async probe domain")
      make the SCSI device probing run device discovery in it's own async
      domain.
      
      However, as a result, the partition detection was no longer synchronized
      by async_synchronize_full() (which, despite the name, only synchronizes
      the global async space, not all of them).  Which in turn meant that
      "wait_for_device_probe()" would not wait for the SCSI partitions to be
      parsed.
      
      And "wait_for_device_probe()" was what the boot time init code relied on
      for mounting the root filesystem.
      
      Now, most people never noticed this, because not only is it
      timing-dependent, but modern distributions all use initrd.  So the root
      filesystem isn't actually on a disk at all.  And then before they
      actually mount the final disk filesystem, they will have loaded the
      scsi-wait-scan module, which not only does the expected
      wait_for_device_probe(), but also does scsi_complete_async_scans().
      
      [ Side note: scsi_complete_async_scans() had also been partially broken,
        but that was fixed in commit 43a8d39d ("fix async probe
        regression"), so that same commit a7a20d10
      
       had actually broken
        setups even if you used scsi-wait-scan explicitly ]
      
      Solve this problem by just moving the scsi_complete_async_scans() call
      into wait_for_device_probe().  Everybody who wants to wait for device
      probing to finish really wants the SCSI probing to complete, so there's
      no reason not to do this.
      
      So now "wait_for_device_probe()" really does what the name implies, and
      properly waits for device probing to finish.  This also removes the now
      unnecessary extra calls to scsi_complete_async_scans().
      Reported-and-tested-by: default avatarArtem S. Tashkinov <t.artem@mailcity.com>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: James Bottomley <jbottomley@parallels.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: linux-scsi <linux-scsi@vger.kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      eea03c20
  4. 18 Jul, 2012 9 commits
  5. 17 Jul, 2012 4 commits
  6. 16 Jul, 2012 4 commits
  7. 14 Jul, 2012 7 commits
  8. 13 Jul, 2012 1 commit
  9. 12 Jul, 2012 2 commits