1. 21 Feb, 2017 2 commits
  2. 02 Feb, 2017 3 commits
    • scsi, block: fix duplicate bdi name registration crashes · 0dba1314
      Dan Williams authored
      Warnings of the following form occur because scsi reuses a devt number
      while the block layer still has it referenced as the name of the bdi
      [1]:
      
       WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80
       sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192'
       [..]
       Call Trace:
        dump_stack+0x86/0xc3
        __warn+0xcb/0xf0
        warn_slowpath_fmt+0x5f/0x80
        ? kernfs_path_from_node+0x4f/0x60
        sysfs_warn_dup+0x62/0x80
        sysfs_create_dir_ns+0x77/0x90
        kobject_add_internal+0xb2/0x350
        kobject_add+0x75/0xd0
        device_add+0x15a/0x650
        device_create_groups_vargs+0xe0/0xf0
        device_create_vargs+0x1c/0x20
        bdi_register+0x90/0x240
        ? lockdep_init_map+0x57/0x200
        bdi_register_owner+0x36/0x60
        device_add_disk+0x1bb/0x4e0
        ? __pm_runtime_use_autosuspend+0x5c/0x70
        sd_probe_async+0x10d/0x1c0
        async_run_entry_fn+0x39/0x170
      
      This is a brute-force fix to pass the devt release information from
      sd_probe() to the locations where we register the bdi,
      device_add_disk(), and unregister the bdi, blk_cleanup_queue().
      
      Thanks to Omar for the quick reproducer script [2]. This patch survives
      where an unmodified kernel fails in a few seconds.
      
      [1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4
      [2]: http://marc.info/?l=linux-block&m=148554717109098&w=2
      
      
      
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Bart Van Assche <bart.vanassche@sandisk.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Reported-by: Omar Sandoval <osandov@osandov.com>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Use pointer to backing_dev_info from request_queue · dc3b17cc
      Jan Kara authored
      
      
      We will want to have struct backing_dev_info allocated separately from
      struct request_queue. As the first step add pointer to backing_dev_info
      to request_queue and convert all users touching it. No functional
      changes in this patch.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Unhash block device inodes on gendisk destruction · f44f1ab5
      Jan Kara authored
      
      
      Currently, block device inodes stay around after corresponding gendisk
      has died until memory reclaim finds them and frees them. Since we will
      make block device inode pin the bdi, we want to free the block device
      inode as soon as the device goes away so that bdi does not stay around
      unnecessarily. Furthermore we need to avoid issues when new device with
      the same major,minor pair gets created since reusing the bdi structure
      would be rather difficult in this case.
      
      Unhashing block device inode on gendisk destruction nicely deals with
      these problems. Once last block device inode reference is dropped (which
      may be directly in del_gendisk()), the inode gets evicted. Furthermore if
      the major,minor pair gets reallocated, we are guaranteed to get new
      block device inode even if old block device inode is not yet evicted and
      thus we avoid issues with possible reuse of bdi.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  3. 04 Aug, 2016 2 commits
    • block: fix bdi vs gendisk lifetime mismatch · df08c32c
      Dan Williams authored
      
      
      The name for a bdi of a gendisk is derived from the gendisk's devt.
      However, since the gendisk is destroyed before the bdi it leaves a
      window where a new gendisk could dynamically reuse the same devt while a
      bdi with the same name is still live.  Arrange for the bdi to hold a
      reference against its "owner" disk device while it is registered.
      Otherwise we can hit sysfs duplicate name collisions like the following:
      
       WARNING: CPU: 10 PID: 2078 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x80
       sysfs: cannot create duplicate filename '/devices/virtual/bdi/259:1'
      
       Hardware name: HP ProLiant DL580 Gen8, BIOS P79 05/06/2015
        0000000000000286 0000000002c04ad5 ffff88006f24f970 ffffffff8134caec
        ffff88006f24f9c0 0000000000000000 ffff88006f24f9b0 ffffffff8108c351
        0000001f0000000c ffff88105d236000 ffff88105d1031e0 ffff8800357427f8
       Call Trace:
        [<ffffffff8134caec>] dump_stack+0x63/0x87
        [<ffffffff8108c351>] __warn+0xd1/0xf0
        [<ffffffff8108c3cf>] warn_slowpath_fmt+0x5f/0x80
        [<ffffffff812a0d34>] sysfs_warn_dup+0x64/0x80
        [<ffffffff812a0e1e>] sysfs_create_dir_ns+0x7e/0x90
        [<ffffffff8134faaa>] kobject_add_internal+0xaa/0x320
        [<ffffffff81358d4e>] ? vsnprintf+0x34e/0x4d0
        [<ffffffff8134ff55>] kobject_add+0x75/0xd0
        [<ffffffff816e66b2>] ? mutex_lock+0x12/0x2f
        [<ffffffff8148b0a5>] device_add+0x125/0x610
        [<ffffffff8148b788>] device_create_groups_vargs+0xd8/0x100
        [<ffffffff8148b7cc>] device_create_vargs+0x1c/0x20
        [<ffffffff811b775c>] bdi_register+0x8c/0x180
        [<ffffffff811b7877>] bdi_register_dev+0x27/0x30
        [<ffffffff813317f5>] add_disk+0x175/0x4a0
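
The lifetime window described in this commit can be modelled in userspace. The sketch below is a toy, not kernel code: the `toy_*` names, the four-slot devt allocator, and the boolean "sysfs" table are all assumptions made for illustration. It only shows why pinning the owner while the bdi is registered keeps the devt (and therefore the bdi name) from being handed out again too early.

```c
#include <stdbool.h>

#define TOY_NR_DEVT 4

/* Toy devt allocator and toy "sysfs" namespace, one bdi name per devt. */
bool toy_devt_in_use[TOY_NR_DEVT];
bool toy_bdi_name_used[TOY_NR_DEVT];

struct toy_disk {
	int devt;
	int refs;
};

int toy_alloc_devt(void)
{
	for (int i = 0; i < TOY_NR_DEVT; i++) {
		if (!toy_devt_in_use[i]) {
			toy_devt_in_use[i] = true;
			return i;
		}
	}
	return -1;
}

void toy_disk_init(struct toy_disk *d)
{
	d->devt = toy_alloc_devt();
	d->refs = 1;
}

/* The devt goes back to the allocator only when the last reference drops. */
void toy_disk_put(struct toy_disk *d)
{
	if (--d->refs == 0)
		toy_devt_in_use[d->devt] = false;
}

/* Returns false on a duplicate name, the analogue of sysfs_warn_dup(). */
bool toy_bdi_register(struct toy_disk *owner, bool pin_owner)
{
	if (toy_bdi_name_used[owner->devt])
		return false;
	toy_bdi_name_used[owner->devt] = true;
	if (pin_owner)
		owner->refs++;	/* bdi holds a reference on its owner */
	return true;
}

void toy_bdi_unregister(struct toy_disk *owner, bool pinned)
{
	toy_bdi_name_used[owner->devt] = false;
	if (pinned)
		toy_disk_put(owner);
}

/* Returns true if the second disk's bdi registers without a name clash. */
bool toy_reuse_scenario(bool pin_owner)
{
	struct toy_disk a, b;
	bool ok;

	for (int i = 0; i < TOY_NR_DEVT; i++) {
		toy_devt_in_use[i] = false;
		toy_bdi_name_used[i] = false;
	}

	toy_disk_init(&a);
	toy_bdi_register(&a, pin_owner);
	toy_disk_put(&a);		/* disk goes away, bdi still registered */

	toy_disk_init(&b);		/* may or may not reuse a's devt */
	ok = toy_bdi_register(&b, pin_owner);

	toy_bdi_unregister(&a, pin_owner);
	return ok;
}
```

With `pin_owner` false the second disk receives the first disk's devt and the registration collides; with `pin_owner` true the devt stays reserved until the old bdi is unregistered, so the second disk gets a different one.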
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Yi Zhang <yizhan@redhat.com>
      Tested-by: Yi Zhang <yizhan@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      
      Fixed up missing 0 return in bdi_register_owner().
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: fix use-after-free in seq file · 77da1605
      Vegard Nossum authored
      
      
      I got a KASAN report of use-after-free:
      
          ==================================================================
          BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
          Read of size 8 by task trinity-c1/315
          =============================================================================
          BUG kmalloc-32 (Not tainted): kasan: bad access detected
          -----------------------------------------------------------------------------
      
          Disabling lock debugging due to kernel taint
          INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
                  ___slab_alloc+0x4f1/0x520
                  __slab_alloc.isra.58+0x56/0x80
                  kmem_cache_alloc_trace+0x260/0x2a0
                  disk_seqf_start+0x66/0x110
                  traverse+0x176/0x860
                  seq_read+0x7e3/0x11a0
                  proc_reg_read+0xbc/0x180
                  do_loop_readv_writev+0x134/0x210
                  do_readv_writev+0x565/0x660
                  vfs_readv+0x67/0xa0
                  do_preadv+0x126/0x170
                  SyS_preadv+0xc/0x10
                  do_syscall_64+0x1a1/0x460
                  return_from_SYSCALL_64+0x0/0x6a
          INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
                  __slab_free+0x17a/0x2c0
                  kfree+0x20a/0x220
                  disk_seqf_stop+0x42/0x50
                  traverse+0x3b5/0x860
                  seq_read+0x7e3/0x11a0
                  proc_reg_read+0xbc/0x180
                  do_loop_readv_writev+0x134/0x210
                  do_readv_writev+0x565/0x660
                  vfs_readv+0x67/0xa0
                  do_preadv+0x126/0x170
                  SyS_preadv+0xc/0x10
                  do_syscall_64+0x1a1/0x460
                  return_from_SYSCALL_64+0x0/0x6a
      
          CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G    B           4.7.0+ #62
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
           ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
           ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
           ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
          Call Trace:
           [<ffffffff81d6ce81>] dump_stack+0x65/0x84
           [<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
           [<ffffffff814704ff>] object_err+0x2f/0x40
           [<ffffffff814754d1>] kasan_report_error+0x221/0x520
           [<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
           [<ffffffff83888161>] klist_iter_exit+0x61/0x70
           [<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
           [<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
           [<ffffffff8151f812>] seq_read+0x4b2/0x11a0
           [<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
           [<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
           [<ffffffff814b4c45>] do_readv_writev+0x565/0x660
           [<ffffffff814b8a17>] vfs_readv+0x67/0xa0
           [<ffffffff814b8de6>] do_preadv+0x126/0x170
           [<ffffffff814b92ec>] SyS_preadv+0xc/0x10
      
      This problem can occur in the following situation:
      
      open()
       - pread()
          - .seq_start()
             - iter = kmalloc() // succeeds
             - seqf->private = iter
          - .seq_stop()
             - kfree(seqf->private)
       - pread()
          - .seq_start()
             - iter = kmalloc() // fails
          - .seq_stop()
             - class_dev_iter_exit(seqf->private) // boom! old pointer
      
      As the comment in disk_seqf_stop() says, stop is called even if start
      failed, so we need to reinitialise the private pointer to NULL when seq
      iteration stops.
      
      An alternative would be to set the private pointer to NULL when the
      kmalloc() in disk_seqf_start() fails.
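
The pattern of the fix can be sketched in userspace. This is a minimal model under stated assumptions, not the kernel code: `toy_seq_file`, `toy_seqf_start()`, `toy_seqf_stop()` and the teardown counter are hypothetical stand-ins, and `malloc()`/`free()` replace `kmalloc()`/`kfree()`. The point it illustrates is that stop clears the private pointer, so a later stop that follows a failed start finds NULL instead of a stale pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_seq_file {
	void *private;
};

/* Counts how often an iterator was torn down (stand-in for
 * class_dev_iter_exit() in the trace above). */
int toy_teardown_calls;

/* Start may fail to allocate; in that case private is left untouched. */
void *toy_seqf_start(struct toy_seq_file *seqf, bool alloc_fails)
{
	void *iter = alloc_fails ? NULL : malloc(32);

	if (!iter)
		return NULL;
	seqf->private = iter;
	return iter;
}

/* Stop runs even if start failed, so it must tolerate a NULL private
 * pointer and must reset it after freeing. */
void toy_seqf_stop(struct toy_seq_file *seqf)
{
	void *iter = seqf->private;

	if (iter) {
		toy_teardown_calls++;
		free(iter);
		seqf->private = NULL;	/* the reset this commit describes */
	}
}

/* Replays the two-pread() sequence; returns the number of teardowns. */
int toy_two_reads(void)
{
	struct toy_seq_file seqf = { .private = NULL };

	toy_teardown_calls = 0;

	toy_seqf_start(&seqf, false);	/* first pread(): allocation succeeds */
	toy_seqf_stop(&seqf);

	toy_seqf_start(&seqf, true);	/* second pread(): allocation fails */
	toy_seqf_stop(&seqf);		/* sees NULL, does nothing */

	return toy_teardown_calls;
}
```

Without the reset in `toy_seqf_stop()`, the second stop would act on the pointer freed by the first one, which is the access KASAN reported.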
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  4. 07 Jul, 2016 1 commit
    • timers: Remove set_timer_slack() leftovers · 53bf837b
      Thomas Gleixner authored
      
      
      We now have implicit batching in the timer wheel. The slack API is no longer
      used, so remove it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrew F. Davis <afd@ti.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jaehoon Chung <jh80.chung@samsung.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Nyman <mathias.nyman@intel.com>
      Cc: Pali Rohár <pali.rohar@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: linux-usb@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 27 Jun, 2016 1 commit
  6. 16 Jun, 2016 1 commit
  7. 10 Jan, 2016 1 commit
  8. 09 Jan, 2016 4 commits
    • block: clarify badblocks lifetime · 20a308f0
      Dan Williams authored
      
      
      The badblocks list attached to a gendisk is allocated by the driver
      which equates to the driver owning the lifetime of the object.  Do not
      automatically free it in del_gendisk(). This is in preparation for
      expanding the use of badblocks in libnvdimm drivers and introducing
      devm_init_badblocks().
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • badblocks: rename badblocks_free to badblocks_exit · d3b407fb
      Dan Williams authored
      
      
      For symmetry with badblocks_init() make it clear that this path only
      destroys incremental allocations of a badblocks instance, and does not
      free the badblocks instance itself.
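
The init/exit symmetry can be illustrated with a small userspace sketch. It is a toy under stated assumptions: `toy_badblocks`, its fields, and the 512-entry table size are hypothetical, and `calloc()`/`free()` stand in for the kernel allocators. What it shows is that exit releases only what init allocated, while the instance itself stays owned by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_badblocks {
	unsigned long long *page;	/* table allocated by init */
	int count;
};

/* Allocates the internal table; the instance is provided by the caller. */
int toy_badblocks_init(struct toy_badblocks *bb)
{
	bb->count = 0;
	bb->page = calloc(512, sizeof(*bb->page));
	return bb->page ? 0 : -1;
}

/* Undoes init only: frees the table, never the instance itself. */
void toy_badblocks_exit(struct toy_badblocks *bb)
{
	free(bb->page);
	bb->page = NULL;
}

/* Returns true if an instance embedded in caller-owned storage survives
 * an init/exit round trip with its table released. */
bool toy_badblocks_roundtrip(void)
{
	struct toy_badblocks bb;	/* owned by the caller, e.g. a driver */

	if (toy_badblocks_init(&bb) != 0)
		return false;
	toy_badblocks_exit(&bb);
	return bb.page == NULL;		/* bb itself is still valid storage */
}
```

Because exit never frees the instance, the same structure can live on the stack, inside a driver's private data, or in a separately managed allocation.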
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • block: Add badblock management for gendisks · 99e6608c
      Vishal Verma authored
      
      
      NVDIMM devices, which can behave more like DRAM rather than block
      devices, may develop bad cache lines, or 'poison'. A block device
      exposed by the pmem driver can then consume poison via a read (or
      write), and cause a machine check. On platforms without machine
      check recovery features, this would mean a crash.
      
      The block device maintaining a runtime list of all known sectors that
      have poison can directly avoid this, and also provide a path forward
      to enable proper handling/recovery for DAX faults on such a device.
      
      Use the new badblock management interfaces to add a badblocks list to
      gendisks.
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • block: fix del_gendisk() vs blkdev_ioctl crash · ac34f15e
      Dan Williams authored
      
      
      When tearing down a block device early in its lifetime, userspace may
      still be performing discovery actions like blkdev_ioctl() to re-read
      partitions.
      
      The nvdimm_revalidate_disk() implementation depends on
      disk->driverfs_dev to be valid at entry.  However, it is set to NULL in
      del_gendisk() and fatally this is happening *before* the disk device is
      deleted from userspace view.
      
      There's no reason for del_gendisk() to clear ->driverfs_dev.  That
      device is the parent of the disk.  It is guaranteed to not be freed
      until the disk, as a child, drops its ->parent reference.
      
      We could also fix this issue locally in nvdimm_revalidate_disk() by
      using disk_to_dev(disk)->parent, but let's fix it globally since
      ->driverfs_dev follows the lifetime of the parent.  Longer term we
      should probably just add a @parent parameter to add_disk(), and stop
      carrying this pointer in the gendisk.
      
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffffa00340a8>] nvdimm_revalidate_disk+0x18/0x90 [libnvdimm]
       CPU: 2 PID: 538 Comm: systemd-udevd Tainted: G           O    4.4.0-rc5 #2257
       [..]
       Call Trace:
        [<ffffffff8143e5c7>] rescan_partitions+0x87/0x2c0
        [<ffffffff810f37f9>] ? __lock_is_held+0x49/0x70
        [<ffffffff81438c62>] __blkdev_reread_part+0x72/0xb0
        [<ffffffff81438cc5>] blkdev_reread_part+0x25/0x40
        [<ffffffff8143982d>] blkdev_ioctl+0x4fd/0x9c0
        [<ffffffff811246c9>] ? current_kernel_time64+0x69/0xd0
        [<ffffffff812916dd>] block_ioctl+0x3d/0x50
        [<ffffffff81264c38>] do_vfs_ioctl+0x308/0x560
        [<ffffffff8115dbd1>] ? __audit_syscall_entry+0xb1/0x100
        [<ffffffff810031d6>] ? do_audit_syscall_entry+0x66/0x70
        [<ffffffff81264f09>] SyS_ioctl+0x79/0x90
        [<ffffffff81902672>] entry_SYSCALL_64_fastpath+0x12/0x76
      Reported-by: Robert Hu <robert.hu@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  9. 24 Nov, 2015 1 commit
  10. 21 Oct, 2015 1 commit
    • block: Inline blk_integrity in struct gendisk · 25520d55
      Martin K. Petersen authored
      
      
      Up until now the integrity profile has been dynamically allocated and
      attached to struct gendisk after the disk has been made active.
      
      This causes problems because NVMe devices need to register the profile
      prior to the partition table being read due to a mandatory metadata
      buffer requirement. In addition, DM goes through hoops to deal with
      preallocating, but not initializing integrity profiles.
      
      Since the integrity profile is small (4 bytes + a pointer), Christoph
      suggested moving it to struct gendisk proper. This requires several
      changes:
      
       - Moving the blk_integrity definition to genhd.h.
      
       - Inlining blk_integrity in struct gendisk.
      
       - Removing the dynamic allocation code.
      
       - Adding helper functions which allow gendisk to set up and tear down
         the integrity sysfs dir when a disk is added/deleted.
      
       - Adding a blk_integrity_revalidate() callback for updating the stable
         pages bdi setting.
      
       - The calls that depend on whether a device has an integrity profile or
         not now key off of the bi->profile pointer.
      
       - Simplifying the integrity support routines in DM (Mike Snitzer).
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reported-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  11. 17 Jul, 2015 2 commits
  12. 11 Jun, 2015 1 commit
    • block: fix ext_dev_lock lockdep report · 4d66e5e9
      Dan Williams authored
       =================================
       [ INFO: inconsistent lock state ]
       4.1.0-rc7+ #217 Tainted: G           O
       ---------------------------------
       inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
       swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
        (ext_devt_lock){+.?...}, at: [<ffffffff8143a60c>] blk_free_devt+0x3c/0x70
       {SOFTIRQ-ON-W} state was registered at:
         [<ffffffff810bf6b1>] __lock_acquire+0x461/0x1e70
         [<ffffffff810c1947>] lock_acquire+0xb7/0x290
         [<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50
         [<ffffffff8143a07d>] blk_alloc_devt+0x6d/0xd0  <-- take the lock in process context
      [..]
        [<ffffffff810bf64e>] __lock_acquire+0x3fe/0x1e70
        [<ffffffff810c00ad>] ? __lock_acquire+0xe5d/0x1e70
        [<ffffffff810c1947>] lock_acquire+0xb7/0x290
        [<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70
        [<ffffffff818ac3a8>] _raw_spin_lock+0x38/0x50
        [<ffffffff8143a60c>] ? blk_free_devt+0x3c/0x70
        [<ffffffff8143a60c>] blk_free_devt+0x3c/0x70    <-- take the lock in softirq
        [<ffffffff8143bfec>] part_release+0x1c/0x50
        [<ffffffff8158edf6>] device_release+0x36/0xb0
        [<ffffffff8145ac2b>] kobject_cleanup+0x7b/0x1a0
        [<ffffffff8145aad0>] kobject_put+0x30/0x70
        [<ffffffff8158f147>] put_device+0x17/0x20
        [<ffffffff8143c29c>] delete_partition_rcu_cb+0x16c/0x180
        [<ffffffff8143c130>] ? read_dev_sector+0xa0/0xa0
        [<ffffffff810e0e0f>] rcu_process_callbacks+0x2ff/0xa90
        [<ffffffff810e0dcf>] ? rcu_process_callbacks+0x2bf/0xa90
        [<ffffffff81067e2e>] __do_softirq+0xde/0x600
      
      Neil sees this in his tests and it also triggers on pmem driver unbind
      for the libnvdimm tests.  This fix is on top of an initial fix by Keith
      for incorrect usage of mutex_lock() in this path: 2da78092 "block:
      Fix dev_t minor allocation lifetime".  Both this and 2da78092 are
      candidates for -stable.
      
      Fixes: 2da78092 ("block: Fix dev_t minor allocation lifetime")
      Cc: <stable@vger.kernel.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Reported-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  13. 02 Jun, 2015 1 commit
    • writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo authored
      
      
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  14. 28 May, 2015 1 commit
  15. 19 Nov, 2014 1 commit
  16. 22 Sep, 2014 1 commit
  17. 09 Sep, 2014 1 commit
  18. 03 Sep, 2014 1 commit
  19. 11 Sep, 2013 1 commit
  20. 03 Jul, 2013 1 commit
  21. 14 May, 2013 1 commit
    • block: queue work on power efficient wq · 695588f9
      Viresh Kumar authored
      
      
      Block layer uses workqueues for multiple purposes. There is no real dependency
      of scheduling these on the cpu which scheduled them.
      
      On an idle system, it is observed that an idle cpu wakes up many times just to
      service this work. It would be better if we can schedule it on a cpu which the
      scheduler believes to be the most appropriate one.
      
      This patch replaces normal workqueues with power efficient versions.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  22. 11 Apr, 2013 1 commit
  23. 08 Apr, 2013 1 commit
    • driver core: add uid and gid to devtmpfs · 3c2670e6
      Kay Sievers authored
      
      
      Some drivers want to tell userspace what uid and gid should be used for
      their device nodes, so allow that information to percolate through the
      driver core to userspace in order to make this happen.  This means that
      some systems (i.e.  Android and friends) will not need to even run a
      udev-like daemon for their device node manager and can just rely on
      devtmpfs fully, reducing their footprint even more.
      Signed-off-by: Kay Sievers <kay@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  24. 28 Feb, 2013 3 commits
    • block: convert to idr_alloc() · bab998d6
      Tejun Heo authored
      
      
      Convert to the much saner new idr interface.  Both bsg and genhd
      protect idr w/ mutex making preloading unnecessary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • block: fix synchronization and limit check in blk_alloc_devt() · ce23bba8
      Tejun Heo authored
      
      
      idr allocation in blk_alloc_devt() wasn't synchronized against lookup
      and removal, and its limit check was off by one - 1 << MINORBITS is
      the number of minors allowed, not the maximum allowed minor.
      
      Add locking and rename MAX_EXT_DEVT to NR_EXT_DEVT and fix limit
      checking.
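
The off-by-one can be shown with a short sketch. The names below are hypothetical, `TOY_MINORBITS` is assumed to mirror the kernel's `MINORBITS` value of 20, and the "old" check is written as the kind of `idx > limit` comparison the message describes rather than copied from the original code.

```c
#include <stdbool.h>

#define TOY_MINORBITS	20			/* assumed to match MINORBITS */
#define TOY_NR_EXT_DEVT	(1 << TOY_MINORBITS)	/* a count: minors 0 .. NR - 1 */

/* Treats the count as if it were the highest valid minor, so
 * idx == TOY_NR_EXT_DEVT slips through. */
bool toy_minor_ok_old(int idx)
{
	return !(idx > TOY_NR_EXT_DEVT);
}

/* Treats the value as a count: the highest valid minor is NR - 1. */
bool toy_minor_ok_new(int idx)
{
	return idx >= 0 && idx < TOY_NR_EXT_DEVT;
}
```

Renaming the constant from a "max" to a "number of" name, as the commit does, makes the intended comparison (`<` against a count) easier to read at the call site.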
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • block: fix ext_devt_idr handling · 7b74e912
      Tomas Henzl authored
      
      
      While adding and removing a lot of disks and partitions this
      sometimes shows up:
      
        WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
        Hardware name:
        sysfs: cannot create duplicate filename '/dev/block/259:751'
        Modules linked in: raid1 autofs4 bnx2fc cnic uio fcoe libfcoe libfc 8021q scsi_transport_fc scsi_tgt garp stp llc sunrpc cpufreq_ondemand powernow_k8 freq_table mperf ipv6 dm_mirror dm_region_hash dm_log power_meter microcode dcdbas serio_raw amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core k10temp bnx2 sg ixgbe dca mdio ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif ata_generic pata_acpi pata_atiixp ahci mptsas mptscsih mptbase scsi_transport_sas dm_multipath dm_mod [last unloaded: scsi_wait_scan]
        Pid: 44103, comm: async/16 Not tainted 2.6.32-195.el6.x86_64 #1
        Call Trace:
          warn_slowpath_common+0x87/0xc0
          warn_slowpath_fmt+0x46/0x50
          sysfs_add_one+0xc9/0x130
          sysfs_do_create_link+0x12b/0x170
          sysfs_create_link+0x13/0x20
          device_add+0x317/0x650
          idr_get_new+0x13/0x50
          add_partition+0x21c/0x390
          rescan_partitions+0x32b/0x470
          sd_open+0x81/0x1f0 [sd_mod]
          __blkdev_get+0x1b6/0x3c0
          blkdev_get+0x10/0x20
          register_disk+0x155/0x170
          add_disk+0xa6/0x160
          sd_probe_async+0x13b/0x210 [sd_mod]
          add_wait_queue+0x46/0x60
          async_thread+0x102/0x250
          default_wake_function+0x0/0x20
          async_thread+0x0/0x250
          kthread+0x96/0xa0
          child_rip+0xa/0x20
          kthread+0x0/0xa0
          child_rip+0x0/0x20
      
      This most likely happens because dev_t is freed while the number is
      still used and idr_get_new() is not protected on every use.  The fix
      adds a mutex where it wasn't before and moves the dev_t free function so
      it is called after device del.
      Signed-off-by: Tomas Henzl <thenzl@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 24 Feb, 2013 1 commit
    • block/genhd.c: apply pm_runtime_set_memalloc_noio on block devices · 25e823c8
      Ming Lei authored
      
      
      Apply the introduced pm_runtime_set_memalloc_noio on block device so
      that PM core will teach mm to not allocate memory with GFP_IOFS when
      calling the runtime_resume and runtime_suspend callback for block
      devices and its ancestors.
      Signed-off-by: Ming Lei <ming.lei@canonical.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Oliver Neukum <oneukum@suse.de>
      Cc: Jiri Kosina <jiri.kosina@suse.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Greg KH <greg@kroah.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Decotigny <david.decotigny@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 19 Dec, 2012 2 commits
    • block: prevent race/cleanup · 12c2bdb2
      Derek Basehore authored
      
      
      Remove a race condition which causes a warning in disk_clear_events.  This
      is a race between disk_clear_events() and disk_flush_events().
      ev->clearing will be altered by disk_flush_events() even though we are
      blocking event checking through disk_flush_events().  If this happens
      after ev->clearing was cleared for disk_clear_events(), this can cause the
      WARN_ON_ONCE() in that function to be triggered.
      
      This change also has disk_clear_events() not go through a workqueue.
      Since we have to wait for the work to complete, we should just call the
      function directly.  Also, since this work cannot be put on a freezable
      workqueue, it will have to contend with increased demand, so calling the
      function directly avoids this.
      
      [akpm@linux-foundation.org: fix spello in comment]
      Signed-off-by: Derek Basehore <dbasehore@chromium.org>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: remove deadlock in disk_clear_events · aea24a8b
      Derek Basehore authored
      
      
      In disk_clear_events, do not put work on system_nrt_freezable_wq.
      Instead, put it on system_nrt_wq.
      
      There is a race between probing a usb and suspending the device.  Since
      probing a usb calls disk_clear_events, which puts work on a frozen
      workqueue, probing cannot finish after the workqueue is frozen.  However,
      suspending cannot finish until the usb probe is finished, so we get a
      deadlock, causing the system to reboot.
      
      The way to reproduce this bug is to wake up from suspend with a usb
      storage device plugged in, or plugging in a usb storage device right
      before suspend.  The window of time is on the order of time it takes to
      probe the usb device.  As long as the workqueues are frozen before the
      call to add_disk within sd_probe_async finishes, there will be a deadlock
      (which calls blkdev_get, sd_open, check_disk_change, then
      disk_clear_events).  This is not difficult to reproduce after figuring out
      the timings.
      
      [akpm@linux-foundation.org: fix up comment]
      Signed-off-by: Derek Basehore <dbasehore@chromium.org>
      Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  27. 23 Nov, 2012 1 commit
    • block: store partition_meta_info.uuid as a string · 1ad7e899
      Stephen Warren authored
      
      
      This will allow other types of UUID to be stored here, aside from true
      UUIDs.  This also simplifies code that uses this field, since it's usually
      constructed from a, used as a, or compared to other, strings.
      
      Note: A simplistic approach here would be to set uuid_str[36]=0 whenever a
      /PARTNROFF option was found to be present.  However, this modifies the
      input string, and causes subsequent calls to devt_from_partuuid() not to
      see the /PARTNROFF option, which causes different results.  In order to
      avoid misleading future maintainers, this parameter is marked const.
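
The non-destructive parsing approach described in the note can be sketched as follows. This is a userspace illustration under stated assumptions, not the kernel's `devt_from_partuuid()`: `toy_parse_partuuid()`, `TOY_UUID_LEN` and the sample identifier are hypothetical. The input is `const` and the identifier is copied out, so parsing the same string twice gives the same result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define TOY_UUID_LEN 36

/* Splits "<id>[/PARTNROFF=<n>]" without writing to the input string. */
bool toy_parse_partuuid(const char *in, char uuid[TOY_UUID_LEN + 1],
			int *offset)
{
	const char *slash = strchr(in, '/');
	size_t len = slash ? (size_t)(slash - in) : strlen(in);
	char extra;

	if (len == 0 || len > TOY_UUID_LEN)
		return false;

	/* Copy the identifier instead of planting a NUL in the input. */
	memcpy(uuid, in, len);
	uuid[len] = '\0';

	*offset = 0;
	/* Exactly one conversion must succeed: a number, no trailing junk. */
	if (slash && sscanf(slash + 1, "PARTNROFF=%d%c", offset, &extra) != 1)
		return false;
	return true;
}

/* Parses the same string twice; returns the offset if both passes agree
 * on identifier and offset, or -1 otherwise. */
int toy_partnroff_demo(void)
{
	const char *in = "00112233-4455-6677-8899-aabbccddeeff/PARTNROFF=2";
	char first[TOY_UUID_LEN + 1], second[TOY_UUID_LEN + 1];
	int off1, off2;

	if (!toy_parse_partuuid(in, first, &off1))
		return -1;
	if (!toy_parse_partuuid(in, second, &off2))
		return -1;
	if (off1 != off2 || strcmp(first, second) != 0)
		return -1;
	return off1;
}
```

Had the parser written a NUL over the `/`, the second call would no longer see the `/PARTNROFF` suffix, which is the differing-results problem the note warns about.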
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  28. 10 Nov, 2012 1 commit
  29. 20 Aug, 2012 1 commit
    • workqueue: deprecate system_nrt[_freezable]_wq · 3b07e9ca
      Tejun Heo authored
      
      
      system_nrt[_freezable]_wq are now spurious.  Mark them deprecated and
      convert all users to system[_freezable]_wq.
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
      Please use system[_freezable]_wq instead.
      
      This patch doesn't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-By: Lai Jiangshan <laijs@cn.fujitsu.com>
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>