1. 06 Mar, 2012 10 commits
    • Tejun Heo's avatar
      blkcg: add blkcg_{init|drain|exit}_queue() · 5efd6113
      Tejun Heo authored
      Currently block core calls directly into blk-throttle for init, drain
      and exit.  This patch adds blkcg_{init|drain|exit}_queue() which wraps
      the blk-throttle functions.  This is to give more control and
      visiblity to blkcg core layer for proper layering.  Further patches
      will add logic common to blkcg policies to the functions.
      While at it, collapse blk_throtl_release() into blk_throtl_exit().
      There's no reason to keep them separate.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: let blkio_group point to blkio_cgroup directly · 7ee9c562
      Tejun Heo authored
      Currently, blkg points to the associated blkcg via its css_id.  This
      unnecessarily complicates dereferencing blkcg.  Let blkg hold a
      reference to the associated blkcg and point directly to it and disable
      css_id on blkio_subsys.
      This change requires splitting blkiocg_destroy() into
      blkiocg_pre_destroy() and blkiocg_destroy() so that all blkg's can be
      destroyed and all the blkcg references held by them dropped during
      cgroup removal.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: kill the mind-bending blkg->dev · 7a4dd281
      Tejun Heo authored
      blkg->dev is dev_t recording the device number of the block device for
      the associated request_queue.  It is used to identify the associated
      block device when printing out configuration or stats.
      This is redundant to begin with.  A blkg is an association between a
      cgroup and a request_queue and it of course is possible to reach
      request_queue from blkg and synchronization conventions are in place
      for safe q dereferencing, so this shouldn't be necessary from the
      beginning.  Furthermore, it's initialized by sscanf()ing the device
      name of backing_dev_info.  The mind boggles.
      Anyways, if blkg is visible under rcu lock, we *know* that the
      associated request_queue hasn't gone away yet and its bdi is
      registered and alive - blkg can't be created for request_queue which
      hasn't been fully initialized and it can't go away before blkg is
      Let stat and conf read functions get device name from
      blkg->q->backing_dev_info.dev and pass it down to printing functions
      and remove blkg->dev.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: kill blkio_policy_node · 4bfd482e
      Tejun Heo authored
      Now that blkcg configuration lives in blkg's, blkio_policy_node is no
      longer necessary.  Kill it.
      blkio_policy_parse_and_set() now fails if invoked for missing device
      and functions to print out configurations are updated to print from
      cftype_blkg_same_policy() is dropped along with other policy functions
      for consistency.  Its one line is open coded in the only user -
      -v2: Update to reflect the retry-on-bypass logic change of the
           previous patch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: don't allow or retain configuration of missing devices · e56da7e2
      Tejun Heo authored
      blkcg is very peculiar in that it allows setting and remembering
      configurations for non-existent devices by maintaining separate data
      structures for configuration.
      This behavior is completely out of the usual norms and outright
      confusing; furthermore, it uses dev_t number to match the
      configuration to devices, which is unpredictable to begin with and
      becomes completely unuseable if EXT_DEVT is fully used.
      It is wholely unnecessary - we already have fully functional userland
      mechanism to program devices being hotplugged which has full access to
      device identification, connection topology and filesystem information.
      Add a new struct blkio_group_conf which contains all blkcg
      configurations to blkio_group and let blkio_group, which can be
      created iff the associated device exists and is removed when the
      associated device goes away, carry all configurations.
      Note that, after this patch, all newly created blkg's will always have
      the default configuration (unlimited for throttling and blkcg's weight
      for propio).
      This patch makes blkio_policy_node meaningless but doesn't remove it.
      The next patch will.
      -v2: Updated to retry after short sleep if blkg lookup/creation failed
           due to the queue being temporarily bypassed as indicated by
           -EBUSY return.  Pointed out by Vivek.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: factor out blkio_group creation · cd1604fa
      Tejun Heo authored
      Currently both blk-throttle and cfq-iosched implement their own
      blkio_group creation code in throtl_get_tg() and cfq_get_cfqg().  This
      patch factors out the common code into blkg_lookup_create(), which
      returns ERR_PTR value so that transitional failures due to queue
      bypass can be distinguished from other failures.
      * New plkio_policy_ops methods blkio_alloc_group_fn() and
        blkio_link_group_fn added.  Both are transitional and will be
        removed once the blkg management code is fully moved into
      * blkio_alloc_group_fn() allocates policy-specific blkg which is
        usually a larger data structure with blkg as the first entry and
        intiailizes it.  Note that initialization of blkg proper, including
        percpu stats, is responsibility of blk-cgroup proper.
        Note that default config (weight, bps...) initialization is done
        from this method; otherwise, we end up violating locking order
        between blkcg and q locks via blkcg_get_CONF() functions.
      * blkio_link_group_fn() is called under queue_lock and responsible for
        linking the blkg to the queue.  blkcg side is handled by blk-cgroup
      * The common blkg creation function is named blkg_lookup_create() and
        blkiocg_lookup_group() is renamed to blkg_lookup() for consistency.
        Also, throtl / cfq related functions are similarly [re]named for
      This simplifies blkcg policy implementations and enables further
      -v2: Vivek noticed that blkg_lookup_create() incorrectly tested
           blk_queue_dead() instead of blk_queue_bypass() leading a user of
           the function ending up creating a new blkg on bypassing queue.
           This is a bug introduced while relocating bypass patches before
           this one.  Fixed.
      -v3: ERR_PTR patch folded into this one.  @for_root added to
           blkg_lookup_create() to allow creating root group on a bypassed
           queue during elevator switch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: add blkio_policy[] array and allow one policy per policy ID · 035d10b2
      Tejun Heo authored
      Block cgroup policies are maintained in a linked list and,
      theoretically, multiple policies sharing the same policy ID are
      This patch temporarily restricts one policy per plid and adds
      blkio_policy[] array which indexes registered policy types by plid.
      Both the restriction and blkio_policy[] array are transitional and
      will be removed once API cleanup is complete.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: use q and plid instead of opaque void * for blkio_group association · ca32aefc
      Tejun Heo authored
      blkgio_group is association between a block cgroup and a queue for a
      given policy.  Using opaque void * for association makes things
      confusing and hinders factoring of common code.  Use request_queue *
      and, if necessary, policy id instead.
      This will help block cgroup API cleanup.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: shoot down blkio_groups on elevator switch · 72e06c25
      Tejun Heo authored
      Elevator switch may involve changes to blkcg policies.  Implement
      shoot down of blkio_groups.
      Combined with the previous bypass updates, the end goal is updating
      blkcg core such that it can ensure that blkcg's being affected become
      quiescent and don't have any per-blkg data hanging around before
      commencing any policy updates.  Until queues are made aware of the
      policies that applies to them, as an interim step, all per-policy blkg
      data will be shot down.
      * blk-throtl doesn't need this change as it can't be disabled for a
        live queue; however, update it anyway as the scheduled blkg
        unification requires this behavior change.  This means that
        blk-throtl configuration will be unnecessarily lost over elevator
        switch.  This oddity will be removed after blkcg learns to associate
        individual policies with request_queues.
      * blk-throtl dosen't shoot down root_tg.  This is to ease transition.
        Unified blkg will always have persistent root group and not shooting
        down root_tg for now eases transition to that point by avoiding
        having to update td->root_tg and is safe as blk-throtl can never be
      -v2: Vivek pointed out that group list is not guaranteed to be empty
           on return from clear function if it raced cgroup removal and
           lost.  Fix it by waiting a bit and retrying.  This kludge will
           soon be removed once locking is updated such that blkg is never
           in limbo state between blkcg and request_queue locks.
           blk-throtl no longer shoots down root_tg to avoid breaking
           Also, Nest queue_lock inside blkio_list_lock not the other way
           around to avoid introduce possible deadlock via blkcg lock.
      -v3: blkcg_clear_queue() repositioned and renamed to
           blkg_destroy_all() to increase consistency with later changes.
           cfq_clear_queue() updated to check q->elevator before
           dereferencing it to avoid NULL dereference on not fully
           initialized queues (used by later change).
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
    • Tejun Heo's avatar
      blkcg: make CONFIG_BLK_CGROUP bool · 32e380ae
      Tejun Heo authored
      Block cgroup core can be built as module; however, it isn't too useful
      as blk-throttle can only be built-in and cfq-iosched is usually the
      default built-in scheduler.  Scheduled blkcg cleanup requires calling
      into blkcg from block core.  To simplify that, disallow building blkcg
      as module by making CONFIG_BLK_CGROUP bool.
      If building blkcg core as module really matters, which I doubt, we can
      revisit it after blkcg API cleanup.
      -v2: Vivek pointed out that IOSCHED_CFQ was incorrectly updated to
           depend on BLK_CGROUP.  Fixed.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
  2. 24 Oct, 2011 1 commit
  3. 23 May, 2011 1 commit
  4. 20 May, 2011 3 commits
  5. 16 May, 2011 1 commit
    • Vivek Goyal's avatar
      blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup · 70087dc3
      Vivek Goyal authored
      Currentlly we first map the task to cgroup and then cgroup to
      blkio_cgroup. There is a more direct way to get to blkio_cgroup
      from task using task_subsys_state(). Use that.
      The real reason for the fix is that it also avoids a race in generic
      cgroup code. During remount/umount rebind_subsystems() is called and
      it can do following with and rcu protection.
      cgrp->subsys[i] = NULL;
      That means if somebody got hold of cgroup under rcu and then it tried
      to do cgroup->subsys[] to get to blkio_cgroup, it would get NULL which
      is wrong. I was running into this race condition with ltp running on a
      upstream derived kernel and that lead to crash.
      So ideally we should also fix cgroup generic code to wait for rcu
      grace period before setting pointer to NULL. Li Zefan is not very keen
      on introducing synchronize_wait() as he thinks it will slow
      down moun/remount/umount operations.
      So for the time being atleast fix the kernel crash by taking a more
      direct route to blkio_cgroup.
      One tester had reported a crash while running LTP on a derived kernel
      and with this fix crash is no more seen while the test has been
      running for over 6 days.
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Reviewed-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
  6. 12 Mar, 2011 1 commit
  7. 08 Mar, 2011 1 commit
  8. 01 Oct, 2010 2 commits
    • Vivek Goyal's avatar
      blkio-throttle: limit max iops value to UINT_MAX · 9355aede
      Vivek Goyal authored
      - Limit max iops value to UINT_MAX and return error to user if value is more
        than that instead of accepting bigger values and truncating implicitly.
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
    • Vivek Goyal's avatar
      blkio: Recalculate the throttled bio dispatch time upon throttle limit change · fe071437
      Vivek Goyal authored
      o Currently any cgroup throttle limit changes are processed asynchronousy and
        the change does not take affect till a new bio is dispatched from same group.
      o It might happen that a user sets a redicuously low limit on throttling.
        Say 1 bytes per second on reads. In such cases simple operations like mount
        a disk can wait for a very long time.
      o Once bio is throttled, there is no easy way to come out of that wait even if
        user increases the read limit later.
      o This patch fixes it. Now if a user changes the cgroup limits, we recalculate
        the bio dispatch time according to new limits.
      o Can't take queueu lock under blkcg_lock, hence after the change I wake
        up the dispatch thread again which recalculates the time. So there are some
        variables being synchronized across two threads without lock and I had to
        make use of barriers. Hoping I have used barriers correctly. Any review of
        memory barrier code especially will help.
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <jaxboe@fusionio.com>
  9. 16 Sep, 2010 3 commits
  10. 26 Apr, 2010 2 commits
    • Vivek Goyal's avatar
      blk-cgroup: config options re-arrangement · afc24d49
      Vivek Goyal authored
      This patch fixes few usability and configurability issues.
      o All the cgroup based controller options are configurable from
        "Genral Setup/Control Group Support/" menu. blkio is the only exception.
        Hence make this option visible in above menu and make it configurable from
        there to bring it inline with rest of the cgroup based controllers.
      o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.
        This option currently does two things.
        - Enable printing of cgroup paths in blktrace
        - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
          files in cgroup.
        If we are using group scheduling, blktrace data is of not really much use
        if cgroup information is not present. To get this data, currently one has to
        also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings the overhead of
        all the additional debug stat files which is not desired.
        Hence, this patch moves printing of cgroup paths under
        This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
        the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP which
        can be enabled through config menu.
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Acked-by: default avatarDivyesh Shah <dpshah@google.com>
      Reviewed-by: default avatarGui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
    • Vivek Goyal's avatar
      blkio: Fix another BUG_ON() crash due to cfqq movement across groups · e5ff082e
      Vivek Goyal authored
      o Once in a while, I was hitting a BUG_ON() in blkio code. empty_time was
        assuming that upon slice expiry, group can't be marked empty already (except
        forced dispatch).
        But this assumption is broken if cfqq can move (group_isolation=0) across
        groups after receiving a request.
        I think most likely in this case we got a request in a cfqq and accounted
        the rq in one group, later while adding the cfqq to tree, we moved the queue
        to a different group which was already marked empty and after dispatch from
        slice we found group already marked empty and raised alarm.
        This patch does not error out if group is already marked empty. This can
        introduce some empty_time stat error only in case of group_isolation=0. This
        is better than crashing. In case of group_isolation=1 we should still get
        same stats as before this patch.
      [  222.308546] ------------[ cut here ]------------
      [  222.309311] kernel BUG at block/blk-cgroup.c:236!
      [  222.309311] invalid opcode: 0000 [#1] SMP
      [  222.309311] last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
      [  222.309311] CPU 1
      [  222.309311] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      [  222.309311]
      [  222.309311] Pid: 4780, comm: fio Not tainted 2.6.34-rc4-blkio-config #68 0A98h/HP xw8600 Workstation
      [  222.309311] RIP: 0010:[<ffffffff8121ad88>]  [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
      [  222.309311] RSP: 0018:ffff8800ba6e79f8  EFLAGS: 00010002
      [  222.309311] RAX: 0000000000000082 RBX: ffff8800a13b7990 RCX: ffff8800a13b7808
      [  222.309311] RDX: 0000000000002121 RSI: 0000000000000082 RDI: ffff8800a13b7a30
      [  222.309311] RBP: ffff8800ba6e7a18 R08: 0000000000000000 R09: 0000000000000001
      [  222.309311] R10: 000000000002f8c8 R11: ffff8800ba6e7ad8 R12: ffff8800a13b78ff
      [  222.309311] R13: ffff8800a13b7990 R14: 0000000000000001 R15: ffff8800a13b7808
      [  222.309311] FS:  00007f3beec476f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
      [  222.309311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  222.309311] CR2: 000000000040e7f0 CR3: 00000000a12d5000 CR4: 00000000000006e0
      [  222.309311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  222.309311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  222.309311] Process fio (pid: 4780, threadinfo ffff8800ba6e6000, task ffff8800b3d6bf00)
      [  222.309311] Stack:
      [  222.309311]  0000000000000001 ffff8800bab17a48 ffff8800bab17a48 ffff8800a13b7800
      [  222.309311] <0> ffff8800ba6e7a68 ffffffff8121da35 ffff880000000001 00ff8800ba5c5698
      [  222.309311] <0> ffff8800ba6e7a68 ffff8800a13b7800 0000000000000000 ffff8800bab17a48
      [  222.309311] Call Trace:
      [  222.309311]  [<ffffffff8121da35>] __cfq_slice_expired+0x2af/0x3ec
      [  222.309311]  [<ffffffff8121fd7b>] cfq_dispatch_requests+0x2c8/0x8e8
      [  222.309311]  [<ffffffff8120f1cd>] ? spin_unlock_irqrestore+0xe/0x10
      [  222.309311]  [<ffffffff8120fb1a>] ? blk_insert_cloned_request+0x70/0x7b
      [  222.309311]  [<ffffffff81210461>] blk_peek_request+0x191/0x1a7
      [  222.309311]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
      [  222.309311]  [<ffffffff810ae61f>] ? sync_page_killable+0x0/0x35
      [  222.309311]  [<ffffffff81210fd4>] __generic_unplug_device+0x32/0x37
      [  222.309311]  [<ffffffff81211274>] generic_unplug_device+0x2e/0x3c
      [  222.309311]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
      [  222.309311]  [<ffffffff8120ca37>] blk_unplug+0x29/0x2d
      [  222.309311]  [<ffffffff8120ca4d>] blk_backing_dev_unplug+0x12/0x14
      [  222.309311]  [<ffffffff81109a7a>] block_sync_page+0x35/0x39
      [  222.309311]  [<ffffffff810ae616>] sync_page+0x41/0x4a
      [  222.309311]  [<ffffffff810ae62d>] sync_page_killable+0xe/0x35
      [  222.309311]  [<ffffffff8158aa59>] __wait_on_bit_lock+0x46/0x8f
      [  222.309311]  [<ffffffff810ae4f5>] __lock_page_killable+0x66/0x6d
      [  222.309311]  [<ffffffff81056f9c>] ? wake_bit_function+0x0/0x33
      [  222.309311]  [<ffffffff810ae528>] lock_page_killable+0x2c/0x2e
      [  222.309311]  [<ffffffff810afbc5>] generic_file_aio_read+0x361/0x4f0
      [  222.309311]  [<ffffffff810ea044>] do_sync_read+0xcb/0x108
      [  222.309311]  [<ffffffff811e42f7>] ? security_file_permission+0x16/0x18
      [  222.309311]  [<ffffffff810ea6ab>] vfs_read+0xab/0x108
      [  222.309311]  [<ffffffff810ea7c8>] sys_read+0x4a/0x6e
      [  222.309311]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
      [  222.309311] Code: 58 01 00 00 00 48 89 c6 75 0a 48 83 bb 60 01 00 00 00 74 09 48 8d bb a0 00 00 00 eb 35 41 fe cc 74 0d f6 83 c0 01 00 00 04 74 04 <0f> 0b eb fe 48 89 75 e8 e8 be e0 de ff 66 83 8b c0 01 00 00 04
      [  222.309311] RIP  [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
      [  222.309311]  RSP <ffff8800ba6e79f8>
      [  222.309311] ---[ end trace 32b4f71dffc15712 ]---
      Signed-off-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Acked-by: default avatarDivyesh Shah <dpshah@google.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
  11. 16 Apr, 2010 1 commit
  12. 13 Apr, 2010 2 commits
    • Divyesh Shah's avatar
      block: Update to io-controller stats · a11cdaa7
      Divyesh Shah authored
      Changelog from v1:
      o Call blkiocg_update_idle_time_stats() at cfq_rq_enqueued() instead of at
        dispatch time.
      Changelog from original patchset: (in response to Vivek Goyal's comments)
      o group blkiocg_update_blkio_group_dequeue_stats() with other DEBUG functions
      o rename blkiocg_update_set_active_queue_stats() to
      o s/request/io/ in blkiocg_update_request_add_stats() and
      o Call cfq_del_timer() at request dispatch() instead of
      Signed-off-by: default avatarDivyesh <Shah&lt;dpshah@google.com>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
    • Gui Jianfeng's avatar
      io-controller: Add a new interface "weight_device" for IO-Controller · 34d0f179
      Gui Jianfeng authored
      Currently, IO Controller makes use of blkio.weight to assign weight for
      all devices. Here a new user interface "blkio.weight_device" is introduced to
      assign different weights for different devices. blkio.weight becomes the
      default value for devices which are not configured by "blkio.weight_device"
      You can use the following format to assigned specific weight for a given
      #echo "major:minor weight" > blkio.weight_device
      major:minor represents device number.
      And you can remove weight for a given device as following:
      #echo "major:minor 0" > blkio.weight_device
      V1->V2 changes:
      - use user interface "weight_device" instead of "policy" suggested by Vivek
      - rename some struct suggested by Vivek
      - rebase to 2.6-block "for-linus" branch
      - remove an useless list_empty check pointed out by Li Zefan
      - some trivial typo fix
      V2->V3 changes:
      - Move policy_*_node() functions up to get rid of forward declarations
      - rename related functions by adding prefix "blkio_"
      Signed-off-by: default avatarGui Jianfeng <guijianfeng@cn.fujitsu.com>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
  13. 09 Apr, 2010 4 commits
    • Divyesh Shah's avatar
      blkio: Add more debug-only per-cgroup stats · 812df48d
      Divyesh Shah authored
      1) group_wait_time - This is the amount of time the cgroup had to wait to get a
        timeslice for one of its queues from when it became busy, i.e., went from 0
        to 1 request queued. This is different from the io_wait_time which is the
        cumulative total of the amount of time spent by each IO in that cgroup waiting
        in the scheduler queue. This stat is a great way to find out any jobs in the
        fleet that are being starved or waiting for longer than what is expected (due
        to an IO controller bug or any other issue).
      2) empty_time - This is the amount of time a cgroup spends w/o any pending
         requests. This stat is useful when a job does not seem to be able to use its
         assigned disk share by helping check if that is happening due to an IO
         controller bug or because the job is not submitting enough IOs.
      3) idle_time - This is the amount of time spent by the IO scheduler idling
         for a given cgroup in anticipation of a better request than the exising ones
         from other queues/cgroups.
      All these stats are recorded using start and stop events. When reading these
      stats, we do not add the delta between the current time and the last start time
      if we're between the start and stop events. We avoid doing this to make sure
      that these numbers are always monotonically increasing when read. Since we're
      using sched_clock() which may use the tsc as its source, it may induce some
      inconsistency (due to tsc resync across cpus) if we included the current delta.
      Signed-off-by: default avatarDivyesh <Shah&lt;dpshah@google.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
    • Divyesh Shah's avatar
      blkio: Add io_queued and avg_queue_size stats · cdc1184c
      Divyesh Shah authored
      These stats are useful for getting a feel for the queue depth of the cgroup,
      i.e., how filled up its queues are at a given instant and over the existence of
      the cgroup. This ability is useful when debugging problems in the wild as it
      helps understand the application's IO pattern w/o having to read through the
      userspace code (coz its tedious or just not available) or w/o the ability
      to run blktrace (since you may not have root access and/or not want to disturb
      Signed-off-by: default avatarDivyesh <Shah&lt;dpshah@google.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
    • Divyesh Shah's avatar
      blkio: Add io_merged stat · 812d4026
      Divyesh Shah authored
      This includes both the number of bios merged into requests belonging to this
      cgroup as well as the number of requests merged together.
      In the past, we've observed different merging behavior across upstream kernels,
      some by design some actual bugs. This stat helps a lot in debugging such
      problems when applications report decreased throughput with a new kernel
      This needed adding an extra elevator function to capture bios being merged as I
      did not want to pollute elevator code with blkiocg knowledge and hence needed
      the accounting invocation to come from CFQ.
      Signed-off-by: default avatarDivyesh <Shah&lt;dpshah@google.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
    • Divyesh Shah's avatar
      blkio: Changes to IO controller additional stats patches · 84c124da
      Divyesh Shah authored
      that include some minor fixes and addresses all comments.
      Changelog: (most based on Vivek Goyal's comments)
      o renamed blkiocg_reset_write to blkiocg_reset_stats
      o more clarification in the documentation on io_service_time and io_wait_time
      o Initialize blkg->stats_lock
      o rename io_add_stat to blkio_add_stat and declare it static
      o use bool for direction and sync
      o derive direction and sync info from existing rq methods
      o use 12 for major:minor string length
      o define io_service_time better to cover the NCQ case
      o add a separate reset_stats interface
      o make the indexed stats a 2d array to simplify macro and function pointer code
      o blkio.time now exports in jiffies as before
      o Added stats description in patch description and
      o Prefix all stats functions with blkio and make them static as applicable
      o replace IO_TYPE_MAX with IO_TYPE_TOTAL
      o Moved #define constant to top of blk-cgroup.c
      o Pass dev_t around instead of char *
      o Add note to documentation file about resetting stats
      o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
      o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
        rq_direction() and rq_sync() functions which are used by CFQ and when using
        io-controller at a higher level, bio_* functions can be added.
      Signed-off-by: default avatarDivyesh <Shah&lt;dpshah@google.com>
      Signed-off-by: default avatarJens Axboe <jens.axboe@oracle.com>
  14. 02 Apr, 2010 3 commits
  15. 12 Mar, 2010 1 commit
    • Ben Blum's avatar
      cgroups: blkio subsystem as module · 67523c48
      Ben Blum authored
      Modify the Block I/O cgroup subsystem to be able to be built as a module.
      As the CFQ disk scheduler optionally depends on blk-cgroup, config options
      in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
      enhanced to support the new module dependency.
      Signed-off-by: default avatarBen Blum <bblum@andrew.cmu.edu>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  16. 26 Feb, 2010 1 commit
  17. 04 Dec, 2009 2 commits
  18. 03 Dec, 2009 1 commit