1. 09 Aug, 2013 5 commits
    • Tejun Heo's avatar
      cgroup: make css_for_each_descendant() and friends include the origin css in the iteration · bd8815a6
      Tejun Heo authored
      
      
      Previously, all css descendant iterators didn't include the origin
      (root of subtree) css in the iteration.  The reasons were maintaining
      consistency with css_for_each_child() and that at the time of
      introduction more use cases needed skipping the origin anyway;
      however, given that css_is_descendant() considers self to be a
      descendant, omitting the origin css has become more confusing and
      looking at the accumulated use cases rather clearly indicates that
      including origin would result in simpler code overall.
      
      While this is a change which can easily lead to subtle bugs, cgroup
      API including the iterators has recently gone through major
      restructuring and no out-of-tree changes will be applicable without
      adjustments making this a relatively acceptable opportunity for this
      type of change.
      
      The conversions are mostly straight-forward.  If the iteration block
      had explicit origin handling before or after, it's moved inside the
      iteration.  If not, if (pos == origin) continue; is added.  Some
      conversions add extra reference get/put around origin handling by
      consolidating origin handling and the rest.  While the extra ref
      operations aren't strictly necessary, this shouldn't cause any
      noticeable difference.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Acked-by: default avatarAristeu Rozanski <aris@redhat.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      bd8815a6
    • Tejun Heo's avatar
      cgroup: make hierarchy iterators deal with cgroup_subsys_state instead of cgroup · 492eb21b
      Tejun Heo authored
      
      
      cgroup is currently in the process of transitioning to using css
      (cgroup_subsys_state) as the primary handle instead of cgroup in
      subsystem API.  For hierarchy iterators, this is beneficial because
      
      * In most cases, css is the only thing subsystems care about anyway.
      
      * On the planned unified hierarchy, iterations for different
        subsystems will need to skip over different subtrees of the
        hierarchy depending on which subsystems are enabled on each cgroup.
        Passing around css makes it unnecessary to explicitly specify the
        subsystem in question as css is intersection between cgroup and
        subsystem
      
      * For the planned unified hierarchy, css's would need to be created
        and destroyed dynamically independent from cgroup hierarchy.  Having
        cgroup core manage css iteration makes enforcing deref rules a lot
        easier.
      
      Most subsystem conversions are straight-forward.  Noteworthy changes
      are
      
      * blkio: cgroup_to_blkcg() is no longer used.  Removed.
      
      * freezer: cgroup_freezer() is no longer used.  Removed.
      
      * devices: cgroup_to_devcgroup() is no longer used.  Removed.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Acked-by: default avatarAristeu Rozanski <aris@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      492eb21b
    • Tejun Heo's avatar
      cgroup: add css_parent() · 63876986
      Tejun Heo authored
      
      
      Currently, controllers have to explicitly follow the cgroup hierarchy
      to find the parent of a given css.  cgroup is moving towards using
      cgroup_subsys_state as the main controller interface construct, so
      let's provide a way to climb the hierarchy using just csses.
      
      This patch implements css_parent() which, given a css, returns its
      parent.  The function is guarnateed to valid non-NULL parent css as
      long as the target css is not at the top of the hierarchy.
      
      freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
      are converted to use css_parent() instead of accessing cgroup->parent
      directly.
      
      * __parent_ca() is dropped from cpuacct and its usage is replaced with
        parent_ca().  The only difference between the two was NULL test on
        cgroup->parent which is now embedded in css_parent() making the
        distinction moot.  Note that eventually a css->parent field will be
        added to css and the NULL check in css_parent() will go away.
      
      This patch shouldn't cause any behavior differences.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      63876986
    • Tejun Heo's avatar
      cgroup: add/update accessors which obtain subsys specific data from css · a7c6d554
      Tejun Heo authored
      
      
      css (cgroup_subsys_state) is usually embedded in a subsys specific
      data structure.  Subsystems either use container_of() directly to cast
      from css to such data structure or has an accessor function wrapping
      such cast.  As cgroup as whole is moving towards using css as the main
      interface handle, add and update such accessors to ease dealing with
      css's.
      
      All accessors explicitly handle NULL input and return NULL in those
      cases.  While this looks like an extra branch in the code, as all
      controllers specific data structures have css as the first field, the
      casting doesn't involve any offsetting and the compiler can trivially
      optimize out the branch.
      
      * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
        accessor.  Added.
      
      * memory, hugetlb and devices already had one but didn't explicitly
        handle NULL input.  Updated.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      a7c6d554
    • Tejun Heo's avatar
      cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ · 8af01f56
      Tejun Heo authored
      
      
      The names of the two struct cgroup_subsys_state accessors -
      cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
      The former clashes with the type name and the latter doesn't even
      indicate it's somehow related to cgroup.
      
      We're about to revamp large portion of cgroup API, so, let's rename
      them so that they're less awkward.  Most per-controller usages of the
      accessors are localized in accessor wrappers and given the amount of
      scheduled changes, this isn't gonna add any noticeable headache.
      
      Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
      to task_css().  This patch is pure rename.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      8af01f56
  2. 14 May, 2013 3 commits
  3. 04 Mar, 2013 1 commit
    • Li Zefan's avatar
      cgroup: fix cgroup_path() vs rename() race · 65dff759
      Li Zefan authored
      
      
      rename() will change dentry->d_name. The result of this race can
      be worse than seeing partially rewritten name, but we might access
      a stale pointer because rename() will re-allocate memory to hold
      a longer name.
      
      As accessing dentry->name must be protected by dentry->d_lock or
      parent inode's i_mutex, while on the other hand cgroup-path() can
      be called with some irq-safe spinlocks held, we can't generate
      cgroup path using dentry->d_name.
      
      Alternatively we make a copy of dentry->d_name and save it in
      cgrp->name when a cgroup is created, and update cgrp->name at
      rename().
      
      v5: use flexible array instead of zero-size array.
      v4: - allocate root_cgroup_name and all root_cgroup->name points to it.
          - add cgroup_name() wrapper.
      v3: use kfree_rcu() instead of synchronize_rcu() in user-visible path.
      v2: make cgrp->name RCU safe.
      
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      65dff759
  4. 09 Jan, 2013 6 commits
    • Tejun Heo's avatar
      blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge() · 16b3de66
      Tejun Heo authored
      
      
      Implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge().
      The former two collect the [rw]stats designated by the target policy
      data and offset from the pd's subtree.  The latter two add one
      [rw]stat to another.
      
      Note that the recursive sum functions require the queue lock to be
      held on entry to make blkg online test reliable.  This is necessary to
      properly handle stats of a dying blkg.
      
      These will be used to implement hierarchical stats.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      16b3de66
    • Tejun Heo's avatar
      blkcg: s/blkg_rwstat_sum()/blkg_rwstat_total()/ · 4d5e80a7
      Tejun Heo authored
      
      
      Rename blkg_rwstat_sum() to blkg_rwstat_total().  sum will be used for
      summing up stats from multiple blkgs.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      4d5e80a7
    • Tejun Heo's avatar
      blkcg: implement blkcg_policy->on/offline_pd_fn() and blkcg_gq->online · f427d909
      Tejun Heo authored
      
      
      Add two blkcg_policy methods, ->online_pd_fn() and ->offline_pd_fn(),
      which are invoked as the policy_data gets activated and deactivated
      while holding both blkcg and q locks.
      
      Also, add blkcg_gq->online bool, which is set and cleared as the
      blkcg_gq gets activated and deactivated.  This flag also is toggled
      while holding both blkcg and q locks.
      
      These will be used to implement hierarchical stats.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      f427d909
    • Tejun Heo's avatar
      blkcg: add blkg_policy_data->plid · b276a876
      Tejun Heo authored
      
      
      Add pd->plid so that the policy a pd belongs to can be identified
      easily.  This will be used to implement hierarchical blkg_[rw]stats.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      b276a876
    • Tejun Heo's avatar
      cfq-iosched: add leaf_weight · e71357e1
      Tejun Heo authored
      
      
      cfq blkcg is about to grow proper hierarchy handling, where a child
      blkg's weight would nest inside the parent's.  This makes tasks in a
      blkg to compete against both tasks in the sibling blkgs and the tasks
      of child blkgs.
      
      We're gonna use the existing weight as the group weight which decides
      the blkg's weight against its siblings.  This patch introduces a new
      weight - leaf_weight - which decides the weight of a blkg against the
      child blkgs.
      
      It's named leaf_weight because another way to look at it is that each
      internal blkg nodes have a hidden child leaf node which contains all
      its tasks and leaf_weight is the weight of the leaf node and handled
      the same as the weight of the child blkgs.
      
      This patch only adds leaf_weight fields and exposes it to userland.
      The new weight isn't actually used anywhere yet.  Note that
      cfq-iosched currently offcially supports only single level hierarchy
      and root blkgs compete with the first level blkgs - ie. root weight is
      basically being used as leaf_weight.  For root blkgs, the two weights
      are kept in sync for backward compatibility.
      
      v2: cfqd->root_group->leaf_weight initialization was missing from
          cfq_init_queue() causing divide by zero when
          !CONFIG_CFQ_GROUP_SCHED.  Fix it.  Reported by Fengguang.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      e71357e1
    • Tejun Heo's avatar
      blkcg: make blkcg_gq's hierarchical · 3c547865
      Tejun Heo authored
      
      
      Currently a child blkg (blkcg_gq) can be created even if its parent
      doesn't exist.  ie. Given a blkg, it's not guaranteed that its
      ancestors will exist.  This makes it difficult to implement proper
      hierarchy support for blkcg policies.
      
      Always create blkgs recursively and make a child blkg hold a reference
      to its parent.  blkg->parent is added so that finding the parent is
      easy.  blkcg_parent() is also added in the process.
      
      This change can be visible to userland.  e.g. while issuing IO in a
      nested cgroup didn't affect the ancestors at all, now it will
      initialize all ancestor blkgs and zero stats for the request_queue
      will always appear on them.  While this is userland visible, this
      shouldn't cause any functional difference.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      3c547865
  5. 26 Jun, 2012 1 commit
    • Tejun Heo's avatar
      blkcg: implement per-blkg request allocation · a051661c
      Tejun Heo authored
      Currently, request_queue has one request_list to allocate requests
      from regardless of blkcg of the IO being issued.  When the unified
      request pool is used up, cfq proportional IO limits become meaningless
      - whoever grabs the next request being freed wins the race regardless
      of the configured weights.
      
      This can be easily demonstrated by creating a blkio cgroup w/ very low
      weight, put a program which can issue a lot of random direct IOs there
      and running a sequential IO from a different cgroup.  As soon as the
      request pool is used up, the sequential IO bandwidth crashes.
      
      This patch implements per-blkg request_list.  Each blkg has its own
      request_list and any IO allocates its request from the matching blkg
      making blkcgs completely isolated in terms of request allocation.
      
      * Root blkcg uses the request_list embedded in each request_queue,
        which was renamed to @q->root_rl from @q->rq.  While making blkcg rl
        handling a bit harier, this enables avoiding most overhead for root
        blkcg.
      
      * Queue fullness is properly per request_list but bdi isn't blkcg
        aware yet, so congestion state currently just follows the root
        blkcg.  As writeback isn't aware of blkcg yet, this works okay for
        async congestion but readahead may get the wrong signals.  It's
        better than blkcg completely collapsing with shared request_list but
        needs to be improved with future changes.
      
      * After this change, each block cgroup gets a full request pool making
        resource consumption of each cgroup higher.  This makes allowing
        non-root users to create cgroups less desirable; however, note that
        allowing non-root users to directly manage cgroups is already
        severely broken regardless of this patch - each block cgroup
        consumes kernel memory and skews IO weight (IO weights are not
        hierarchical).
      
      v2: queue-sysfs.txt updated and patch description udpated as suggested
          by Vivek.
      
      v3: blk_get_rl() wasn't checking error return from
          blkg_lookup_create() and may cause oops on lookup failure.  Fix it
          by falling back to root_rl on blkg lookup failures.  This problem
          was spotted by Rakesh Iyer <rni@google.com>.
      
      v4: Updated to accomodate 458f27a9
      
       "block: Avoid missed wakeup in
          request waitqueue".  blk_drain_queue() now wakes up waiters on all
          blkg->rl on the target queue.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarVivek Goyal <vgoyal@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a051661c
  6. 25 Jun, 2012 1 commit
  7. 20 Apr, 2012 14 commits
    • Tejun Heo's avatar
      blkcg: use radix tree to index blkgs from blkcg · a637120e
      Tejun Heo authored
      
      
      blkg lookup is currently performed by traversing linked list anchored
      at blkcg->blkg_list.  This is very unscalable and with blk-throttle
      enabled and enough request queues on the system, this can get very
      ugly quickly (blk-throttle performs look up on every bio submission).
      
      This patch makes blkcg use radix tree to index blkgs combined with
      simple last-looked-up hint.  This is mostly identical to how icqs are
      indexed from ioc.
      
      Note that because __blkg_lookup() may be invoked without holding queue
      lock, hint is only updated from __blkg_lookup_create().  Due to cfq's
      cfqq caching, this makes hint updates overly lazy.  This will be
      improved with scheduled blkcg aware request allocation.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a637120e
    • Tejun Heo's avatar
      blkcg: collapse blkcg_policy_ops into blkcg_policy · f9fcc2d3
      Tejun Heo authored
      
      
      There's no reason to keep blkcg_policy_ops separate.  Collapse it into
      blkcg_policy.
      
      This patch doesn't introduce any functional change.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f9fcc2d3
    • Tejun Heo's avatar
      blkcg: embed struct blkg_policy_data in policy specific data · f95a04af
      Tejun Heo authored
      
      
      Currently blkg_policy_data carries policy specific data as char flex
      array instead of being embedded in policy specific data.  This was
      forced by oddities around blkg allocation which are all gone now.
      
      This patch makes blkg_policy_data embedded in policy specific data -
      throtl_grp and cfq_group so that it's more conventional and consistent
      with how io_cq is handled.
      
      * blkcg_policy->pdata_size is renamed to ->pd_size.
      
      * Functions which used to take void *pdata now takes struct
        blkg_policy_data *pd.
      
      * blkg_to_pdata/pdata_to_blkg() updated to blkg_to_pd/pd_to_blkg().
      
      * Dummy struct blkg_policy_data definition added.  Dummy
        pdata_to_blkg() definition was unused and inconsistent with the
        non-dummy version - correct dummy pd_to_blkg() added.
      
      * throtl and cfq updated accordingly.
      
      * As dummy blkg_to_pd/pd_to_blkg() are provided,
        blkg_to_cfqg/cfqg_to_blkg() don't need to be ifdef'd.  Moved outside
        ifdef block.
      
      This patch doesn't introduce any functional change.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f95a04af
    • Tejun Heo's avatar
      blkcg: mass rename of blkcg API · 3c798398
      Tejun Heo authored
      
      
      During the recent blkcg cleanup, most of blkcg API has changed to such
      extent that mass renaming wouldn't cause any noticeable pain.  Take
      the chance and cleanup the naming.
      
      * Rename blkio_cgroup to blkcg.
      
      * Drop blkio / blkiocg prefixes and consistently use blkcg.
      
      * Rename blkio_group to blkcg_gq, which is consistent with io_cq but
        keep the blkg prefix / variable name.
      
      * Rename policy method type and field names to signify they're dealing
        with policy data.
      
      * Rename blkio_policy_type to blkcg_policy.
      
      This patch doesn't cause any functional change.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3c798398
    • Tejun Heo's avatar
      blkcg: style cleanups for blk-cgroup.h · 36558c8a
      Tejun Heo authored
      
      
      * Update indentation on struct field declarations.
      
      * Uniformly don't use "extern" on function declarations.
      
      * Merge the two #ifdef CONFIG_BLK_CGROUP blocks.
      
      All changes in this patch are cosmetic.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      36558c8a
    • Tejun Heo's avatar
      blkcg: remove blkio_group->path[] · 54e7ed12
      Tejun Heo authored
      
      
      blkio_group->path[] stores the path of the associated cgroup and is
      used only for debug messages.  Just format the path from blkg->cgroup
      when printing debug messages.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      54e7ed12
    • Tejun Heo's avatar
      blkcg: blkg_rwstat_read() was missing inline · c94bed89
      Tejun Heo authored
      
      
      blkg_rwstat_read() in blk-cgroup.h was missing inline modifier causing
      compile warning depending on configuration.  Add it.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      c94bed89
    • Tejun Heo's avatar
      blkcg: drop stuff unused after per-queue policy activation update · 3c96cb32
      Tejun Heo authored
      
      
      * All_q_list is unused.  Drop all_q_{mutex|list}.
      
      * @for_root of blkg_lookup_create() is always %false when called from
        outside blk-cgroup.c proper.  Factor out __blkg_lookup_create() so
        that it doesn't check whether @q is bypassing and use the
        underscored version for the @for_root callsite.
      
      * blkg_destroy_all() is used only from blkcg proper and @destroy_root
        is always %true.  Make it static and drop @destroy_root.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      3c96cb32
    • Tejun Heo's avatar
      blkcg: implement per-queue policy activation · a2b1693b
      Tejun Heo authored
      
      
      All blkcg policies were assumed to be enabled on all request_queues.
      Due to various implementation obstacles, during the recent blkcg core
      updates, this was temporarily implemented as shooting down all !root
      blkgs on elevator switch and policy [de]registration combined with
      half-broken in-place root blkg updates.  In addition to being buggy
      and racy, this meant losing all blkcg configurations across those
      events.
      
      Now that blkcg is cleaned up enough, this patch replaces the temporary
      implementation with proper per-queue policy activation.  Each blkcg
      policy should call the new blkcg_[de]activate_policy() to enable and
      disable the policy on a specific queue.  blkcg_activate_policy()
      allocates and installs policy data for the policy for all existing
      blkgs.  blkcg_deactivate_policy() does the reverse.  If a policy is
      not enabled for a given queue, blkg printing / config functions skip
      the respective blkg for the queue.
      
      blkcg_activate_policy() also takes care of root blkg creation, and
      cfq_init_queue() and blk_throtl_init() are updated accordingly.
      
      This replaces blkcg_bypass_{start|end}() and update_root_blkg_pd()
      unnecessary.  Dropped.
      
      v2: cfq_init_queue() was returning uninitialized @ret on root_group
          alloc failure if !CONFIG_CFQ_GROUP_IOSCHED.  Fixed.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      a2b1693b
    • Tejun Heo's avatar
      blkcg: make blkg_conf_prep() take @pol and return with queue lock held · da8b0662
      Tejun Heo authored
      
      
      Add @pol to blkg_conf_prep() and let it return with queue lock held
      (to be released by blkg_conf_finish()).  Note that @pol isn't used
      yet.
      
      This is to prepare for per-queue policy activation and doesn't cause
      any visible difference.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      da8b0662
    • Tejun Heo's avatar
      blkcg: remove static policy ID enums · 8bd435b3
      Tejun Heo authored
      
      
      Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate
      @pol->plid dynamically on registration.  The maximum number of blkcg
      policies which can be registered at the same time is defined by
      BLKCG_MAX_POLS constant added to include/linux/blkdev.h.
      
      Note that blkio_policy_register() now may fail.  Policy init functions
      updated accordingly and unnecessary ifdefs removed from cfq_init().
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      8bd435b3
    • Tejun Heo's avatar
      blkcg: use @pol instead of @plid in update_root_blkg_pd() and blkcg_print_blkgs() · ec399347
      Tejun Heo authored
      
      
      The two functions were taking "enum blkio_policy_id plid".  Make them
      take "const struct blkio_policy_type *pol" instead.
      
      This is to prepare for per-queue policy activation and doesn't cause
      any functional difference.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      ec399347
    • Tejun Heo's avatar
      blkcg: kill blkio_list and replace blkio_list_lock with a mutex · bc0d6501
      Tejun Heo authored
      
      
      With blkio_policy[], blkio_list is redundant and hinders with
      per-queue policy activation.  Remove it.  Also, replace
      blkio_list_lock with a mutex blkcg_pol_mutex and let it protect the
      whole [un]registration.
      
      This is to prepare for per-queue policy activation and doesn't cause
      any functional difference.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      bc0d6501
    • Tejun Heo's avatar
      cfq: fix build breakage & warnings · f48ec1d7
      Tejun Heo authored
      
      
      * CFQ_WEIGHT_* defined inside CONFIG_BLK_CGROUP causes cfq-iosched.c
        compile failure when the config is disabled.  Move it outside the
        ifdef block.
      
      * Dummy cfqg_stats_*() definitions were lacking inline modifiers
        causing unused functions warning if !CONFIG_CFQ_GROUP_IOSCHED.  Add
        them.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      f48ec1d7
  8. 01 Apr, 2012 9 commits
    • Tejun Heo's avatar
      blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macros · 5bc4afb1
      Tejun Heo authored
      
      
      Now that all stat handling code lives in policy implementations,
      there's no need to encode policy ID in cft->private.
      
      * Export blkcg_prfill_[rw]stat() from blkcg, remove
        blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat() which
        use hard-code BLKIO_POLICY_PROP.
      
      * Use cft->private for offset of the target field directly and drop
        BLKCG_STAT_{PRIV|POL|OFF}().
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      5bc4afb1
    • Tejun Heo's avatar
      blkcg: pass around pd->pdata instead of pd itself in prfill functions · d366e7ec
      Tejun Heo authored
      
      
      Now that all conf and stat fields are moved into policy specific
      blkio_policy_data->pdata areas, there's no reason to use
      blkio_policy_data itself in prfill functions.  Pass around @pd->pdata
      instead of @pd.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      d366e7ec
    • Tejun Heo's avatar
      blkcg: move blkio_group_conf->iops and ->bps to blk-throttle · af133ceb
      Tejun Heo authored
      
      
      blkio_cgroup_conf->iops and ->bps are owned by blk-throttle and has no
      reason to be defined in blkcg core.  Drop them and let conf setting
      functions directly manipulate throtl_grp->bps[] and ->iops[].
      
      This makes blkio_group_conf empty.  Drop it.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      af133ceb
    • Tejun Heo's avatar
      blkcg: move blkio_group_conf->weight to cfq · 3381cb8d
      Tejun Heo authored
      
      
      blkio_group_conf->weight is owned by cfq and has no reason to be
      defined in blkcg core.  Replace it with cfq_group->dev_weight and let
      conf setting functions directly set it.  If dev_weight is zero, the
      cfqg doesn't have device specific weight configured.
      
      Also, rename BLKIO_WEIGHT_* constants to CFQ_WEIGHT_* and rename
      blkio_cgroup->weight to blkio_cgroup->cfq_weight.  We eventually want
      per-policy storage in blkio_cgroup but just mark the ownership of the
      field for now.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      3381cb8d
    • Tejun Heo's avatar
      blkcg: move blkio_group_stats_cpu and friends to blk-throttle.c · 8a3d2615
      Tejun Heo authored
      
      
      blkio_group_stats_cpu is used only by blk-throtl and has no reason to
      be defined in blkcg core.
      
      * Move blkio_group_stats_cpu to blk-throttle.c and rename it to
        tg_stats_cpu.
      
      * blkg_policy_data->stats_cpu is replaced with throtl_grp->stats_cpu.
        prfill functions updated accordingly.
      
      * All related macros / functions are renamed so that they have tg_
        prefix and the unnecessary @pol arguments are dropped.
      
      * Per-cpu stats allocation code is also moved from blk-cgroup.c to
        blk-throttle.c and gets simplified to only deal with
        BLKIO_POLICY_THROTL.  percpu stat free is performed by the exit
        method throtl_exit_blkio_group().
      
      * throtl_reset_group_stats() implemented for
        blkio_reset_group_stats_fn method so that tg->stats_cpu can be
        reset.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      8a3d2615
    • Tejun Heo's avatar
      blkcg: move blkio_group_stats to cfq-iosched.c · 155fead9
      Tejun Heo authored
      
      
      blkio_group_stats contains only fields used by cfq and has no reason
      to be defined in blkcg core.
      
      * Move blkio_group_stats to cfq-iosched.c and rename it to cfqg_stats.
      
      * blkg_policy_data->stats is replaced with cfq_group->stats.
        blkg_prfill_[rw]stat() are updated to use offset against pd->pdata
        instead.
      
      * All related macros / functions are renamed so that they have cfqg_
        prefix and the unnecessary @pol arguments are dropped.
      
      * All stat functions now take cfq_group * instead of blkio_group *.
      
      * lockdep assertion on queue lock dropped.  Elevator runs under queue
        lock by default.  There isn't much to be gained by adding lockdep
        assertions at stat function level.
      
      * cfqg_stats_reset() implemented for blkio_reset_group_stats_fn method
        so that cfqg->stats can be reset.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      155fead9
    • Tejun Heo's avatar
      blkcg: add blkio_policy_ops operations for exit and stat reset · 9ade5ea4
      Tejun Heo authored
      
      
      Add blkio_policy_ops->blkio_exit_group_fn() and
      ->blkio_reset_group_stats_fn().  These will be used to further
      modularize blkcg policy implementation.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      9ade5ea4
    • Tejun Heo's avatar
      blkcg: cfq doesn't need per-cpu dispatch stats · 41b38b6d
      Tejun Heo authored
      
      
      blkio_group_stats_cpu is used to count dispatch stats using per-cpu
      counters.  This is used by both blk-throtl and cfq-iosched but the
      sharing is rather silly.
      
      * cfq-iosched doesn't need per-cpu dispatch stats.  cfq always updates
        those stats while holding queue_lock.
      
      * blk-throtl needs per-cpu dispatch stats but only service_bytes and
        serviced.  It doesn't make use of sectors.
      
      This patch makes cfq add and use global stats for service_bytes,
      serviced and sectors, removes per-cpu sectors counter and moves
      per-cpu stat printing code to blk-throttle.c.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      41b38b6d
    • Tejun Heo's avatar
      blkcg: move statistics update code to policies · 629ed0b1
      Tejun Heo authored
      
      
      As with conf/stats file handling code, there's no reason for stat
      update code to live in blkcg core with policies calling into update
      them.  The current organization is both inflexible and complex.
      
      This patch moves stat update code to specific policies.  All
      blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP
      stats are collapsed into their cfq_blkiocg_update_*_stats()
      counterparts.  blkiocg_update_dispatch_stats() is used by both
      policies and duplicated as throtl_update_dispatch_stats() and
      cfq_blkiocg_update_dispatch_stats().  This will be cleaned up later.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      629ed0b1