1. 18 Aug, 2015 17 commits
    • blkcg: move io_service_bytes and io_serviced stats into blkcg_gq · 77ea7338
      Tejun Heo authored
      
      
      Currently, both cfq-iosched and blk-throttle keep track of
      io_service_bytes and io_serviced stats.  While keeping track of them
      separately may be useful during development, it doesn't make much
      sense otherwise.  Also, blk-throttle was counting bios as IOs while
      cfq-iosched was counting requests, which is more confusing than
      informative.
      
      This patch adds ->stat_bytes and ->stat_ios to blkg (blkcg_gq),
      removes the counterparts from cfq-iosched and blk-throttle and lets
      them print from the common blkg counters.  The common counters are
      incremented during bio issue in blkcg_bio_issue_check().
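
      A minimal sketch of the common accounting described above (the lookup
      helper name is an assumption; only the stat fields and
      blkg_rwstat_add() come from this series):

        static inline bool blkcg_bio_issue_check(struct request_queue *q,
                                                 struct bio *bio)
        {
                struct blkcg_gq *blkg;
                bool throtl;

                rcu_read_lock();
                blkg = lookup_bio_blkg(q, bio);         /* hypothetical helper */
                throtl = blk_throtl_bio(q, blkg, bio);

                if (!throtl) {
                        /* common counters shared by all policies */
                        blkg_rwstat_add(&blkg->stat_bytes, bio->bi_rw,
                                        bio->bi_iter.bi_size);
                        blkg_rwstat_add(&blkg->stat_ios, bio->bi_rw, 1);
                }

                rcu_read_unlock();
                return !throtl;
        }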
      
      The outputs are still filtered by whether the policy has
      blkg_policy_data on a given blkg, so cfq's output won't show up if it
      has never been used for a given blkg.  The only times when the outputs
      would differ significantly are when policies are attached on the fly
      or elevators are switched back and forth.  Those are quite exceptional
      operations and I don't think they warrant keeping separate counters.
      
      v3: Update blkio-controller.txt accordingly.
      
      v2: Account IOs during bio issues instead of request completions so
          that bio-based drivers can be handled the same way.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: make blkg_[rw]stat_recursive_sum() to be able to index into blkcg_gq · f12c74ca
      Tejun Heo authored
      
      
      Currently, blkg_[rw]stat_recursive_sum() assume that the target
      counter is located in pd (blkg_policy_data); however, some counters
      are planned to be moved to blkg (blkcg_gq).
      
      This patch updates blkg_[rw]stat_recursive_sum() to take blkg and
      blkcg_policy pointers instead of pd.  If the policy is NULL, it indexes
      into blkg; if non-NULL, into the blkg's pd of that policy.
      
      The existing usages are updated to maintain the current behaviors.
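
      A minimal sketch of the NULL-policy indexing described above (the
      offset arithmetic is inferred from the description, not the verbatim
      kernel code):

        struct blkg_rwstat *rwstat;

        if (!pol)
                /* the counter is embedded directly in the blkg */
                rwstat = (void *)blkg + off;
        else
                /* the counter lives in the policy's per-blkg data */
                rwstat = (void *)blkg_to_pd(blkg, pol) + off;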
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: make blkcg_[rw]stat per-cpu · 24bdb8ef
      Tejun Heo authored
      
      
      blkcg_[rw]stat are used as stat counters for blkcg policies.  It isn't
      per-cpu by itself and blk-throttle makes it per-cpu by wrapping around
      it.  This patch makes blkcg_[rw]stat per-cpu and drops the ad-hoc
      per-cpu wrapping in blk-throttle.
      
      * blkg_[rw]stat->cnt is replaced with cpu_cnt which is struct
        percpu_counter.  This makes syncp unnecessary as remote accesses are
        handled by percpu_counter itself.
      
      * blkg_[rw]stat_init() can now fail due to percpu allocation failure
        and thus are updated to return int.
      
      * percpu_counters need explicit freeing.  blkg_[rw]stat_exit() added.
      
      * As blkg_rwstat->cpu_cnt[] can't be read directly anymore, reading
        and summing results are stored in ->aux_cnt[] instead.
      
      * Custom per-cpu stat implementation in blk-throttle is removed.
      
      This makes all blkcg stat counters per-cpu without complicating policy
      implementations.
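
      A rough sketch of the resulting counter and its init/exit pairing,
      assuming the shape described above (simplified; aux_cnt comes from the
      aux count patch below):

        struct blkg_rwstat {
                struct percpu_counter   cpu_cnt[BLKG_RWSTAT_NR];
                atomic64_t              aux_cnt[BLKG_RWSTAT_NR];
        };

        static inline int blkg_rwstat_init(struct blkg_rwstat *rwstat, gfp_t gfp)
        {
                int i, ret;

                for (i = 0; i < BLKG_RWSTAT_NR; i++) {
                        ret = percpu_counter_init(&rwstat->cpu_cnt[i], 0, gfp);
                        if (ret) {
                                /* unwind the counters already set up */
                                while (--i >= 0)
                                        percpu_counter_destroy(&rwstat->cpu_cnt[i]);
                                return ret;
                        }
                        atomic64_set(&rwstat->aux_cnt[i], 0);
                }
                return 0;
        }

        static inline void blkg_rwstat_exit(struct blkg_rwstat *rwstat)
        {
                int i;

                for (i = 0; i < BLKG_RWSTAT_NR; i++)
                        percpu_counter_destroy(&rwstat->cpu_cnt[i]);
        }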
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it · e6269c44
      Tejun Heo authored
      
      
      cgroup stats are local to each cgroup and don't propagate to
      ancestors by default.  When recursive stats are necessary, the sum is
      calculated over all the descendants.  This initially was for backward
      compatibility to support both group-local and recursive stats but this
      mode of operation makes general sense as stat update is much hotter
      than reporting those stats.
      
      This however ends up losing recursive stats when a child is removed.
      To work around this, cfq-iosched adds its stats to its parent
      cfq_group->dead_stats which is summed up together when calculating
      recursive stats.
      
      It's planned that the core stats will be moved to blkcg_gq, so we want
      to move the mechanism for keeping track of the stats of dead children
      from cfq to blkcg core.  This patch adds blkg_[rw]stat->aux_cnt which
      are atomic64_t's keeping track of auxiliary counts which are excluded
      when reading local counts but included for recursive.
      
      blkg_[rw]stat_merge() which were used by cfq to implement dead_stats
      are replaced by blkg_[rw]stat_add_aux(), and cfq now forwards stats of
      a dead cgroup to the aux counts of parent->stats instead of separate
      ->dead_stats.
      
      This will also help make blkg_[rw]stats per-cpu.
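
      A minimal sketch of the aux merge described above (the local-count read
      helper is a hypothetical stand-in):

        static inline void blkg_rwstat_add_aux(struct blkg_rwstat *to,
                                               struct blkg_rwstat *from)
        {
                int i;

                /* fold @from's local + aux counts into @to's aux counts */
                for (i = 0; i < BLKG_RWSTAT_NR; i++)
                        atomic64_add(read_local_cnt(from, i) +  /* hypothetical */
                                     atomic64_read(&from->aux_cnt[i]),
                                     &to->aux_cnt[i]);
        }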
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: consolidate blkg creation in blkcg_bio_issue_check() · ae118896
      Tejun Heo authored
      
      
      blkg (blkcg_gq) is currently created by blkcg policies invoking
      blkg_lookup_create(), which ends up repeating roughly the same code in
      different policies.  Theoretically, this can avoid the overhead of
      looking up and/or creating blkg's when blkcg is enabled but no policy
      is in use; however, the cost of blkg lookup / creation is very low,
      especially when only the root blkcg is in use - which is highly likely
      if no blkcg policy is in active use - as it boils down to a single,
      very predictable conditional plus the surrounding RCU protection.
      
      This patch consolidates blkg creation to a new function
      blkcg_bio_issue_check() which is called during bio issue from
      generic_make_request_checks().  blkcg_bio_issue_check() is now the
      only function which tries to create missing blkg's.  The subsequent
      policy and request_list operations just perform blkg_lookup() and if
      missing falls back to the root.
      
      * blk_get_rl() no longer tries to create blkg.  It uses blkg_lookup()
        instead of blkg_lookup_create().
      
      * blk_throtl_bio() is now called from blkcg_bio_issue_check() with rcu
        read locked and blkg already looked up.  Both throtl_lookup_tg() and
        throtl_lookup_create_tg() are dropped.
      
      * cfq is similarly updated.  cfq_lookup_create_cfqg() is replaced with
        cfq_lookup_cfqg() which uses blkg_lookup().
      
      This consolidates blkg handling and avoids unnecessary blkg creation
      retries under memory pressure.  In addition, this provides a common
      bio entry point into blkcg where things like common accounting can be
      performed.
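
      A minimal sketch of the lookup-with-root-fallback pattern the
      request_list and policy paths now follow (locking and refcounting
      elided):

        blkg = blkg_lookup(blkcg, q);
        if (unlikely(!blkg))
                blkg = q->root_blkg;    /* creation is left to bio issue time */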
      
      v2: Build fixes for !CONFIG_CFQ_GROUP_IOSCHED and
          !CONFIG_BLK_DEV_THROTTLING.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: move root blkg lookup optimization from throtl_lookup_tg() to __blkg_lookup() · 85b6bc9d
      Tejun Heo authored
      
      
      Currently, both throttle and cfq policies implement their own root
      blkg (blkcg_gq) lookup fast path.  This patch moves root blkg
      optimization from throtl_lookup_tg() to __blkg_lookup().  cfq-iosched
      currently doesn't use blkg_lookup() but will be converted and drop the
      optimization too.
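
      A minimal sketch of where the fast path lands (close to but not
      necessarily the verbatim kernel code):

        static inline struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg,
                                                     struct request_queue *q,
                                                     bool update_hint)
        {
                struct blkcg_gq *blkg;

                /* the root blkg is cached directly on the queue */
                if (blkcg == &blkcg_root)
                        return q->root_blkg;

                /* fast path: the per-blkcg lookup hint */
                blkg = rcu_dereference(blkcg->blkg_hint);
                if (blkg && blkg->q == q)
                        return blkg;

                return blkg_lookup_slowpath(blkcg, q, update_hint);
        }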
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: inline [__]blkg_lookup() · 24f29046
      Tejun Heo authored
      
      
      blkg_lookup() checks whether the target queue is bypassing and, if
      not, calls __blkg_lookup() which first checks the lookup hint and then
      performs a radix tree walk.  The operations up to the hint check are
      trivial and there are many users of this function.  This patch inlines
      blkg_lookup() and the fast path part of __blkg_lookup().  The radix
      tree lookup and hint update are now in blkg_lookup_slowpath().
      
      This will help consolidate blkg handling by easing the move of the
      root blkcg short-circuit into the inlined lookup fast path.
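
      A minimal sketch of the inlined entry point (illustrative):

        static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg,
                                                   struct request_queue *q)
        {
                WARN_ON_ONCE(!rcu_read_lock_held());

                if (unlikely(blk_queue_bypass(q)))
                        return NULL;
                return __blkg_lookup(blkcg, q, false);
        }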
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods · e4a9bde9
      Tejun Heo authored
      
      
      Each active policy has a cpd (blkcg_policy_data) on each blkcg.  The
      cpd's were allocated by blkcg core and each policy could request to
      allocate extra space at the end by setting blkcg_policy->cpd_size
      larger than the size of cpd.
      
      This is a bit unusual but blkg (blkcg_gq) policy data used to be
      handled this way too so it made sense to be consistent; however, blkg
      policy data switched to alloc/free callbacks.
      
      This patch makes similar changes to cpd handling.
      blkcg_policy->cpd_alloc/free_fn() are added to replace ->cpd_size.  As
      cpd allocation is now done from policy side, it can simply allocate a
      larger area which embeds cpd at the beginning.
      
      As ->cpd_alloc_fn() may be able to perform all necessary
      initializations, this patch makes ->cpd_init_fn() optional.
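
      A minimal sketch of the embed-at-the-beginning pattern a policy's
      ->cpd_alloc_fn() can now use (the example_* names are hypothetical):

        struct example_cpd {
                struct blkcg_policy_data cpd;   /* must come first */
                unsigned int weight;            /* policy-private fields follow */
        };

        static struct blkcg_policy_data *example_cpd_alloc(gfp_t gfp)
        {
                struct example_cpd *ecpd = kzalloc(sizeof(*ecpd), gfp);

                return ecpd ? &ecpd->cpd : NULL;        /* cpd is at offset 0 */
        }

        static void example_cpd_free(struct blkcg_policy_data *cpd)
        {
                kfree(container_of(cpd, struct example_cpd, cpd));
        }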
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: minor updates around blkcg_policy_data · 81437648
      Tejun Heo authored
      
      
      * Rename blkcg->pd[] to blkcg->cpd[] so that cpd is consistently used
        for blkcg_policy_data.
      
      * Make blkcg_policy->cpd_init_fn() take blkcg_policy_data instead of
        blkcg.  This makes it consistent with blkg_policy_data methods and
        to-be-added cpd alloc/free methods.
      
      * blkcg_policy_data->blkcg and cpd_to_blkcg() added so that
        cpd_init_fn() can determine the associated blkcg from
        blkcg_policy_data.
      
      v2: blkcg_policy_data->blkcg initializations were missing.  Added.
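
      A minimal sketch of the new back-pointer and accessor (fields limited
      to what the description names):

        struct blkcg_policy_data {
                struct blkcg    *blkcg; /* the blkcg this cpd belongs to */
                int             plid;
        };

        static inline struct blkcg *cpd_to_blkcg(struct blkcg_policy_data *cpd)
        {
                return cpd ? cpd->blkcg : NULL;
        }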
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: make blkcg_policy methods take a pointer to blkcg_policy_data · a9520cd6
      Tejun Heo authored
      
      
      The newly added ->pd_alloc_fn() and ->pd_free_fn() deal with pd
      (blkg_policy_data) while the older ones use blkg (blkcg_gq).  Using
      blkg doesn't make sense for ->pd_alloc_fn(), pd can always be mapped
      back to blkg after allocation, and these are policy-specific methods,
      so it makes sense to converge on pd.
      
      This patch makes all methods deal with pd instead of blkg.  Most
      conversions are trivial.  In blk-cgroup.c, a couple method invocation
      sites now test whether pd exists instead of policy state for
      consistency.  This shouldn't cause any behavioral differences.
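
      A minimal sketch of the pd-to-blkg mapping that makes converging on pd
      safe (illustrative):

        static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd)
        {
                return pd ? pd->blkg : NULL;
        }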
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-throttle: clean up blkg_policy_data alloc/init/exit/free methods · b2ce2643
      Tejun Heo authored
      
      
      With the recent addition of alloc and free methods, things became
      messier.  This patch reorganizes them as follows.
      
      * ->pd_alloc_fn()
      
        Responsible for allocation and static initializations - the ones
        which can be done independent of where the pd might be attached.
      
      * ->pd_init_fn()
      
        Initializations which require the knowledge of where the pd is
        attached.
      
      * ->pd_free_fn()
      
        The counterpart of ->pd_alloc_fn().  Static de-init and freeing.
      
      This leaves ->pd_exit_fn() without any users.  Removed.
      
      While at it, collapse the one-liner function throtl_pd_exit(), which
      has only one user, into its caller.
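
      A minimal sketch of the resulting division of labor (the example_*
      names are hypothetical; the real throtl callbacks are more involved):

        struct example_grp {
                struct blkg_policy_data pd;     /* must come first */
                struct list_head        queued;
                struct example_grp      *parent;
        };

        static struct blkg_policy_data *example_pd_alloc(gfp_t gfp, int node)
        {
                struct example_grp *eg = kzalloc_node(sizeof(*eg), gfp, node);

                if (!eg)
                        return NULL;
                /* static init: valid wherever the pd ends up attached */
                INIT_LIST_HEAD(&eg->queued);
                return &eg->pd;
        }

        static void example_pd_init(struct blkg_policy_data *pd)
        {
                struct example_grp *eg = container_of(pd, struct example_grp, pd);

                /* attachment-dependent init: the parent is known only now */
                eg->parent = example_parent(eg);        /* hypothetical */
        }

        static void example_pd_free(struct blkg_policy_data *pd)
        {
                /* static de-init and freeing */
                kfree(container_of(pd, struct example_grp, pd));
        }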
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: replace blkcg_policy->pd_size with ->pd_alloc/free_fn() methods · 001bea73
      Tejun Heo authored
      
      
      A blkg (blkcg_gq) represents the relationship between a cgroup and
      request_queue.  Each active policy has a pd (blkg_policy_data) on each
      blkg.  The pd's were allocated by blkcg core and each policy could
      request to allocate extra space at the end by setting
      blkcg_policy->pd_size larger than the size of pd.
      
      This is a bit unusual but was done this way mostly to simplify error
      handling and all the existing use cases could be handled this way;
      however, this is becoming too restrictive now that percpu memory can
      be allocated without blocking.
      
      This introduces two new mandatory blkcg_policy methods - pd_alloc_fn()
      and pd_free_fn() - which are used to allocate and release pd for a
      given policy.  As pd allocation is now done from policy side, it can
      simply allocate a larger area which embeds pd at the beginning.  This
      change makes ->pd_size pointless.  Removed.
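
      A minimal sketch of the resulting method table (other callbacks
      omitted):

        struct blkcg_policy {
                int plid;
                /* replaces the old 'size_t pd_size' */
                struct blkg_policy_data *(*pd_alloc_fn)(gfp_t gfp, int node);
                void (*pd_free_fn)(struct blkg_policy_data *pd);
        };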
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: restructure blkg_policy_data allocation in blkcg_activate_policy() · 4c55f4f9
      Tejun Heo authored
      
      
      When a policy gets activated, it needs to allocate and install its
      policy data on all existing blkg's (blkcg_gq's).  Because blkg
      iteration is protected by a spinlock, it currently counts the total
      number of blkg's in the system, allocates the matching number of
      policy data on a list and installs them during a single iteration.
      
      This can be simplified by using speculative GFP_NOWAIT allocations
      while iterating and falling back to a preallocated policy data on
      failure.  If the preallocated one has already been consumed, it
      releases the lock, preallocates with GFP_KERNEL and then restarts the
      iteration.  This can be a bit more expensive than before but policy
      activation is a very cold path and shouldn't matter.
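
      A minimal sketch of the allocate-or-restart loop described above
      (simplified; the real function also runs pd init and unwinds fully on
      failure):

        int example_activate_policy(struct request_queue *q,
                                    const struct blkcg_policy *pol)
        {
                struct blkg_policy_data *pd_prealloc = NULL;
                struct blkcg_gq *blkg;
        retry:
                spin_lock_irq(q->queue_lock);
                list_for_each_entry(blkg, &q->blkg_list, q_node) {
                        struct blkg_policy_data *pd;

                        if (blkg->pd[pol->plid])
                                continue;

                        /* speculative, non-blocking attempt under the lock */
                        pd = pol->pd_alloc_fn(GFP_NOWAIT, q->node);
                        if (!pd)
                                swap(pd, pd_prealloc);  /* consume the fallback */
                        if (!pd) {
                                /* fallback used up: drop the lock and refill */
                                spin_unlock_irq(q->queue_lock);
                                pd_prealloc = pol->pd_alloc_fn(GFP_KERNEL, q->node);
                                if (!pd_prealloc)
                                        return -ENOMEM; /* real code unwinds */
                                goto retry;
                        }

                        blkg->pd[pol->plid] = pd;
                        pd->blkg = blkg;
                }
                spin_unlock_irq(q->queue_lock);

                if (pd_prealloc)
                        pol->pd_free_fn(pd_prealloc);   /* unused fallback */
                return 0;
        }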
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: remove unnecessary request_list->blkg NULL test in blk_put_rl() · 401efbf8
      Tejun Heo authored
      Since ec13b1d6 ("blkcg: always create the blkcg_gq for the root
      blkcg"), a request_list always has its blkg associated.  Drop
      unnecessary rl->blkg NULL test from blk_put_rl().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • kernfs: implement kernfs_path_len() · 9acee9c5
      Tejun Heo authored
      
      
      Add a function to determine the path length of a kernfs node.  For
      now this will be used by writeback tracepoint updates.
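
      A minimal usage sketch, assuming the returned length excludes the
      terminating NUL:

        size_t len = kernfs_path_len(kn);
        char *buf = kmalloc(len + 1, GFP_KERNEL);

        if (buf)
                kernfs_path(kn, buf, len + 1);  /* existing path helper */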
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • writeback: bdi_for_each_wb() iteration is memcg ID based not blkcg · 1ed8d48c
      Tejun Heo authored
      
      
      wb's (bdi_writeback's) are currently keyed by memcg ID; however, in an
      earlier implementation, wb's were keyed by blkcg ID.
      bdi_for_each_wb() walks bdi->cgwb_tree in the ascending ID order and
      allows iterations to start from an arbitrary ID which is used to
      interrupt and resume iterations.
      
      Unfortunately, while changing wb to be keyed by memcg ID instead of
      blkcg, bdi_for_each_wb() was missed and is still assuming that wb's
      are keyed by blkcg ID.  This doesn't affect iterations which don't get
      interrupted but bdi_split_work_to_wbs() makes use of iteration
      resuming on allocation failures and thus may incorrectly skip or
      repeat wb's.
      
      Fix it by changing bdi_for_each_wb() to take memcg IDs instead of
      blkcg IDs and updating bdi_split_work_to_wbs() accordingly.
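
      A minimal sketch of the resume-from-ID pattern the fix restores (names
      simplified; the real iterator walks bdi->cgwb_tree with radix tree
      iterators):

        int next_id = 0;        /* now a memcg ID, not a blkcg ID */

        restart:
        bdi_for_each_wb(wb, bdi, &iter, next_id) {
                if (!wb_dispatch_work(wb)) {            /* hypothetical */
                        /* remember where to resume, keyed by memcg ID */
                        next_id = wb_memcg_id(wb) + 1;  /* hypothetical */
                        goto restart;
                }
        }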
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • cgroup: introduce cgroup_subsys->legacy_name · 3e1d2eed
      Tejun Heo authored
      
      
      This allows cgroup subsystems to use a different name on the unified
      hierarchy.  cgroup_subsys->name is used on the unified hierarchy,
      ->legacy_name elsewhere.  If ->legacy_name is not explicitly set, it's
      automatically set to ->name and the userland visible behavior remains
      unchanged.
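
      A minimal sketch of the fallback (illustrative):

        struct cgroup_subsys {
                const char *name;               /* unified hierarchy */
                const char *legacy_name;        /* everywhere else */
        };

        /* during early init, for each subsystem: */
        if (!ss->legacy_name)
                ss->legacy_name = ss->name;     /* default: behavior unchanged */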
      
      v2: Make parse_cgroupfs_options() only consider ->legacy_name as mount
          options are used only on legacy hierarchies.  Suggested by Li
          Zefan.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: cgroups@vger.kernel.org
  2. 11 Aug, 2015 1 commit
  3. 04 Aug, 2015 1 commit
    • cgroup: define controller file conventions · 6abc8ca1
      Tejun Heo authored
      
      
      Traditionally, each cgroup controller implemented whatever interface
      it wanted, leading to widely inconsistent interfaces.  Examining the
      requirements of the controllers readily yields that there are only a
      few control schemes shared among all.
      
      Two major controllers already had to implement new interfaces for the
      unified hierarchy due to significant structural changes.  Let's take
      the chance to establish common conventions throughout all controllers.
      
      This patch defines CGROUP_WEIGHT_MIN/DFL/MAX to be used on all weight
      based control knobs and documents the conventions that controllers
      should follow on the unified hierarchy.  Except for io.weight knob,
      all existing unified hierarchy knobs are already compliant.  A
      follow-up patch will update io.weight.
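
      The weight constants this defines (values as merged in
      include/linux/cgroup.h, listed here for reference):

        #define CGROUP_WEIGHT_MIN       1
        #define CGROUP_WEIGHT_DFL       100
        #define CGROUP_WEIGHT_MAX       10000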
      
      v2: Added descriptions of min, low and high knobs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
  4. 24 Jul, 2015 2 commits
  5. 19 Jul, 2015 1 commit
    • hid-sensor: Fix suspend/resume delay · 88cc7b4e
      Srinivas Pandruvada authored
      
      
      By default all the sensors are in the runtime-suspended state (lowest
      power state).  During the Linux suspend process, all runtime-suspended
      devices are resumed and then suspended.  When we introduced runtime PM
      for HID sensors, this caused all sensors to power up and introduced
      delay in suspend time.  The opposite happens during the resume
      process.
      
      To fix this, we power up the sensors only when the request is issued
      from user space (raw or triggered reads).  This way, when runtime
      resume calls for power-up, it simply returns as this will not match
      the user-requested state.
      
      Note this is a regression fix as the increase in suspend / resume
      times can be substantial (report of 8 seconds on Len's laptop!)
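
      A minimal sketch of the guard described above (the field and helper
      names are hypothetical, not necessarily the actual hid-sensor API):

        static int hid_sensor_runtime_resume(struct device *dev)
        {
                struct hid_sensor_common *st = dev_get_drvdata(dev);

                /* power up only if user space actually asked for it */
                if (!atomic_read(&st->user_requested_state))
                        return 0;

                return hid_sensor_set_power(st, true);  /* hypothetical */
        }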
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Tested-by: Len Brown <len.brown@intel.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: Jonathan Cameron <jic23@kernel.org>
  6. 18 Jul, 2015 2 commits
  7. 17 Jul, 2015 4 commits
  8. 15 Jul, 2015 4 commits
  9. 13 Jul, 2015 2 commits
  10. 12 Jul, 2015 1 commit
    • can: replace timestamp as unique skb attribute · d3b58c47
      Oliver Hartkopp authored
      Commit 514ac99c ("can: fix multiple delivery of a single CAN frame for
      overlapping CAN filters") requires the skb->tstamp to be set to check
      for identical CAN skbs.
      
      Without timestamping being required by user space applications this
      timestamp was not generated, which led to commit 36c01245 ("can: fix
      loss of CAN frames in raw_rcv") - which forces the timestamp to be set
      in all CAN related skbuffs by introducing several __net_timestamp()
      calls.
      
      This forces e.g. out-of-tree drivers which are not using
      alloc_can{,fd}_skb() to add __net_timestamp() after skbuff creation to
      prevent the frame loss fixed in mainline Linux.
      
      This patch removes the timestamp dependency and uses an atomic counter
      to create a unique identifier together with the skbuff pointer.
      
      Btw: the new skbcnt element introduced in struct can_skb_priv has to be
      initialized with zero in out-of-tree drivers which are not using
      alloc_can{,fd}_skb() too.
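
      A minimal sketch of the counter-based identity check (condensed from
      the description; can_skb_prv() is the existing struct can_skb_priv
      accessor):

        static atomic_t skbcounter = ATOMIC_INIT(0);

        /* tag each CAN skb once, instead of relying on skb->tstamp: */
        can_skb_prv(skb)->skbcnt = atomic_inc_return(&skbcounter);

        /* duplicate test: same skb pointer AND same counter value */
        if (skb == last_skb && can_skb_prv(skb)->skbcnt == last_skbcnt)
                return;         /* this frame was already delivered */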
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
  11. 10 Jul, 2015 1 commit
  12. 09 Jul, 2015 4 commits
    • cdc_ncm: Add support for moving NDP to end of NCM frame · 4a0e3e98
      Enrico Mioso authored
      
      
      NCM specs do not actually mandate a specific position in the frame for
      the NDP (Network Datagram Pointer).  However, some Huawei devices will
      ignore our aggregates if it is not placed after the datagrams it points
      to.  Add support for doing just this, in a per-device configurable way.
      While at it, update the NCM subdrivers, disabling this functionality in
      all of them except huawei_cdc_ncm, where it is enabled instead.
      We aren't making any distinction between different Huawei NCM devices,
      based on what the vendor driver does.  Standard NCM devices are left
      unaffected: if they are compliant, they should always be usable; we
      still stay on the safe side.
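
      A minimal sketch of the per-device toggle (the placement helpers are
      hypothetical stand-ins for the two NDP paths):

        #define CDC_NCM_FLAG_NDP_TO_END BIT(0)  /* NDP after the datagrams */

        if (ctx->drvflags & CDC_NCM_FLAG_NDP_TO_END)
                ndp16 = ncm_ndp_at_end(ctx, skb);       /* hypothetical */
        else
                ndp16 = ncm_ndp_default(ctx, skb);      /* hypothetical */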
      
      This change has been tested and working with a Huawei E3131 device (which
      works regardless of NDP position), a Huawei E3531 (also working both
      ways) and an E3372 (which mandates NDP to be after indexed datagrams).
      
      V1->V2:
      - corrected wrong NDP acronym definition
      - fixed possible NULL pointer dereference
      - patch cleanup
      V2->V3:
      - Properly account for the NDP size when writing new packets to SKB
      Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • blkcg: fix blkcg_policy_data allocation bug · 06b285bd
      Tejun Heo authored
      e48453c3 ("block, cgroup: implement policy-specific per-blkcg data")
      updated per-blkcg policy data to be dynamically allocated.  When a
      policy is registered, its policy data aren't created.  Instead, when
      the policy is activated on a queue, the policy data are allocated if
      there are blkg's (blkcg_gq's) which are attached to a given blkcg.
      This is buggy.  Consider the following scenario.
      
      1. A blkcg is created.  No blkg's attached yet.
      
      2. The policy is registered.  No policy data is allocated.
      
      3. The policy is activated on a queue.  As the above blkcg doesn't
         have any blkg's, it won't allocate the matching blkcg_policy_data.
      
      4. An IO is issued from the blkcg and blkg is created and the blkcg
         still doesn't have the matching policy data allocated.
      
      With cfq-iosched, this leads to an oops.
      
      It also doesn't free policy data on policy unregistration assuming
      that freeing of all policy data on blkcg destruction should take care
      of it; however, this also is incorrect.
      
      1. A blkcg has policy data.
      
      2. The policy gets unregistered but the policy data remains.
      
      3. Another policy gets registered on the same slot.
      
      4. Later, the new policy tries to allocate policy data on the previous
         blkcg but the slot is already occupied and gets skipped.  The
         policy ends up operating on the policy data of the previous policy.
      
      There's no reason to manage blkcg_policy_data lazily.  The reason we
      do lazy allocation of blkg's is that the number of all possible blkg's
      is the product of cgroups and block devices, which can reach a
      surprising level.  blkcg_policy_data is constrained by the number of
      cgroups and shouldn't be a problem.
      
      This patch makes blkcg_policy_data be allocated for all existing
      blkcg's on policy registration and freed on unregistration, and removes
      blkcg_policy_data handling from the policy [de]activation paths.  This
      ensures that blkcg_policy_data are created and removed together with
      the policy they belong to and fixes the problems described above.
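
      A minimal sketch of registration-time allocation over the all_blkcgs
      list introduced in the next commit below (simplified; blkcg->pd[] here
      is the per-blkcg policy data array, renamed to ->cpd[] later in this
      series, and the real code unwinds on failure):

        int blkcg_policy_register(struct blkcg_policy *pol)
        {
                struct blkcg *blkcg;

                mutex_lock(&blkcg_pol_mutex);
                list_for_each_entry(blkcg, &all_blkcgs, all_blkcgs_node) {
                        struct blkcg_policy_data *cpd;

                        cpd = kzalloc(pol->cpd_size, GFP_KERNEL);
                        if (!cpd) {
                                mutex_unlock(&blkcg_pol_mutex);
                                return -ENOMEM; /* real code frees earlier cpd's */
                        }
                        blkcg->pd[pol->plid] = cpd;
                        cpd->plid = pol->plid;
                }
                mutex_unlock(&blkcg_pol_mutex);
                return 0;
        }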
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: e48453c3 ("block, cgroup: implement policy-specific per-blkcg data")
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blkcg: implement all_blkcgs list · 7876f930
      Tejun Heo authored
      
      
      Add an all_blkcgs list which goes through blkcg->all_blkcgs_node and
      is protected by blkcg_pol_mutex.  This will be used to fix the
      blkcg_policy_data allocation bug.
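
      A minimal sketch (illustrative):

        static LIST_HEAD(all_blkcgs);   /* protected by blkcg_pol_mutex */

        /* on blkcg creation: */
        mutex_lock(&blkcg_pol_mutex);
        list_add_tail(&blkcg->all_blkcgs_node, &all_blkcgs);
        mutex_unlock(&blkcg_pol_mutex);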
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • libceph: enable ceph in a non-default network namespace · 757856d2
      Ilya Dryomov authored
      
      
      Grab a reference on the network namespace of the 'rbd map' (in case
      of rbd) or 'mount' (in case of ceph) process and use it to open
      sockets instead of always using init_net and bailing if the network
      namespace is anything but init_net.  Be careful not to share struct
      ceph_client instances between different namespaces and don't add any
      code in the !CONFIG_NET_NS case.
      
      This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
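
      A minimal sketch of the namespace capture (where the reference is
      stored is an assumption; the description only says it is grabbed at
      'rbd map' / 'mount' time and used when opening sockets):

        /* in process context, at 'rbd map' / 'mount' time: */
        msgr->net = get_net(current->nsproxy->net_ns);  /* field name assumed */

        /* when opening a connection socket: */
        ret = sock_create_kern(msgr->net, AF_INET, SOCK_STREAM,
                               IPPROTO_TCP, &sock);

        /* on messenger teardown: */
        put_net(msgr->net);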
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Sage Weil <sage@redhat.com>