1. 12 Jan, 2014 1 commit
    • Rik van Riel's avatar
      sched: Calculate effective load even if local weight is 0 · 9722c2da
      Rik van Riel authored
      Thomas Hellstrom bisected a regression where erratic 3D performance is
      experienced on virtual machines as measured by glxgears. It identified
      commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA
      node") as the problem which had modified the behaviour of effective_load.
      
      Effective load calculates the difference to the system-wide load if a
      scheduling entity was moved to another CPU. The task group is not heavier
      as a result of the move but overall system load can increase/decrease as a
      result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs
      on a preferred NUMA node") changed effective_load to make it suitable for
      calculating if a particular NUMA node was compute overloaded. To reduce
      the cost of the function, it assumed that a current sched entity weight
      of 0 was uninteresting but that is not the case.
      
      wake_affine() uses a weight of 0 for sync wakeups on the grounds that it
      is assuming the waking task will sleep and not contribute to load in the
      near future. In this case, we still want to calculate the effective load
      of the sched entity hierarchy. As effective_load is no longer used by
      task_numa_compare since commit fb13c7ee
      
       (sched/numa: Use a system-wide
      search to find swap/migration candidates), this patch simply restores the
      historical behaviour.
      Reported-and-tested-by: default avatarThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: default avatarRik van Riel <riel@redhat.com>
      [ Wrote changelog]
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.de
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9722c2da
  2. 19 Dec, 2013 1 commit
  3. 11 Dec, 2013 1 commit
    • Peter Zijlstra's avatar
      sched/fair: Rework sched_fair time accounting · 9dbdb155
      Peter Zijlstra authored
      
      
      Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This
      results in him occasionally seeing time going backwards - which
      crashes the scheduler ...
      
      Most of our time accounting can actually handle that except the most
      common one; the tick time update of sched_fair.
      
      There is a further problem with that code; previously we assumed that
      because we get a tick every TICK_NSEC our time delta could never
      exceed 32bits and math was simpler.
      
      However, ever since Frederic managed to get NO_HZ_FULL merged; this is
      no longer the case since now a task can run for a long time indeed
      without getting a tick. It only takes about ~4.2 seconds to overflow
      our u32 in nanoseconds.
      
      This means we not only need to better deal with time going backwards;
      but also means we need to be able to deal with large deltas.
      
      This patch reworks the entire code and uses mul_u64_u32_shr() as
      proposed by Andy a long while ago.
      
      We express our virtual time scale factor in a u32 multiplier and shift
      right and the 32bit mul_u64_u32_shr() implementation reduces to a
      single 32x32->64 multiply if the time delta is still short (common
      case).
      
      For 64bit a 64x64->128 multiply can be used if ARCH_SUPPORTS_INT128.
      Reported-and-Tested-by: default avatarChristian Engelmayer <cengelma@gmx.at>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: fweisbec@gmail.com
      Cc: Paul Turner <pjt@google.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.net
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      9dbdb155
  4. 19 Nov, 2013 1 commit
  5. 13 Nov, 2013 3 commits
  6. 06 Nov, 2013 2 commits
    • Preeti U Murthy's avatar
      sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus · 37dc6b50
      Preeti U Murthy authored
      
      
      nr_busy_cpus parameter is used by nohz_kick_needed() to find out the
      number of busy cpus in a sched domain which has SD_SHARE_PKG_RESOURCES
      flag set.  Therefore instead of updating nr_busy_cpus at every level
      of sched domain, since it is irrelevant, we can update this parameter
      only at the parent domain of the sd which has this flag set. Introduce
      a per-cpu parameter sd_busy which represents this parent domain.
      
      In nohz_kick_needed() we directly query the nr_busy_cpus parameter
      associated with the groups of sd_busy.
      
      By associating sd_busy with the highest domain which has
      SD_SHARE_PKG_RESOURCES flag set, we cover all lower level domains
      which could have this flag set and trigger nohz_idle_balancing if any
      of the levels have more than one busy cpu.
      
      sd_busy is irrelevant for asymmetric load balancing. However sd_asym
      has been introduced to represent the highest sched domain which has
      SD_ASYM_PACKING flag set so that it can be queried directly when
      required.
      
      While we are at it, we might as well change the nohz_idle parameter to
      be updated at the sd_busy domain level alone and not the base domain
      level of a CPU.  This will unify the concept of busy cpus at just one
      level of sched domain where it is currently used.
      
      Signed-off-by: Preeti U Murthy<preeti@linux.vnet.ibm.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: svaidy@linux.vnet.ibm.com
      Cc: vincent.guittot@linaro.org
      Cc: bitbucket@online.de
      Cc: benh@kernel.crashing.org
      Cc: anton@samba.org
      Cc: Morten.Rasmussen@arm.com
      Cc: pjt@google.com
      Cc: peterz@infradead.org
      Cc: mikey@neuling.org
      Link: http://lkml.kernel.org/r/20131030031252.23426.4417.stgit@preeti.in.ibm.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      37dc6b50
    • Vaidyanathan Srinivasan's avatar
      sched: Fix asymmetric scheduling for POWER7 · 2042abe7
      Vaidyanathan Srinivasan authored
      
      
      Asymmetric scheduling within a core is a scheduler loadbalancing
      feature that is triggered when SD_ASYM_PACKING flag is set.  The goal
      for the load balancer is to move tasks to lower order idle SMT threads
      within a core on a POWER7 system.
      
      In nohz_kick_needed(), we intend to check if our sched domain (core)
      is completely busy or we have idle cpu.
      
      The following check for SD_ASYM_PACKING:
      
          (cpumask_first_and(nohz.idle_cpus_mask, sched_domain_span(sd)) < cpu)
      
      already covers the case of checking if the domain has an idle cpu,
      because cpumask_first_and() will not yield any set bits if this domain
      has no idle cpu.
      
      Hence, nr_busy check against group weight can be removed.
      Reported-by: default avatarMichael Neuling <michael.neuling@au1.ibm.com>
      Signed-off-by: default avatarVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: default avatarPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: default avatarMichael Neuling <mikey@neuling.org>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: vincent.guittot@linaro.org
      Cc: bitbucket@online.de
      Cc: benh@kernel.crashing.org
      Cc: anton@samba.org
      Cc: Morten.Rasmussen@arm.com
      Cc: pjt@google.com
      Link: http://lkml.kernel.org/r/20131030031242.23426.13019.stgit@preeti.in.ibm.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      2042abe7
  7. 29 Oct, 2013 5 commits
  8. 16 Oct, 2013 1 commit
    • Peter Zijlstra's avatar
      sched: Fix race in migrate_swap_stop() · 74602315
      Peter Zijlstra authored
      
      
      There is a subtle race in migrate_swap, when task P, on CPU A, decides to swap
      places with task T, on CPU B.
      
      Task P:
        - call migrate_swap
      Task T:
        - go to sleep, removing itself from the runqueue
      Task P:
        - double lock the runqueues on CPU A & B
      Task T:
        - get woken up, place itself on the runqueue of CPU C
      Task P:
        - see that task T is on a runqueue, and pretend to remove it
          from the runqueue on CPU B
      
      Now CPUs B & C both have corrupted scheduler data structures.
      
      This patch fixes it, by holding the pi_lock for both of the tasks
      involved in the migrate swap. This prevents task T from waking up,
      and placing itself onto another runqueue, until after migrate_swap
      has released all locks.
      
      This means that, when migrate_swap checks, task T will be either
      on the runqueue where it was originally seen, or not on any
      runqueue at all. Migrate_swap deals correctly with of those cases.
      Tested-by: default avatarJoe Mario <jmario@redhat.com>
      Acked-by: default avatarMel Gorman <mgorman@suse.de>
      Reviewed-by: default avatarRik van Riel <riel@redhat.com>
      Signed-off-by: default avatarPeter Zijlstra <peterz@infradead.org>
      Cc: hannes@cmpxchg.org
      Cc: aarcange@redhat.com
      Cc: srikar@linux.vnet.ibm.com
      Cc: tglx@linutronix.de
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/20131010181722.GO13848@laptop.programming.kicks-ass.net
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      74602315
  9. 14 Oct, 2013 1 commit
  10. 12 Oct, 2013 1 commit
  11. 09 Oct, 2013 23 commits