1. 15 Nov, 2009 1 commit
    • Tejun Heo's avatar
      sched, kvm: Fix race condition involving sched_in_preempt_notifers · 498657a4
      Tejun Heo authored
      
      
      In finish_task_switch(), fire_sched_in_preempt_notifiers() is
      called after finish_lock_switch().
      
      However, depending on architecture, preemption can be enabled after
      finish_lock_switch() which breaks the semantics of preempt
      notifiers.
      
      So move it before finish_arch_switch(). This also makes the in-
      notifiers symmetric to out- notifiers in terms of locking - now
      both are called under rq lock.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarAvi Kivity <avi@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4AFD2801.7020900@kernel.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      498657a4
  2. 12 Nov, 2009 1 commit
  3. 08 Nov, 2009 2 commits
  4. 03 Nov, 2009 1 commit
  5. 02 Nov, 2009 1 commit
  6. 28 Oct, 2009 1 commit
    • Jiri Kosina's avatar
      sched: move rq_weight data array out of .percpu · 4a6cc4bd
      Jiri Kosina authored
      Commit 34d76c41
      
       introduced percpu array update_shares_data, size of which
      being proportional to NR_CPUS. Unfortunately this blows up ia64 for large
      NR_CPUS configuration, as ia64 allows only 64k for .percpu section.
      
      Fix this by allocating this array dynamically and keep only pointer to it
      percpu.
      
      The per-cpu handling doesn't impose significant performance penalty on
      potentially contented path in tg_shares_up().
      
      ...
      ffffffff8104337c:       65 48 8b 14 25 20 cd    mov    %gs:0xcd20,%rdx
      ffffffff81043383:       00 00
      ffffffff81043385:       48 c7 c0 00 e1 00 00    mov    $0xe100,%rax
      ffffffff8104338c:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff81043393:       00
      ffffffff81043394:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff8104339b:       00
      ffffffff8104339c:       48 01 d0                add    %rdx,%rax
      ffffffff8104339f:       49 8d 94 24 08 01 00    lea    0x108(%r12),%rdx
      ffffffff810433a6:       00
      ffffffff810433a7:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433ac:       48 89 45 b0             mov    %rax,-0x50(%rbp)
      ffffffff810433b0:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433b5:       48 89 55 c0             mov    %rdx,-0x40(%rbp)
      ...
      
      After:
      
      ...
      ffffffff8104337c:       65 8b 04 25 28 cd 00    mov    %gs:0xcd28,%eax
      ffffffff81043383:       00
      ffffffff81043384:       48 98                   cltq
      ffffffff81043386:       49 8d bc 24 08 01 00    lea    0x108(%r12),%rdi
      ffffffff8104338d:       00
      ffffffff8104338e:       48 8b 15 d3 7f 76 00    mov    0x767fd3(%rip),%rdx        # ffffffff817ab368 <update_shares_data>
      ffffffff81043395:       48 8b 34 c5 00 ee 6d    mov    -0x7e921200(,%rax,8),%rsi
      ffffffff8104339c:       81
      ffffffff8104339d:       48 c7 45 a0 00 00 00    movq   $0x0,-0x60(%rbp)
      ffffffff810433a4:       00
      ffffffff810433a5:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
      ffffffff810433aa:       48 89 7d c0             mov    %rdi,-0x40(%rbp)
      ffffffff810433ae:       48 c7 45 a8 00 00 00    movq   $0x0,-0x58(%rbp)
      ffffffff810433b5:       00
      ffffffff810433b6:       bb 00 04 00 00          mov    $0x400,%ebx
      ffffffff810433bb:       48 01 f2                add    %rsi,%rdx
      ffffffff810433be:       48 89 55 b0             mov    %rdx,-0x50(%rbp)
      ...
      
      Signed-off-by: default avatarJiri Kosina <jkosina@suse.cz>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      4a6cc4bd
  7. 23 Oct, 2009 1 commit
    • Mike Galbraith's avatar
      sched: Strengthen buddies and mitigate buddy induced latencies · f685ceac
      Mike Galbraith authored
      
      
      This patch restores the effectiveness of LAST_BUDDY in preventing
      pgsql+oltp from collapsing due to wakeup preemption. It also
      switches LAST_BUDDY to exclusively do what it does best, namely
      mitigate the effects of aggressive wakeup preemption, which
      improves vmark throughput markedly, and restores mysql+oltp
      scalability.
      
      Since buddies are about scalability, enable them beginning at the
      point where we begin expanding sched_latency, namely
      sched_nr_latency. Previously, buddies were cleared aggressively,
      which seriously reduced their effectiveness. Not clearing
      aggressively however, produces a small drop in mysql+oltp
      throughput immediately after peak, indicating that LAST_BUDDY is
      actually doing some harm. This is right at the point where X on the
      desktop in competition with another load wants low latency service.
      Ergo, do not enable until we need to scale.
      
      To mitigate latency induced by buddies, or by a task just missing
      wakeup preemption, check latency at tick time.
      
      Last hunk prevents buddies from stymieing BALANCE_NEWIDLE via
      CACHE_HOT_BUDDY.
      
      Supporting performance tests:
      
       tip   = v2.6.32-rc5-1497-ga525b32
       tipx  = NO_GENTLE_FAIR_SLEEPERS NEXT_BUDDY granularity knobs = 31 knobs + 31 buddies
       tip+x = NO_GENTLE_FAIR_SLEEPERS granularity knobs = 31 knobs
      
      (Three run averages except where noted.)
      
       vmark:
       ------
       tip           108466 messages per second
       tip+          125307 messages per second
       tip+x         125335 messages per second
       tipx          117781 messages per second
       2.6.31.3      122729 messages per second
      
       mysql+oltp:
       -----------
       clients          1        2        4        8       16       32       64        128    256
       ..........................................................................................
       tip        9949.89 18690.20 34801.24 34460.04 32682.88 30765.97 28305.27 25059.64 19548.08
       tip+      10013.90 18526.84 34900.38 34420.14 33069.83 32083.40 30578.30 28010.71 25605.47
       tipx       9698.71 18002.70 34477.56 33420.01 32634.30 31657.27 29932.67 26827.52 21487.18
       2.6.31.3   8243.11 18784.20 34404.83 33148.38 31900.32 31161.90 29663.81 25995.94 18058.86
      
       pgsql+oltp:
       -----------
       clients          1        2        4        8       16       32       64      128      256
       ..........................................................................................
       tip       13686.37 26609.25 51934.28 51347.81 49479.51 45312.65 36691.91 26851.57 24145.35
       tip+ (1x) 13907.85 27135.87 52951.98 52514.04 51742.52 50705.43 49947.97 48374.19 46227.94
       tip+x     13906.78 27065.81 52951.19 52542.59 52176.11 51815.94 50838.90 49439.46 46891.00
       tipx      13742.46 26769.81 52351.99 51891.73 51320.79 50938.98 50248.65 48908.70 46553.84
       2.6.31.3  13815.35 26906.46 52683.34 52061.31 51937.10 51376.80 50474.28 49394.47 47003.25
      
      Signed-off-by: default avatarMike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      f685ceac
  8. 12 Oct, 2009 1 commit
  9. 09 Oct, 2009 2 commits
  10. 05 Oct, 2009 1 commit
  11. 01 Oct, 2009 1 commit
  12. 24 Sep, 2009 2 commits
  13. 22 Sep, 2009 1 commit
    • Arjan van de Ven's avatar
      cpuidle: fix the menu governor to boost IO performance · 69d25870
      Arjan van de Ven authored
      
      
      Fix the menu idle governor which balances power savings, energy efficiency
      and performance impact.
      
      The reason for a reworked governor is that there have been serious
      performance issues reported with the existing code on Nehalem server
      systems.
      
      To show this I'm sure Andrew wants to see benchmark results:
      (benchmark is "fio", "no cstates" is using "idle=poll")
      
      		no cstates	current linux	new algorithm
      1 disk		107 Mb/s	85 Mb/s		105 Mb/s
      2 disks		215 Mb/s	123 Mb/s	209 Mb/s
      12 disks	590 Mb/s	320 Mb/s	585 Mb/s
      
      In various power benchmark measurements, no degredation was found by our
      measurement&diagnostics team.  Obviously a small percentage more power was
      used in the "fio" benchmark, due to the much higher performance.
      
      While it would be a novel idea to describe the new algorithm in this
      commit message, I cheaped out and described it in comments in the code
      instead.
      
      [changes since first post: spelling fixes from akpm, review feedback,
      folded menu-tng into menu.c]
      
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Acked-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      69d25870
  14. 21 Sep, 2009 2 commits
    • Ingo Molnar's avatar
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar authored
      
      
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      
      Suggested-by: default avatarStephane Eranian <eranian@google.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Reviewed-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cdd6c482
    • Peter Williams's avatar
      sched: Simplify sys_sched_rr_get_interval() system call · 0d721cea
      Peter Williams authored
      
      
      By removing the need for it to know details of scheduling classes.
      
      This allows PlugSched to define orthogonal scheduling classes.
      
      Signed-off-by: default avatarPeter Williams <pwil3058@bigpond.net.au>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <06d1b89ee15a0eef82d7.1253496713@mudlark.pw.nest>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      0d721cea
  15. 20 Sep, 2009 2 commits
  16. 17 Sep, 2009 2 commits
    • Peter Zijlstra's avatar
      sched: Add new wakeup preemption mode: WAKEUP_RUNNING · ad4b78bb
      Peter Zijlstra authored
      
      
      Create a new wakeup preemption mode, preempt towards tasks that run
      shorter on avg. It sets next buddy to be sure we actually run the task
      we preempted for.
      
      Test results:
      
       root@twins:~# while :; do :; done &
       [1] 6537
       root@twins:~# while :; do :; done &
       [2] 6538
       root@twins:~# while :; do :; done &
       [3] 6539
       root@twins:~# while :; do :; done &
       [4] 6540
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max          4750 usec
              Avg           497 usec
              Stdev         737 usec
      
       root@twins:/home/peter# echo WAKEUP_RUNNING > /debug/sched_features
      
       root@twins:/home/peter# ./latt -c4 sleep 4
       Entries: 48 (clients=4)
      
       Averages:
       ------------------------------
              Max            14 usec
              Avg             5 usec
              Stdev           3 usec
      
      Disabled by default - needs more testing.
      
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarMike Galbraith <efault@gmx.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      LKML-Reference: <new-submission>
      ad4b78bb
    • Ingo Molnar's avatar
      sched: Fix TASK_WAKING & loadaverage breakage · eb24073b
      Ingo Molnar authored
      
      
      Fix this:
      
      top - 21:54:00 up  2:59,  1 user,  load average: 432512.33, 426421.74, 417432.74
      
      Which happens because we now set TASK_WAKING before activate_task().
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      eb24073b
  17. 16 Sep, 2009 1 commit
  18. 15 Sep, 2009 12 commits
  19. 04 Sep, 2009 5 commits