1. 18 Nov, 2010 1 commit
    • tracing: New macro to set up initial event flags value · 1ed0c597
      Frederic Weisbecker authored
      
      
      This introduces the new TRACE_EVENT_FLAGS() macro in order
      to set up initial event flags value.
      
      This macro must simply follow the definition of a trace event
      and take the event name and the flag value as parameters:
      
      TRACE_EVENT(my_event, .....
      ....
      );
      
      TRACE_EVENT_FLAGS(my_event, 1)
      
      This will set up 1 as the initial my_event->flags value.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
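As an illustration of the idea behind TRACE_EVENT_FLAGS() in the commit above, here is a minimal userspace sketch. It is not the kernel implementation: the names `event_call`, `DEFINE_EVENT_SKETCH` and `EVENT_FLAGS_SKETCH` are hypothetical, and a GCC/Clang constructor attribute stands in for whatever init hook the kernel macro actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-event descriptor. */
struct event_call {
    const char *name;
    int flags;
};

/* Simplified stand-in for TRACE_EVENT(): it only defines the descriptor. */
#define DEFINE_EVENT_SKETCH(name) \
    static struct event_call event_##name = { #name, 0 }

/*
 * Sketch of a flags macro: emit a small init function that runs before
 * main() and stores the initial flags value in the descriptor.
 * __attribute__((constructor)) is a GCC/Clang extension (assumption).
 */
#define EVENT_FLAGS_SKETCH(name, value)                              \
    __attribute__((constructor)) static void init_flags_##name(void) \
    {                                                                \
        event_##name.flags = (value);                                \
    }

DEFINE_EVENT_SKETCH(my_event);
EVENT_FLAGS_SKETCH(my_event, 1)

/* Accessor so callers can read the value that was set at start-up. */
static int my_event_flags(void)
{
    assert(event_my_event.name != NULL);
    return event_my_event.flags;
}
```

The flags macro follows the event definition, as the commit message requires, because it refers to the descriptor that the definition creates.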
  2. 28 Oct, 2010 4 commits
    • ext4,jbd2: convert tracepoints to use major/minor numbers · a269029d
      Theodore Ts'o authored
      
      
      Unfortunately perf can't deal with anything other than direct structure
      accesses in the TP_printk() section.  It will drop dead when it sees
      jbd2_dev_to_name() in the "print fmt" section of the tracepoint.
      
      Addresses-Google-Bug: 3138508
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
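The commit above replaces a helper call in the print format with plain major/minor numbers. A rough sketch of that approach follows, assuming a device number packed with 20 minor bits; the `SKETCH_*` macros, `trace_entry_sketch` and `format_entry` are hypothetical names for illustration, not the ext4 tracepoint code.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed packing: low 20 bits hold the minor number, the rest the major. */
#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1U << SKETCH_MINORBITS) - 1)
#define SKETCH_MAJOR(dev) ((unsigned int)((dev) >> SKETCH_MINORBITS))
#define SKETCH_MINOR(dev) ((unsigned int)((dev) & SKETCH_MINORMASK))
#define SKETCH_MKDEV(ma, mi) (((unsigned int)(ma) << SKETCH_MINORBITS) | (mi))

/* Hypothetical trace record: it stores the raw device number only. */
struct trace_entry_sketch {
    unsigned int dev;
    unsigned long ino;
};

/*
 * The format step uses only shifts and masks on a structure field,
 * which is the kind of expression a simple parser can evaluate.
 */
static int format_entry(char *buf, size_t len,
                        const struct trace_entry_sketch *e)
{
    assert(buf != NULL && e != NULL);
    return snprintf(buf, len, "dev %u,%u ino %lu",
                    SKETCH_MAJOR(e->dev), SKETCH_MINOR(e->dev), e->ino);
}
```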
    • ext4: don't use ext4_allocation_contexts for tracing · 3e1e5f50
      Eric Sandeen authored
      
      
      Many tracepoints were populating an ext4_allocation_context
      to pass in, but this requires a slab allocation even when
      tracepoints are off.  In fact, 4 of 5 of these allocations
      were only for tracing.  In addition, we were only using a
      small fraction of the 144 bytes of this structure for this
      purpose.
      
      We can do away with all these alloc/frees of the ac and
      simply pass in the bits we care about, instead.
      
      I tested this by turning on tracing and running through
      xfstests on x86_64.  I did not actually do anything with
      the trace output, however.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix oops in trace_ext4_mb_release_group_pa · 4d547616
      Eric Sandeen authored
      
      
      Our QA reported an oops in the ext4_mb_release_group_pa tracing,
      and Josef Bacik pointed out that it was because we may have a
      non-null but uninitialized ac_inode in the allocation context.
      
      I can reproduce it when running xfstests with ext4 tracepoints on, 
      on a CONFIG_SLAB_DEBUG kernel.
      
      We call trace_ext4_mb_release_group_pa from 2 places, 
      ext4_mb_discard_group_preallocations and 
      ext4_mb_discard_lg_preallocations
      
      In both cases we allocate an ac as a container just for tracing (!)
      and never fill in the ac_inode.  There's no reason to be assigning,
      testing, or printing it as far as I can see, so just remove it from
      the tracepoint.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: avoid null dereference in trace_ext4_mballoc_discard · b853fd36
      Wen Congyang authored
      
      
      ac->inode is set to NULL in ext4_mb_release_group_pa(), and
      trace_ext4_mballoc_discard(ac) is then called, so the kernel
      panics:
      
      BUG: unable to handle kernel NULL pointer dereference at 000000a4
      IP: [<f87e1714>] ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4]
      *pdpt = 0000000000abd001 *pde = 0000000000000000
      Oops: 0000 [#1] SMP
      
      Pid: 550, comm: flush-8:16 Not tainted 2.6.36-rc1 #1 SE7320EP2/Altos G530
      EIP: 0060:[<f87e1714>] EFLAGS: 00010206 CPU: 1
      EIP is at ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4]
      EAX: f32ac840 EBX: f3f1cf88 ECX: f32ac840 EDX: 00000000
      ESI: f32ac83c EDI: f880b9d8 EBP: 00000000 ESP: f4b77ae4
       DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      Process flush-8:16 (pid: 550, ti=f4b76000 task=f613e540 task.ti=f4b76000)
      Call Trace:
       [<f87f5ac1>] ? ext4_mb_release_group_pa+0x121/0x150 [ext4]
       [<f87f8356>] ? ext4_mb_discard_group_preallocations+0x336/0x400 [ext4]
       [<f87fb7f1>] ? ext4_mb_new_blocks+0x3d1/0x4f0 [ext4]
       [<c05a6c5b>] ? __make_request+0x10b/0x440
       [<f87f1fb4>] ? ext4_ext_map_blocks+0x1334/0x1980 [ext4]
       [<c04ac78a>] ? rb_reserve_next_event+0xaa/0x3b0
       [<f87d18d6>] ? ext4_map_blocks+0xd6/0x1d0 [ext4]
       [<f87d2da7>] ? mpage_da_map_blocks+0xc7/0x8a0 [ext4]
       [<c04c8a68>] ? find_get_pages_tag+0x38/0x110
       [<c04d23a5>] ? __pagevec_release+0x15/0x20
       [<f87d3ca5>] ? ext4_da_writepages+0x2b5/0x5d0 [ext4]
       [<c04cfbe0>] ? __writepage+0x0/0x30
       [<c04d0e34>] ? do_writepages+0x14/0x30
       [<c0526600>] ? writeback_single_inode+0xa0/0x240
       [<c0526971>] ? writeback_sb_inodes+0xc1/0x180
       [<c0526ab8>] ? writeback_inodes_wb+0x88/0x140
       [<c0526d7b>] ? wb_writeback+0x20b/0x320
       [<c045aca7>] ? lock_timer_base+0x27/0x50
       [<c0526fe0>] ? wb_do_writeback+0x150/0x190
       [<c05270a8>] ? bdi_writeback_thread+0x88/0x1f0
       [<c043b680>] ? complete+0x40/0x60
       [<c0527020>] ? bdi_writeback_thread+0x0/0x1f0
       [<c0469474>] ? kthread+0x74/0x80
       [<c0469400>] ? kthread+0x0/0x80
       [<c040a23e>] ? kernel_thread_helper+0x6/0x10
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  3. 26 Oct, 2010 5 commits
    • writeback: do not sleep on the congestion queue if there are no congested BDIs... · 0e093d99
      Mel Gorman authored
      
      writeback: do not sleep on the congestion queue if there are no congested BDIs or if significant congestion is not being encountered in the current zone
      
      If congestion_wait() is called with no BDI congested, the caller will
      sleep for the full timeout and this may be an unnecessary sleep.  This
      patch adds a wait_iff_congested() that checks congestion and only sleeps
      if a BDI is congested.  Otherwise, it calls cond_resched() to ensure the
      caller is not hogging the CPU longer than its quota but does not sleep.
      
      This is aimed at reducing some of the major desktop stalls reported during
      IO.  For example, while kswapd is operating, it calls congestion_wait()
      but it could just have been reclaiming clean page cache pages with no
      congestion.  Without this patch, it would sleep for a full timeout but
      after this patch, it'll just call schedule() if it has been on the CPU too
      long.  Similar logic applies to direct reclaimers that are not making
      enough progress.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
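The behaviour described for wait_iff_congested() in the commit above can be modelled in a few lines. This is a userspace sketch under stated assumptions, not the kernel function: `congestion_model` and `wait_iff_congested_sketch` are hypothetical, the counters only record which path was taken, and the model assumes a sleeping caller always sleeps for the full timeout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of congestion state plus counters for each path. */
struct congestion_model {
    int nr_congested_bdi; /* backing devices currently marked congested */
    int yields;           /* times the caller only yielded the CPU */
    int sleeps;           /* times the caller slept on the queue */
};

/*
 * Returns the time slept: 0 when no BDI is congested, otherwise the
 * full timeout (early wakeups are not modelled).
 */
static long wait_iff_congested_sketch(struct congestion_model *m, long timeout)
{
    assert(m != NULL && timeout >= 0);
    if (m->nr_congested_bdi == 0) {
        m->yields++;      /* stands in for cond_resched() */
        return 0;
    }
    m->sleeps++;          /* stands in for sleeping on the congestion queue */
    return timeout;
}
```

The point of the split is that an uncongested caller gives up the CPU if needed but never pays the timeout.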
    • vmscan: narrow the scenarios in which lumpy reclaim uses synchronous reclaim · 7d3579e8
      KOSAKI Motohiro authored
      
      
      shrink_page_list() can decide to give up reclaiming a page under a
      number of conditions such as
      
        1. trylock_page() failure
        2. page is unevictable
        3. zone reclaim and page is mapped
        4. PageWriteback() is true
        5. page is swapbacked and swap is full
        6. add_to_swap() failure
        7. page is dirty and gfpmask doesn't have GFP_IO, GFP_FS
        8. page is pinned
        9. IO queue is congested
       10. pageout() started IO, but it has not finished
      
      With lumpy reclaim, failures result in entering synchronous lumpy reclaim
      but this can be unnecessary.  In cases (2), (3), (5), (6), (7) and (8),
      there is no point retrying.  This patch causes lumpy reclaim to abort when
      it is known it will fail.
      
      Case (9) is more interesting.  The current behavior is:
        1. start shrink_page_list(async)
        2. found queue_congested()
        3. skip pageout write
        4. still start shrink_page_list(sync)
        5. wait on a lot of pages
        6. again, found queue_congested()
        7. give up pageout write again
      
      So, it's a useless waste of time.  However, just skipping page reclaim is
      also not good: on x86, for example, allocating a huge page needs 512 pages,
      which can include more dirty pages than the queue congestion threshold (~=128).

      After this patch, pageout() behaves as follows:
      
       - If order > PAGE_ALLOC_COSTLY_ORDER
      	Ignore queue congestion always.
       - If order <= PAGE_ALLOC_COSTLY_ORDER
      	skip write page and disable lumpy reclaim.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
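The order-based rule that the commit above describes for pageout() on a congested queue can be sketched as a small decision function. This is an illustration only: `pageout_choice_sketch` and its enum are hypothetical, and the value 3 for the costly-order threshold is an assumption standing in for PAGE_ALLOC_COSTLY_ORDER. The 512-page huge page mentioned in the message corresponds to order 9 (2^9 = 512).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror PAGE_ALLOC_COSTLY_ORDER; the value is an assumption. */
#define SKETCH_COSTLY_ORDER 3

enum pageout_choice {
    PAGEOUT_WRITE,         /* go ahead and write the page */
    PAGEOUT_SKIP_NO_LUMPY  /* skip the write and disable lumpy reclaim */
};

/* Decide what to do with a dirty page given the allocation order. */
static enum pageout_choice pageout_choice_sketch(int order, bool queue_congested)
{
    assert(order >= 0);
    if (!queue_congested)
        return PAGEOUT_WRITE;
    if (order > SKETCH_COSTLY_ORDER)
        return PAGEOUT_WRITE;  /* congestion is ignored for costly orders */
    return PAGEOUT_SKIP_NO_LUMPY;
}
```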
    • writeback: account for time spent congestion_waited · 52bb9198
      Mel Gorman authored
      
      
      There is strong evidence to indicate a lot of time is being spent in
      congestion_wait(), some of it unnecessarily.  This patch adds a tracepoint
      for congestion_wait to record when congestion_wait() was called, how long
      the timeout was for and how long it actually slept.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tracing, vmscan: add trace events for LRU list shrinking · e11da5b4
      Mel Gorman authored
      
      
      There have been numerous reports of stalls that pointed at the problem
      being somewhere in the VM.  There are multiple root causes, which means
      that fixing any one of them in isolation is tricky to justify on its own,
      and the fixes would still need integration testing.  This
      patch series puts together two different patch sets which in combination
      should tackle some of the root causes of latency problems being reported.
      
      Patch 1 adds a tracepoint for shrink_inactive_list.  For this series, the
      most important result is being able to calculate the scanning/reclaim
      ratio as a measure of the amount of work being done by page reclaim.
      
      Patch 2 accounts for time spent in congestion_wait.
      
      Patches 3-6 were originally developed by Kosaki Motohiro but reworked for
      this series.  It has been noted that lumpy reclaim is far too aggressive
      and trashes the system somewhat.  As SLUB uses high-order allocations, a
      large cost incurred by lumpy reclaim will be noticeable.  It was also
      reported during transparent hugepage support testing that lumpy reclaim
      was trashing the system and these patches should mitigate that problem
      without disabling lumpy reclaim.
      
      Patch 7 adds wait_iff_congested() and replaces some callers of
      congestion_wait().  wait_iff_congested() only sleeps if there is a BDI
      that is currently congested.  Patch 8 notes that any BDI being congested
      is not necessarily a problem because there could be multiple BDIs of
      varying speeds and numerous zones.  It attempts to track when a zone
      being reclaimed contains many pages backed by a congested BDI and if so,
      reclaimers wait on the congestion queue.
      
      I ran a number of tests with monitoring on X86, X86-64 and PPC64. Each
      machine had 3G of RAM and the CPUs were
      
      X86:    Intel P4 2-core
      X86-64: AMD Phenom 4-core
      PPC64:  PPC970MP
      
      Each used a single disk and the onboard IO controller.  Dirty ratio was
      left at 20.  I'm just going to report for X86-64 and PPC64 in a vague
      attempt to keep this report short.  Four kernels were tested each based on
      v2.6.36-rc4
      
      traceonly-v2r2:     Patches 1 and 2 to instrument vmscan reclaims and congestion_wait
      lowlumpy-v2r3:      Patches 1-6 to test if lumpy reclaim is better
      waitcongest-v2r3:   Patches 1-7 to only wait on congestion
      waitwriteback-v2r4: Patches 1-8 to detect when a zone is congested
      
      nocongest-v1r5: Patches 1-3 for testing wait_iff_congestion
      nodirect-v1r5:  Patches 1-10 to disable filesystem writeback for better IO
      
      The tests run were as follows
      
      kernbench
      	compile-based benchmark. Smoke test performance
      
      sysbench
      	OLTP read-only benchmark. Will be re-run in the future as read-write
      
      micro-mapped-file-stream
      	This is a micro-benchmark from Johannes Weiner that accesses a
      	large sparse-file through mmap(). It was configured to run in only
      	single-CPU mode but can be indicative of how well page reclaim
      	identifies suitable pages.
      
      stress-highalloc
      	Tries to allocate huge pages under heavy load.
      
      kernbench, iozone and sysbench did not report any performance regression
      on any machine.  sysbench did pressure the system lightly and there was
      reclaim activity but there was no difference of major interest between
      the kernels.
      
      X86-64 micro-mapped-file-stream
      
                                            traceonly-v2r2           lowlumpy-v2r3        waitcongest-v2r3     waitwriteback-v2r4
      pgalloc_dma                       1639.00 (   0.00%)       667.00 (-145.73%)      1167.00 ( -40.45%)       578.00 (-183.56%)
      pgalloc_dma32                  2842410.00 (   0.00%)   2842626.00 (   0.01%)   2843043.00 (   0.02%)   2843014.00 (   0.02%)
      pgalloc_normal                       0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)
      pgsteal_dma                        729.00 (   0.00%)        85.00 (-757.65%)       609.00 ( -19.70%)       125.00 (-483.20%)
      pgsteal_dma32                  2338721.00 (   0.00%)   2447354.00 (   4.44%)   2429536.00 (   3.74%)   2436772.00 (   4.02%)
      pgsteal_normal                       0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)
      pgscan_kswapd_dma                 1469.00 (   0.00%)       532.00 (-176.13%)      1078.00 ( -36.27%)       220.00 (-567.73%)
      pgscan_kswapd_dma32            4597713.00 (   0.00%)   4503597.00 (  -2.09%)   4295673.00 (  -7.03%)   3891686.00 ( -18.14%)
      pgscan_kswapd_normal                 0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)
      pgscan_direct_dma                   71.00 (   0.00%)       134.00 (  47.01%)       243.00 (  70.78%)       352.00 (  79.83%)
      pgscan_direct_dma32             305820.00 (   0.00%)    280204.00 (  -9.14%)    600518.00 (  49.07%)    957485.00 (  68.06%)
      pgscan_direct_normal                 0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)
      pageoutrun                       16296.00 (   0.00%)     21254.00 (  23.33%)     18447.00 (  11.66%)     20067.00 (  18.79%)
      allocstall                         443.00 (   0.00%)       273.00 ( -62.27%)       513.00 (  13.65%)      1568.00 (  71.75%)
      
      These are based on the raw figures taken from /proc/vmstat.  It's a rough
      measure of reclaim activity.  Note that allocstall counts are higher
      because we are entering direct reclaim more often as a result of not
      sleeping in congestion.  In itself, it's not necessarily a bad thing.
      It's easier to get a view of what happened from the vmscan tracepoint
      report.
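A note on reading the bracketed percentages in the vmstat table above: judging from the figures themselves, they appear to be computed relative to the patched kernel's value rather than the traceonly baseline. That is inferred from the data, not stated in the message. A minimal sketch, with the hypothetical helper `pct_vs_new`:

```c
#include <assert.h>

/*
 * Percentage difference of `value` against `baseline`, expressed relative
 * to `value` (the newer figure). Returns 0 when `value` is 0, matching the
 * all-zero rows in the table.
 */
static double pct_vs_new(double baseline, double value)
{
    assert(baseline >= 0.0 && value >= 0.0);
    if (value == 0.0)
        return 0.0;
    return (value - baseline) / value * 100.0;
}
```

For example, pgalloc_dma going from 1639 to 667 gives (667 - 1639) / 667, about -145.73%, and allocstall going from 443 to 1568 gives about 71.75%, both matching the table.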
      
      FTrace Reclaim Statistics: vmscan
      
                                      traceonly-v2r2   lowlumpy-v2r3 waitcongest-v2r3 waitwriteback-v2r4
      Direct reclaims                                443        273        513       1568
      Direct reclaim pages scanned                305968     280402     600825     957933
      Direct reclaim pages reclaimed               43503      19005      30327     117191
      Direct reclaim write file async I/O              0          0          0          0
      Direct reclaim write anon async I/O              0          3          4         12
      Direct reclaim write file sync I/O               0          0          0          0
      Direct reclaim write anon sync I/O               0          0          0          0
      Wake kswapd requests                        187649     132338     191695     267701
      Kswapd wakeups                                   3          1          4          1
      Kswapd pages scanned                       4599269    4454162    4296815    3891906
      Kswapd pages reclaimed                     2295947    2428434    2399818    2319706
      Kswapd reclaim write file async I/O              1          0          1          1
      Kswapd reclaim write anon async I/O             59        187         41        222
      Kswapd reclaim write file sync I/O               0          0          0          0
      Kswapd reclaim write anon sync I/O               0          0          0          0
      Time stalled direct reclaim (seconds)         4.34       2.52       6.63       2.96
      Time kswapd awake (seconds)                  11.15      10.25      11.01      10.19
      
      Total pages scanned                        4905237   4734564   4897640   4849839
      Total pages reclaimed                      2339450   2447439   2430145   2436897
      %age total pages scanned/reclaimed          47.69%    51.69%    49.62%    50.25%
      %age total pages scanned/written             0.00%     0.00%     0.00%     0.00%
      %age  file pages scanned/written             0.00%     0.00%     0.00%     0.00%
      Percentage Time Spent Direct Reclaim        29.23%    19.02%    38.48%    20.25%
      Percentage Time kswapd Awake                78.58%    78.85%    76.83%    79.86%
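The totals and the scanned/reclaimed percentage in the table above follow directly from the direct-reclaim and kswapd rows. A small sketch of that arithmetic, using the traceonly-v2r2 column as input; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Page counts taken from the "pages scanned" and "pages reclaimed" rows. */
struct reclaim_totals {
    unsigned long direct_scanned;
    unsigned long direct_reclaimed;
    unsigned long kswapd_scanned;
    unsigned long kswapd_reclaimed;
};

static unsigned long total_scanned(const struct reclaim_totals *t)
{
    assert(t != NULL);
    return t->direct_scanned + t->kswapd_scanned;
}

static unsigned long total_reclaimed(const struct reclaim_totals *t)
{
    assert(t != NULL);
    return t->direct_reclaimed + t->kswapd_reclaimed;
}

/* Reclaimed pages as a percentage of scanned pages (0 if nothing scanned). */
static double reclaimed_pct(const struct reclaim_totals *t)
{
    unsigned long scanned = total_scanned(t);

    if (scanned == 0)
        return 0.0;
    return (double)total_reclaimed(t) / (double)scanned * 100.0;
}
```

With 305968 + 4599269 pages scanned and 43503 + 2295947 pages reclaimed, this gives 2339450 / 4905237, about 47.69%, as reported for traceonly-v2r2.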
      
      What is interesting here for nocongest in particular is that while direct
      reclaim scans more pages, the overall number of pages scanned remains the
      same and the ratio of pages scanned to pages reclaimed is more or less the
      same.  In other words, while we are sleeping less, reclaim is not doing
      more work and as direct reclaim and kswapd are awake for less time, it
      would appear to be doing less work.
      
      FTrace Reclaim Statistics: congestion_wait
      Direct number congest     waited                87        196         64          0
      Direct time   congest     waited            4604ms     4732ms     5420ms        0ms
      Direct full   congest     waited                72        145         53          0
      Direct number conditional waited                 0          0        324       1315
      Direct time   conditional waited               0ms        0ms        0ms        0ms
      Direct full   conditional waited                 0          0          0          0
      KSwapd number congest     waited                20         10         15          7
      KSwapd time   congest     waited            1264ms      536ms      884ms      284ms
      KSwapd full   congest     waited                10          4          6          2
      KSwapd number conditional waited                 0          0          0          0
      KSwapd time   conditional waited               0ms        0ms        0ms        0ms
      KSwapd full   conditional waited                 0          0          0          0
      
      The vanilla kernel spent 8 seconds asleep in direct reclaim and no time at
      all asleep with the patches.
      
      MMTests Statistics: duration
      User/Sys Time Running Test (seconds)         10.51     10.73      10.6     11.66
      Total Elapsed Time (seconds)                 14.19     13.00     14.33     12.76
      
      Overall, the tests completed faster. It is interesting to note that backing off further
      when a zone is congested and not just a BDI was more efficient overall.
      
      PPC64 micro-mapped-file-stream
      pgalloc_dma                    3024660.00 (   0.00%)   3027185.00 (   0.08%)   3025845.00 (   0.04%)   3026281.00 (   0.05%)
      pgalloc_normal                       0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)
      pgsteal_dma                    2508073.00 (   0.00%)   2565351.00 (   2.23%)   2463577.00 (  -1.81%)   2532263.00 (   0.96%)
      pgsteal_normal                       0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)
      pgscan_kswapd_dma              4601307.00 (   0.00%)   4128076.00 ( -11.46%)   3912317.00 ( -17.61%)   3377165.00 ( -36.25%)
      pgscan_kswapd_normal                 0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)
      pgscan_direct_dma               629825.00 (   0.00%)    971622.00 (  35.18%)   1063938.00 (  40.80%)   1711935.00 (  63.21%)
      pgscan_direct_normal                 0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)         0.00 (   0.00%)
      pageoutrun                       27776.00 (   0.00%)     20458.00 ( -35.77%)     18763.00 ( -48.04%)     18157.00 ( -52.98%)
      allocstall                         977.00 (   0.00%)      2751.00 (  64.49%)      2098.00 (  53.43%)      5136.00 (  80.98%)
      
      Similar trends to x86-64. allocstalls are up but it's not necessarily bad.
      
      FTrace Reclaim Statistics: vmscan
      Direct reclaims                                977       2709       2098       5136
      Direct reclaim pages scanned                629825     963814    1063938    1711935
      Direct reclaim pages reclaimed               75550     242538     150904     387647
      Direct reclaim write file async I/O              0          0          0          2
      Direct reclaim write anon async I/O              0         10          0          4
      Direct reclaim write file sync I/O               0          0          0          0
      Direct reclaim write anon sync I/O               0          0          0          0
      Wake kswapd requests                        392119    1201712     571935     571921
      Kswapd wakeups                                   3          2          3          3
      Kswapd pages scanned                       4601307    4128076    3912317    3377165
      Kswapd pages reclaimed                     2432523    2318797    2312673    2144616
      Kswapd reclaim write file async I/O             20          1          1          1
      Kswapd reclaim write anon async I/O             57        132         11        121
      Kswapd reclaim write file sync I/O               0          0          0          0
      Kswapd reclaim write anon sync I/O               0          0          0          0
      Time stalled direct reclaim (seconds)         6.19       7.30      13.04      10.88
      Time kswapd awake (seconds)                  21.73      26.51      25.55      23.90
      
      Total pages scanned                        5231132   5091890   4976255   5089100
      Total pages reclaimed                      2508073   2561335   2463577   2532263
      %age total pages scanned/reclaimed          47.95%    50.30%    49.51%    49.76%
      %age total pages scanned/written             0.00%     0.00%     0.00%     0.00%
      %age  file pages scanned/written             0.00%     0.00%     0.00%     0.00%
      Percentage Time Spent Direct Reclaim        18.89%    20.65%    32.65%    27.65%
      Percentage Time kswapd Awake                72.39%    80.68%    78.21%    77.40%
      
      Again, the trend is similar: the congestion_wait changes mean that direct
      reclaim scans more pages, but the overall number of pages scanned, while
      slightly reduced, is very similar.  The ratio of scanning/reclaimed
      remains roughly similar.  The downside is that kswapd and direct reclaim
      were awake longer and for a larger percentage of the overall workload.
      It's possible there were big differences in the amount of time spent
      reclaiming slab pages between the different kernels, which is plausible
      considering that the micro tests run after fsmark and sysbench.
      
      FTrace Reclaim Statistics: congestion_wait
      Direct number congest     waited               845       1312        104          0
      Direct time   congest     waited           19416ms    26560ms     7544ms        0ms
      Direct full   congest     waited               745       1105         72          0
      Direct number conditional waited                 0          0       1322       2935
      Direct time   conditional waited               0ms        0ms       12ms      312ms
      Direct full   conditional waited                 0          0          0          3
      KSwapd number congest     waited                39        102         75         63
      KSwapd time   congest     waited            2484ms     6760ms     5756ms     3716ms
      KSwapd full   congest     waited                20         48         46         25
      KSwapd number conditional waited                 0          0          0          0
      KSwapd time   conditional waited               0ms        0ms        0ms        0ms
      KSwapd full   conditional waited                 0          0          0          0
      
      The vanilla kernel spent 20 seconds asleep in direct reclaim and only
      312ms asleep with the patches.  The time kswapd spent congest waited was
      also reduced by a large factor.
      
      MMTests Statistics: duration
      User/Sys Time Running Test (seconds)         26.58     28.05      26.9     28.47
      Total Elapsed Time (seconds)                 30.02     32.86     32.67     30.88
      
      With all patches applied, the completion times are very similar.
      
      X86-64 STRESS-HIGHALLOC
                      traceonly-v2r2     lowlumpy-v2r3  waitcongest-v2r3 waitwriteback-v2r4
      Pass 1          82.00 ( 0.00%)    84.00 ( 2.00%)    85.00 ( 3.00%)    85.00 ( 3.00%)
      Pass 2          90.00 ( 0.00%)    87.00 (-3.00%)    88.00 (-2.00%)    89.00 (-1.00%)
      At Rest         92.00 ( 0.00%)    90.00 (-2.00%)    90.00 (-2.00%)    91.00 (-1.00%)
      
      Success figures across the board are broadly similar.
      
                      traceonly-v2r2     lowlumpy-v2r3  waitcongest-v2r3 waitwriteback-v2r4
      Direct reclaims                               1045        944        886        887
      Direct reclaim pages scanned                135091     119604     109382     101019
      Direct reclaim pages reclaimed               88599      47535      47863      46671
      Direct reclaim write file async I/O            494        283        465        280
      Direct reclaim write anon async I/O          29357      13710      16656      13462
      Direct reclaim write file sync I/O             154          2          2          3
      Direct reclaim write anon sync I/O           14594        571        509        561
      Wake kswapd requests                          7491        933        872        892
      Kswapd wakeups                                 814        778        731        780
      Kswapd pages scanned                       7290822   15341158   11916436   13703442
      Kswapd pages reclaimed                     3587336    3142496    3094392    3187151
      Kswapd reclaim write file async I/O          91975      32317      28022      29628
      Kswapd reclaim write anon async I/O        1992022     789307     829745     849769
      Kswapd reclaim write file sync I/O               0          0          0          0
      Kswapd reclaim write anon sync I/O               0          0          0          0
      Time stalled direct reclaim (seconds)      4588.93    2467.16    2495.41    2547.07
      Time kswapd awake (seconds)                2497.66    1020.16    1098.06    1176.82
      
      Total pages scanned                        7425913  15460762  12025818  13804461
      Total pages reclaimed                      3675935   3190031   3142255   3233822
      %age total pages scanned/reclaimed          49.50%    20.63%    26.13%    23.43%
      %age total pages scanned/written            28.66%     5.41%     7.28%     6.47%
      %age  file pages scanned/written             1.25%     0.21%     0.24%     0.22%
      Percentage Time Spent Direct Reclaim        57.33%    42.15%    42.41%    42.99%
      Percentage Time kswapd Awake                43.56%    27.87%    29.76%    31.25%
      
      Scanned/reclaimed ratios again look good with big improvements in
      efficiency.  The Scanned/written ratios also look much improved.  With a
      better scanned/written ratio, there is an expectation that IO would be
      more efficient and indeed, the time spent in direct reclaim is much
      reduced by the full series and kswapd spends a little less time awake.
      
      Overall, indications here are that allocations were happening much faster
      and this can be seen with a graph of the latency figures as the
      allocations were taking place
      http://www.csn.ul.ie/~mel/postings/vmscanreduce-20101509/highalloc-interlatency-hydra-mean.ps
      
      FTrace Reclaim Statistics: congestion_wait
      Direct number congest     waited              1333        204        169          4
      Direct time   congest     waited           78896ms     8288ms     7260ms      200ms
      Direct full   congest     waited               756         92         69          2
      Direct number conditional waited                 0          0         26        186
      Direct time   conditional waited               0ms        0ms        0ms     2504ms
      Direct full   conditional waited                 0          0          0         25
      KSwapd number congest     waited                 4        395        227        282
      KSwapd time   congest     waited             384ms    25136ms    10508ms    18380ms
      KSwapd full   congest     waited                 3        232         98        176
      KSwapd number conditional waited                 0          0          0          0
      KSwapd time   conditional waited               0ms        0ms        0ms        0ms
      KSwapd full   conditional waited                 0          0          0          0
      KSwapd full   conditional waited               318          0        312          9
      
      Overall, the time spent sleeping is reduced.  kswapd is still hitting
      congestion_wait() but that is because there are callers remaining where it
      wasn't clear in advance if they should be changed to wait_iff_congested()
      or not.  Overall the sleep times are reduced though - from 79ish seconds to
      about 19.
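
      As a rough cross-check of the sleep totals quoted above, the "time" rows
      of the table can be added up per kernel.  This is only a sketch: the
      names SERIES, SLEEP_MS and total_sleep_seconds() are made up for
      illustration, and it is an assumption that the quoted totals were
      derived this way.  The figures themselves are copied from the table.

```python
# Sleep times in milliseconds, copied from the congestion_wait table above.
# Column order: traceonly, lowlumpy, waitcongest, waitwriteback.
SERIES = ("traceonly", "lowlumpy", "waitcongest", "waitwriteback")
SLEEP_MS = {
    "direct congest": (78896, 8288, 7260, 200),
    "direct conditional": (0, 0, 0, 2504),
    "kswapd congest": (384, 25136, 10508, 18380),
    "kswapd conditional": (0, 0, 0, 0),
}


def total_sleep_seconds(rows):
    """Sum the selected rows for each kernel and convert ms to seconds."""
    columns = zip(*(SLEEP_MS[row] for row in rows))
    return {name: sum(col) / 1000 for name, col in zip(SERIES, columns)}


# congestion_wait() sleeps only, then every sleep including wait_iff_congested().
congest_only = total_sleep_seconds(["direct congest", "kswapd congest"])
all_waits = total_sleep_seconds(list(SLEEP_MS))

for name in SERIES:
    print(f"{name:14s} congestion_wait {congest_only[name]:7.2f}s"
          f"  all waits {all_waits[name]:7.2f}s")
```

      Counting congestion_wait() sleeps only, this gives roughly 79 seconds
      for traceonly and 18.6 seconds for the full series; with the
      wait_iff_congested() sleeps included, the full series comes to about
      21 seconds.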
      
      MMTests Statistics: duration
      User/Sys Time Running Test (seconds)       3415.43   3386.65   3388.39    3377.5
      Total Elapsed Time (seconds)               5733.48   3660.33   3689.41   3765.39
      
      With the full series, the time to complete the tests is reduced by over 30%.
      
      PPC64 STRESS-HIGHALLOC
                      traceonly-v2r2     lowlumpy-v2r3  waitcongest-v2r3 waitwriteback-v2r4
      Pass 1          17.00 ( 0.00%)    34.00 (17.00%)    38.00 (21.00%)    43.00 (26.00%)
      Pass 2          25.00 ( 0.00%)    37.00 (12.00%)    42.00 (17.00%)    46.00 (21.00%)
      At Rest         49.00 ( 0.00%)    43.00 (-6.00%)    45.00 (-4.00%)    51.00 ( 2.00%)
      
      Success rates there are *way* up particularly considering that the 16MB
      huge pages on PPC64 mean that it's always much harder to allocate them.
      
      FTrace Reclaim Statistics: vmscan
                    stress-highalloc  stress-highalloc  stress-highalloc  stress-highalloc
                      traceonly-v2r2     lowlumpy-v2r3  waitcongest-v2r3 waitwriteback-v2r4
      Direct reclaims                                499        505        564        509
      Direct reclaim pages scanned                223478      41898      51818      45605
      Direct reclaim pages reclaimed              137730      21148      27161      23455
      Direct reclaim write file async I/O            399        136        162        136
      Direct reclaim write anon async I/O          46977       2865       4686       3998
      Direct reclaim write file sync I/O              29          0          1          3
      Direct reclaim write anon sync I/O           31023        159        237        239
      Wake kswapd requests                           420        351        360        326
      Kswapd wakeups                                 185        294        249        277
      Kswapd pages scanned                      15703488   16392500   17821724   17598737
      Kswapd pages reclaimed                     5808466    2908858    3139386    3145435
      Kswapd reclaim write file async I/O         159938      18400      18717      13473
      Kswapd reclaim write anon async I/O        3467554     228957     322799     234278
      Kswapd reclaim write file sync I/O               0          0          0          0
      Kswapd reclaim write anon sync I/O               0          0          0          0
      Time stalled direct reclaim (seconds)      9665.35    1707.81    2374.32    1871.23
      Time kswapd awake (seconds)                9401.21    1367.86    1951.75    1328.88
      
      Total pages scanned                       15926966  16434398  17873542  17644342
      Total pages reclaimed                      5946196   2930006   3166547   3168890
      %age total pages scanned/reclaimed          37.33%    17.83%    17.72%    17.96%
      %age total pages scanned/written            23.27%     1.52%     1.94%     1.43%
      %age  file pages scanned/written             1.01%     0.11%     0.11%     0.08%
      Percentage Time Spent Direct Reclaim        44.55%    35.10%    41.42%    36.91%
      Percentage Time kswapd Awake                86.71%    43.58%    52.67%    41.14%
      
      While the scanning rates are slightly up, the scanned/reclaimed and
      scanned/written figures are much improved.  The time spent in direct
      reclaim and the time kswapd spends awake are massively reduced, mostly by
      the lowlumpy patches.
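
      The totals and the scanned/reclaimed percentages in the table above can
      be re-derived from the per-context rows.  A minimal sketch, with
      hypothetical names (reclaim_efficiency() and the tuples are not part of
      the postprocessing script); note that the percentage is reclaimed pages
      as a share of scanned pages.

```python
# Page counts copied from the PPC64 vmscan table above.
# Column order: traceonly, lowlumpy, waitcongest, waitwriteback.
SERIES = ("traceonly", "lowlumpy", "waitcongest", "waitwriteback")
DIRECT_SCANNED = (223478, 41898, 51818, 45605)
DIRECT_RECLAIMED = (137730, 21148, 27161, 23455)
KSWAPD_SCANNED = (15703488, 16392500, 17821724, 17598737)
KSWAPD_RECLAIMED = (5808466, 2908858, 3139386, 3145435)


def reclaim_efficiency():
    """Return {kernel: (total scanned, total reclaimed, percent reclaimed)}."""
    result = {}
    for i, name in enumerate(SERIES):
        scanned = DIRECT_SCANNED[i] + KSWAPD_SCANNED[i]
        reclaimed = DIRECT_RECLAIMED[i] + KSWAPD_RECLAIMED[i]
        result[name] = (scanned, reclaimed, round(100 * reclaimed / scanned, 2))
    return result


efficiency = reclaim_efficiency()
for name, (scanned, reclaimed, pct) in efficiency.items():
    print(f"{name:14s} scanned {scanned:9d} reclaimed {reclaimed:8d} {pct:6.2f}%")
```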
      
      FTrace Reclaim Statistics: congestion_wait
      Direct number congest     waited               725        303        126          3
      Direct time   congest     waited           45524ms     9180ms     5936ms      300ms
      Direct full   congest     waited               487        190         52          3
      Direct number conditional waited                 0          0        200        301
      Direct time   conditional waited               0ms        0ms        0ms     1904ms
      Direct full   conditional waited                 0          0          0         19
      KSwapd number congest     waited                 0          2         23          4
      KSwapd time   congest     waited               0ms      200ms      420ms      404ms
      KSwapd full   congest     waited                 0          2          2          4
      KSwapd number conditional waited                 0          0          0          0
      KSwapd time   conditional waited               0ms        0ms        0ms        0ms
      KSwapd full   conditional waited                 0          0          0          0
      
      Not as dramatic a story here but the time spent asleep is reduced and we
      can still see that wait_iff_congested() is going to sleep when necessary.
      
      MMTests Statistics: duration
      User/Sys Time Running Test (seconds)      12028.09   3157.17   3357.79   3199.16
      Total Elapsed Time (seconds)              10842.07   3138.72   3705.54   3229.85
      
      The time to complete this test goes way down.  With the full series, we
      are allocating over twice the number of huge pages in 30% of the time and
      there is a corresponding impact on the allocation latency graph available
      at
      http://www.csn.ul.ie/~mel/postings/vmscanreduce-20101509/highalloc-interlatency-powyah-mean.ps
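
      The "twice the huge pages in 30% of the time" summary can be checked
      against the PPC64 tables.  This sketch assumes the remark compares the
      Pass 1 success rates and the total elapsed times of traceonly-v2r2 and
      waitwriteback-v2r4; the variable names are illustrative only.

```python
# Figures copied from the PPC64 STRESS-HIGHALLOC and duration tables above.
pass1_success = {"traceonly": 17.00, "waitwriteback": 43.00}
elapsed_seconds = {"traceonly": 10842.07, "waitwriteback": 3229.85}

# How many times more huge pages were allocated in pass 1.
success_ratio = pass1_success["waitwriteback"] / pass1_success["traceonly"]

# Share of the original run time that the full series needs.
time_fraction = elapsed_seconds["waitwriteback"] / elapsed_seconds["traceonly"]

print(f"pass 1 success: {success_ratio:.2f}x")
print(f"elapsed time:   {100 * time_fraction:.1f}% of traceonly")
```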
      
      
      
      This patch:
      
      Add a trace event for shrink_inactive_list() and update the sample
      postprocessing script appropriately.  It can be used to determine how many
      pages were reclaimed and, for non-lumpy reclaim, where exactly the pages
      were reclaimed from.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e11da5b4
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Wu Fengguang authored
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip inodes
      on IO congestion.  The latter would lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because they are
      redundant with the WB_SYNC_NONE check.
      
      We no longer set ->nonblocking in VM page out and page migration, because
      a) it's effectively redundant with WB_SYNC_NONE in current code
      b) its old semantic of "Don't get stuck on request queues" is misbehavior:
         that would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1b430bee
  4. 21 Oct, 2010 1 commit
    • tracing: Cleanup the convoluted softirq tracepoints · f4bc6bb2
      Thomas Gleixner authored
      
      
      With the addition of trace_softirq_raise() the softirq tracepoint got
      even more convoluted. Why the tracepoints take two pointers to assign
      an integer is beyond my comprehension.
      
      But adding an extra case which treats the first pointer as an unsigned
      long when the second pointer is NULL including the back and forth
      type casting is just horrible.
      
      Convert the softirq tracepoints to take a single unsigned int argument
      for the softirq vector number and fix the call sites.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <alpine.LFD.2.00.1010191428560.6815@localhost6.localdomain6>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Acked-by: mathieu.desnoyers@efficios.com
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      f4bc6bb2
  5. 05 Oct, 2010 2 commits
    • workqueue: add queue_work and activate_work trace points · cdadf009
      Tejun Heo authored
      
      
      These two tracepoints allow tracking when and how a work is queued and
      activated.  This patch is based on Frederic's patch to add queue_work
      trace point.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      cdadf009
    • workqueue: prepare for more tracepoints · 97bd2347
      Tejun Heo authored
      
      
      Define workqueue_work event class and use it for workqueue_execute_end
      trace point.  Also, move trace/events/workqueue.h include downwards
      such that all struct definitions are visible to it.  This is to
      prepare for more tracepoints and doesn't cause any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      97bd2347
  6. 21 Sep, 2010 1 commit
  7. 17 Sep, 2010 1 commit
    • tracing, perf: Add more power related events · 74704ac6
      Jean Pihet authored
      
      
      This patch adds new generic events for dynamic power management
      tracing:
      
       - clock events class: used for clock enable/disable and for
         clock rate change,
       - power_domain events class: used for power domain transitions.
      
      The OMAP architecture will be using the new events for PM debugging,
      however the new events are made generic enough to be used by all
      platforms.
      Signed-off-by: Jean Pihet <j-pihet@ti.com>
      Acked-by: Thomas Renninger <trenn@suse.de>
      Cc: discuss@lesswatts.org
      Cc: linux-pm@lists.linux-foundation.org
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Kevin Hilman <khilman@deeprootsystems.com>
      LKML-Reference: <AANLkTinUmbSUUuxUzc8++pcb9gd1CZFdyTQFrveTBXyV@mail.gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      74704ac6
  8. 07 Sep, 2010 4 commits
    • skb: Add tracepoints to freeing skb · 07dc22e7
      Koki Sanagi authored
      
      
      This patch adds a tracepoint to consume_skb and adds trace_kfree_skb
      before __kfree_skb in skb_free_datagram_locked and net_tx_action.
      Combined with the tracepoint on dev_hard_start_xmit, this lets us check
      how long it takes to free transmitted packets and, from that, calculate
      how many packets the driver held at a given time.  This is useful when
      dropped transmitted packets are a problem.
      
                  sshd-6828  [000] 112689.258154: consume_skb: skbaddr=f2d99bb8
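
      One way the free latency could be computed from such a trace is to pair
      each net_dev_xmit line with the later consume_skb or kfree_skb line
      carrying the same skbaddr.  The sketch below is illustrative only:
      parse() and time_to_free() are hypothetical helpers, the second and
      third sample lines are invented, and the line layout is assumed to
      match the ftrace output quoted in these commit messages.

```python
import re

# task-pid  [cpu] timestamp: event: key=value ...
LINE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+(?P<ts>\d+\.\d+):\s+"
    r"(?P<event>\w+):\s+(?P<fields>.*)$"
)


def parse(line):
    """Split one trace line into (timestamp, event name, field dict)."""
    m = LINE.match(line)
    if m is None:
        return None
    fields = dict(kv.split("=", 1) for kv in m["fields"].split() if "=" in kv)
    return float(m["ts"]), m["event"], fields


def time_to_free(lines):
    """Return (freed, in_flight).

    freed maps skbaddr to the seconds between transmit and free;
    in_flight maps skbaddr to the transmit time of packets not yet freed.
    """
    in_flight, freed = {}, {}
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, event, fields = parsed
        addr = fields.get("skbaddr")
        if addr is None:
            continue
        if event == "net_dev_xmit":
            in_flight[addr] = ts
        elif event in ("consume_skb", "kfree_skb") and addr in in_flight:
            freed[addr] = ts - in_flight.pop(addr)
    return freed, in_flight


SAMPLE = [
    "sshd-6828  [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0",
    "sshd-6828  [000] 112447.903300: net_dev_xmit: dev=eth4 skbaddr=f3fca600 len=98 rc=0",
    "<idle>-0     [000] 112447.903510: consume_skb: skbaddr=f3fca538",
]

freed, in_flight = time_to_free(SAMPLE)
for addr, delay in freed.items():
    print(f"{addr}: freed after {delay * 1e6:.0f} us")
print(f"{len(in_flight)} packet(s) still held by the driver")
```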
      Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
      Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <4C724364.50903@jp.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      07dc22e7
    • netdev: Add tracepoints to netdev layer · cf66ba58
      Koki Sanagi authored
      
      
      This patch adds tracepoints to dev_queue_xmit, dev_hard_start_xmit,
      netif_rx and netif_receive_skb. These tracepoints help you monitor a
      network driver's input/output.
      
                <idle>-0     [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
                <idle>-0     [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
                  sshd-6828  [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
                  sshd-6828  [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0
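
      As an example of monitoring with this output, the bytes seen per device
      and per event can be summed from the len= field.  This is a sketch
      only; bytes_per_device() is a hypothetical helper and the trace text is
      the four lines quoted above.

```python
import re
from collections import defaultdict

# Matches the event name and the dev/skbaddr/len fields of the four events.
EVENT = re.compile(
    r":\s+(?P<event>netif_rx|netif_receive_skb|net_dev_queue|net_dev_xmit):\s+"
    r"dev=(?P<dev>\S+)\s+skbaddr=(?P<skb>[0-9a-f]+)\s+len=(?P<len>\d+)"
)

TRACE = """\
          <idle>-0     [001] 112447.902030: netif_rx: dev=eth1 skbaddr=f3ef0900 len=84
          <idle>-0     [001] 112447.902039: netif_receive_skb: dev=eth1 skbaddr=f3ef0900 len=84
            sshd-6828  [000] 112447.903257: net_dev_queue: dev=eth4 skbaddr=f3fca538 len=226
            sshd-6828  [000] 112447.903260: net_dev_xmit: dev=eth4 skbaddr=f3fca538 len=226 rc=0
"""


def bytes_per_device(trace):
    """Return {device: {event: total bytes}} for the netdev events in trace."""
    totals = defaultdict(lambda: defaultdict(int))
    for line in trace.splitlines():
        m = EVENT.search(line)
        if m:
            totals[m["dev"]][m["event"]] += int(m["len"])
    return {dev: dict(events) for dev, events in totals.items()}


totals = bytes_per_device(TRACE)
for dev, events in sorted(totals.items()):
    for event, nbytes in events.items():
        print(f"{dev} {event}: {nbytes} bytes")
```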
      Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
      Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <4C72431E.3000901@jp.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      cf66ba58
    • napi: Convert trace_napi_poll to TRACE_EVENT · 3e4b10d7
      Neil Horman authored
      This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT
      to improve the usability of the napi_poll tracepoint.
      
                <idle>-0     [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
                <idle>-0     [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1
      
      The original patch is below:
      http://marc.info/?l=linux-kernel&m=126021713809450&w=2
      
      [ sanagi.koki@jp.fujitsu.com: And add a fix by Steven Rostedt:
        http://marc.info/?l=linux-kernel&m=126150506519173&w=2 ]
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
      Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <4C7242D7.4050009@jp.fujitsu.com>
      Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      3e4b10d7
    • irq: Add tracepoint to softirq_raise · 2bf2160d
      Lai Jiangshan authored
      
      
      Add a tracepoint for tracing when a softirq action is raised.
      
      This and the existing tracepoints complete softirq's tracepoints:
      softirq_raise, softirq_entry and softirq_exit.
      
      And when this tracepoint is used in combination with
      the softirq_entry tracepoint we can determine
      the softirq raise latency.
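
      The raise latency mentioned above is the gap between a softirq_raise
      event and the next softirq_entry event for the same vector.  A minimal
      sketch follows; raise_latencies() is a hypothetical helper, the events
      are invented (timestamp in microseconds, event name, vector number),
      and the input is assumed to be already parsed, in time order, for a
      single CPU.

```python
def raise_latencies(events):
    """Return [(vec, latency)] for each raise that is followed by an entry."""
    pending = {}     # vec -> timestamp of the first raise not yet serviced
    latencies = []
    for ts, name, vec in events:
        if name == "softirq_raise":
            # A vector raised again before it runs is still serviced once,
            # so keep the earliest raise time.
            pending.setdefault(vec, ts)
        elif name == "softirq_entry" and vec in pending:
            latencies.append((vec, ts - pending.pop(vec)))
    return latencies


# Invented sample events: (timestamp_us, event, vec).
EVENTS = [
    (100, "softirq_raise", 1),
    (105, "softirq_raise", 1),
    (130, "softirq_entry", 1),
    (140, "softirq_exit", 1),
    (200, "softirq_raise", 3),
    (260, "softirq_entry", 3),
]

latencies = raise_latencies(EVENTS)
for vec, latency in latencies:
    print(f"vec={vec} raise latency {latency} us")
```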
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com>
      Cc: Izumo Taku <izumi.taku@jp.fujitsu.com>
      Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Scott Mcmillan <scott.a.mcmillan@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <4C724298.4050509@jp.fujitsu.com>
      [ factorize softirq events with DECLARE_EVENT_CLASS ]
      Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      2bf2160d
  9. 21 Aug, 2010 1 commit
    • workqueue: Add basic tracepoints to track workqueue execution · e36c886a
      Arjan van de Ven authored
      
      
      With the introduction of the new unified work queue thread pools,
      we lost one feature: It's no longer possible to know which worker
      is causing the CPU to wake out of idle. The result is that PowerTOP
      now reports a lot of "kworker/a:b" instead of more readable results.
      
      This patch adds a pair of tracepoints to the new workqueue code,
      similar in style to the timer/hrtimer tracepoints.
      
      With this pair of tracepoints, the next PowerTOP can correctly
      report which work item caused the wakeup (and how long it took):
      
      Interrupt (43)            i915      time   3.51ms    wakeups 141
      Work      ieee80211_iface_work      time   0.81ms    wakeups  29
      Work              do_dbs_timer      time   0.55ms    wakeups  24
      Process                   Xorg      time  21.36ms    wakeups   4
      Timer    sched_rt_period_timer      time   0.01ms    wakeups   1
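
      A consumer of such a pair of tracepoints could attribute run time to
      work functions roughly as follows.  This is a sketch under stated
      assumptions, not how PowerTOP is implemented: account_work() is a
      hypothetical helper, the events are invented and assumed to be already
      parsed into (timestamp in microseconds, kind, work pointer, function
      name), with the function name present only on the start event.  It
      counts executions, not CPU wakeups.

```python
from collections import defaultdict


def account_work(events):
    """Return {function: {"runs": count, "time_us": total run time}}."""
    running = {}   # work pointer -> (function name, start timestamp)
    stats = defaultdict(lambda: {"runs": 0, "time_us": 0})
    for ts, kind, work, function in events:
        if kind == "execute_start":
            running[work] = (function, ts)
        elif kind == "execute_end" and work in running:
            # The end event is matched to its start by the work pointer.
            started_function, started = running.pop(work)
            stats[started_function]["runs"] += 1
            stats[started_function]["time_us"] += ts - started
    return dict(stats)


# Invented sample events.
EVENTS = [
    (1000, "execute_start", 0xA0, "do_dbs_timer"),
    (1020, "execute_end", 0xA0, None),
    (5000, "execute_start", 0xB0, "ieee80211_iface_work"),
    (5030, "execute_end", 0xB0, None),
    (9000, "execute_start", 0xA0, "do_dbs_timer"),
    (9025, "execute_end", 0xA0, None),
]

stats = account_work(EVENTS)
for function, s in stats.items():
    print(f"Work {function:>24s}  time {s['time_us']} us  runs {s['runs']}")
```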
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e36c886a
  10. 19 Aug, 2010 1 commit
    • tracing: Fix timer tracing · ede1b429
      Arjan van de Ven authored
      
      
      PowerTOP would like to be able to trace timers.
      
      Unfortunately, the current timer tracing is not very useful: the
      actual timer function is not recorded in the trace at the start
      of timer execution.
      
      Although this is recorded for timer "start" time (when it gets
      armed), this is not useful; most timers get started early, and a
      tracer like PowerTOP will never see this event, but will only
      see the actual running of the timer.
      
      This patch just adds the function to the timer tracing; I've
      verified with PowerTOP that now it can get useful information
      about timers.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: xiaoguangrong@cn.fujitsu.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org> # .35.x, .34.x, .33.x
      LKML-Reference: <4C6C5FA9.3000405@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ede1b429
  11. 10 Aug, 2010 7 commits
  12. 07 Aug, 2010 6 commits
  13. 03 Aug, 2010 1 commit
    • [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent · 6f4f2723
      Thomas Renninger authored
      
      
      and fix the broken case if a core's frequency depends on others.
      
      trace_power_frequency was implemented in a rather ungeneric way,
      in the acpi-cpufreq driver's target() function only.
      -> Move the call to trace_power_frequency to
         cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
         notifier is triggered.
         This will support power frequency tracing by all cpufreq drivers
      
      trace_power_frequency did not trace frequency changes correctly when
      the userspace governor was used or when CPU cores' frequencies depend
      on each other.
      -> Moving this into the CPUFREQ_POSTCHANGE notifier and passing the cpu
         which gets switched automatically fixes this.
      
      Robert Schoene provided some important fixes on top of my initial
      quick shot version which are integrated in this patch:
      - Forgot some changes in power_end trace (TP_printk/variable names)
      - Variable dummy in power_end must now be cpu_id
      - Use static 64 bit variable instead of unsigned int for cpu_id
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      CC: davej@redhat.com
      CC: arjan@infradead.org
      CC: linux-kernel@vger.kernel.org
      CC: robert.schoene@tu-dresden.de
      Tested-by: robert.schoene@tu-dresden.de
      Signed-off-by: Dave Jones <davej@redhat.com>
      6f4f2723
  14. 01 Aug, 2010 1 commit
  15. 27 Jul, 2010 1 commit
  16. 22 Jul, 2010 1 commit
    • x86 cpufreq, perf: Make trace_power_frequency cpufreq driver independent · 4c21adf2
      Thomas Renninger authored
      
      
      and fix the broken case if a core's frequency depends on others.
      
      trace_power_frequency was implemented in a rather ungeneric
      way, in the acpi-cpufreq driver's target() function only.
      
      -> Move the call to trace_power_frequency to
         cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
         notifier is triggered.
         This will support power frequency tracing by all cpufreq
         drivers.
      
      trace_power_frequency did not trace frequency changes correctly
      when the userspace governor was used or when CPU cores'
      frequencies depend on each other.
      
      -> Moving this into the CPUFREQ_POSTCHANGE notifier and passing the cpu
         which gets switched automatically fixes this.
      
      Robert Schoene provided some important fixes on top of my
      initial quick shot version which are integrated in this patch:
      - Forgot some changes in power_end trace (TP_printk/variable names)
      - Variable dummy in power_end must now be cpu_id
      - Use static 64 bit variable instead of unsigned int for cpu_id
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Cc: davej@codemonkey.org.uk
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Acked-by: Arjan van de Ven <arjan@infradead.org>
      Cc: Robert Schoene <robert.schoene@tu-dresden.de>
      Tested-by: Robert Schoene <robert.schoene@tu-dresden.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      4c21adf2
  17. 21 Jul, 2010 1 commit
    • tracing: Reduce latency and remove percpu trace_seq · bc289ae9
      Lai Jiangshan authored
      
      
      __print_flags() and __print_symbolic() use percpu trace_seq:
      
      1) Its memory is allocated at compile time, so it wastes memory if we don't use tracing.
      2) It is percpu data, so it wastes even more memory on multi-CPU systems.
      3) It disables preemption when it executes its core routine
         "trace_seq_printf(s, "%s: ", #call);" and introduces latency.
      
      So we move this trace_seq to struct trace_iterator.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4C078350.7090106@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      bc289ae9
  18. 29 Jun, 2010 1 commit