1. 17 Nov, 2011 1 commit
    • tracing: Add irq, preempt-count and need resched info to default trace output · 77271ce4
      Steven Rostedt authored
      
      
      People keep asking how to get the preempt count, irq, and need resched info
      and we keep telling them to enable the latency format. Some developers think
      that traces without this info are completely useless, and for a lot of tasks
      they are.
      
      The first option was to enable the latency trace as the default format, but
      the header for the latency format is pretty useless for most tracers and
      it also does the timestamp in straight microseconds from the time the trace
      started. This is sometimes more difficult to read as the default trace is
      seconds from the start of boot up.
      
      Latency format:
      
       # tracer: nop
       #
       # nop latency trace v1.1.5 on 3.2.0-rc1-test+
       # --------------------------------------------------------------------
       # latency: 0 us, #159771/64234230, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
       #    -----------------
       #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
       #    -----------------
       #
       #                  _------=> CPU#
       #                 / _-----=> irqs-off
       #                | / _----=> need-resched
       #                || / _---=> hardirq/softirq
       #                ||| / _--=> preempt-depth
       #                |||| /     delay
       #  cmd     pid   ||||| time  |   caller
       #     \   /      |||||  \    |   /
       migratio-6       0...2 41778231us+: rcu_note_context_switch <-__schedule
       migratio-6       0...2 41778233us : trace_rcu_utilization <-rcu_note_context_switch
       migratio-6       0...2 41778235us+: rcu_sched_qs <-rcu_note_context_switch
       migratio-6       0d..2 41778236us+: rcu_preempt_qs <-rcu_note_context_switch
       migratio-6       0...2 41778238us : trace_rcu_utilization <-rcu_note_context_switch
       migratio-6       0...2 41778239us+: debug_lockdep_rcu_enabled <-__schedule
      
      default format:
      
       # tracer: nop
       #
       #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
       #              | |       |          |         |
            migration/0-6     [000]    50.025810: rcu_note_context_switch <-__schedule
            migration/0-6     [000]    50.025812: trace_rcu_utilization <-rcu_note_context_switch
            migration/0-6     [000]    50.025813: rcu_sched_qs <-rcu_note_context_switch
            migration/0-6     [000]    50.025815: rcu_preempt_qs <-rcu_note_context_switch
            migration/0-6     [000]    50.025817: trace_rcu_utilization <-rcu_note_context_switch
            migration/0-6     [000]    50.025818: debug_lockdep_rcu_enabled <-__schedule
            migration/0-6     [000]    50.025820: debug_lockdep_rcu_enabled <-__schedule
      
      The latency format header has latency information that is pretty meaningless
      for most tracers. Some of the header is useful, though, and we can add that
      later to the default format as well.

      What is really useful with the latency format is the irqs-off, need-resched,
      hard/softirq context and the preempt count.

      This commit adds the 'irq-info' option, which is on by default and adds this
      information:
      
       # tracer: nop
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
                 <idle>-0     [000] d..2    49.309305: cpuidle_get_driver <-cpuidle_idle_call
                 <idle>-0     [000] d..2    49.309307: mwait_idle <-cpu_idle
                 <idle>-0     [000] d..2    49.309309: need_resched <-mwait_idle
                 <idle>-0     [000] d..2    49.309310: test_ti_thread_flag <-need_resched
                 <idle>-0     [000] d..2    49.309312: trace_power_start.constprop.13 <-mwait_idle
                 <idle>-0     [000] d..2    49.309313: trace_cpu_idle <-mwait_idle
                 <idle>-0     [000] d..2    49.309315: need_resched <-mwait_idle
      
      If a user wants the old format, they can disable the 'irq-info' option:
      
       # tracer: nop
       #
       #           TASK-PID   CPU#      TIMESTAMP  FUNCTION
       #              | |       |          |         |
                 <idle>-0     [000]     49.309305: cpuidle_get_driver <-cpuidle_idle_call
                 <idle>-0     [000]     49.309307: mwait_idle <-cpu_idle
                 <idle>-0     [000]     49.309309: need_resched <-mwait_idle
                 <idle>-0     [000]     49.309310: test_ti_thread_flag <-need_resched
                 <idle>-0     [000]     49.309312: trace_power_start.constprop.13 <-mwait_idle
                 <idle>-0     [000]     49.309313: trace_cpu_idle <-mwait_idle
                 <idle>-0     [000]     49.309315: need_resched <-mwait_idle
      Requested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
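As a reading aid for the new irq-info column, here is a minimal Python sketch that decodes a four-character field such as `d..2` from the output shown in this commit. The function name `parse_irq_info` and the returned keys are hypothetical. The character meanings follow the header legend (irqs-off, need-resched, hardirq/softirq, preempt-depth), and the sketch assumes `d`, `N`, `h`/`s`/`H` and a hex digit are the only non-dot values.

```python
def parse_irq_info(flags):
    """Decode a four-character irq-info field such as 'd..2'."""
    if len(flags) != 4:
        raise ValueError("expected 4 characters, got %r" % (flags,))
    irqs_off, need_resched, irq_ctx, depth = flags
    # Assumed mapping for the hardirq/softirq column; unknown characters
    # are reported as None rather than guessed at.
    contexts = {"h": "hardirq", "s": "softirq", "H": "hardirq-in-softirq"}
    return {
        "irqs_off": irqs_off == "d",
        "need_resched": need_resched == "N",
        "context": contexts.get(irq_ctx),
        # The preempt depth is treated as a single hex digit; '.' means zero.
        "preempt_depth": 0 if depth == "." else int(depth, 16),
    }
```

For the `<idle>-0 [000] d..2` lines in the commit message, this reports interrupts disabled, no reschedule pending, no interrupt context, and a preempt depth of 2.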
  2. 07 Nov, 2011 1 commit
    • tracing/latency: Fix header output for latency tracers · 7e9a49ef
      Jiri Olsa authored
      In case the graph tracer (CONFIG_FUNCTION_GRAPH_TRACER) or even the
      function tracer (CONFIG_FUNCTION_TRACER) is not set, the latency tracers
      do not display a proper latency header.
      
      The involved/fixed latency tracers are:
              wakeup_rt
              wakeup
              preemptirqsoff
              preemptoff
              irqsoff
      
      The patch adds proper handling of tracer configuration options for latency
      tracers, and displays the correct header info accordingly.
      
      * The current output (for wakeup tracer) with both graph and function
        tracers disabled is:
      
        # tracer: wakeup
        #
          <idle>-0       0d.h5    1us+:      0:120:R   + [000]     7:  0:R watchdog/0
          <idle>-0       0d.h5    3us+: ttwu_do_activate.clone.1 <-try_to_wake_up
          ...
      
      * The fixed output is:
      
        # tracer: wakeup
        #
        # wakeup latency trace v1.1.5 on 3.1.0-tip+
        # --------------------------------------------------------------------
        # latency: 55 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
        #    -----------------
        #    | task: migration/0-6 (uid:0 nice:0 policy:1 rt_prio:99)
        #    -----------------
        #
        #                  _------=> CPU#
        #                 / _-----=> irqs-off
        #                | / _----=> need-resched
        #                || / _---=> hardirq/softirq
        #                ||| / _--=> preempt-depth
        #                |||| /     delay
        #  cmd     pid   ||||| time  |   caller
        #     \   /      |||||  \    |   /
             cat-1129    0d..4    1us :   1129:120:R   + [000]     6:  0:R migration/0
             cat-1129    0d..4    2us+: ttwu_do_activate.clone.1 <-try_to_wake_up
      
      * The current output (for wakeup tracer) with only function
        tracer enabled is:
      
        # tracer: wakeup
        #
             cat-1140    0d..4    1us+:   1140:120:R   + [000]     6:  0:R migration/0
             cat-1140    0d..4    2us : ttwu_do_activate.clone.1 <-try_to_wake_up
      
      * The fixed output is:

        # tracer: wakeup
        #
        # wakeup latency trace v1.1.5 on 3.1.0-tip+
        # --------------------------------------------------------------------
        # latency: 207 us, #109/109, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
        #    -----------------
        #    | task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99)
        #    -----------------
        #
        #                  _------=> CPU#
        #                 / _-----=> irqs-off
        #                | / _----=> need-resched
        #                || / _---=> hardirq/softirq
        #                ||| / _--=> preempt-depth
        #                |||| /     delay
        #  cmd     pid   ||||| time  |   caller
        #     \   /      |||||  \    |   /
          <idle>-0       1d.h5    1us+:      0:120:R   + [001]    12:  0:R watchdog/1
          <idle>-0       1d.h5    3us : ttwu_do_activate.clone.1 <-try_to_wake_up
      
      Link: http://lkml.kernel.org/r/20111107150849.GE1807@m.brq.redhat.com
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
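To show what the restored header carries, the following Python sketch pulls the numeric fields out of the `# latency:` line from the fixed output. The regular expression, the function name `parse_latency_header` and the key names are hypothetical. Reading `#4/4` as "entries shown / total entries" is an assumption that the commit message does not spell out.

```python
import re

# Matches e.g. "# latency: 55 us, #4/4, CPU#0 | (M:preempt ...)"
LATENCY_RE = re.compile(
    r"#\s*latency:\s*(?P<latency_us>\d+) us, "
    r"#(?P<entries>\d+)/(?P<total_entries>\d+), CPU#(?P<cpu>\d+)"
)


def parse_latency_header(line):
    """Return the numeric header fields as ints, or None if the line differs."""
    match = LATENCY_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}
```

A caller can use a `None` result to detect the broken (header-less) output that this commit fixes.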
  3. 14 Oct, 2011 2 commits
  4. 11 Oct, 2011 2 commits
    • tracing: Do not allocate buffer for trace_marker · d696b58c
      Steven Rostedt authored
      
      
      When doing intense tracing, the kmalloc inside trace_marker can
      introduce side effects to what is being traced.
      
      As trace_marker() is used by userspace to inject data into the
      kernel ring buffer, it needs to do so with the least amount
      of intrusion to the operations of the kernel or the user space
      application.
      
      As the ring buffer is designed to write directly into the buffer
      without the need to make a temporary buffer, and userspace already
      went through the hassle of knowing how big the write will be,
      we can simply pin the userspace pages and write the data directly
      into the buffer. This reduces the impact of tracing via trace_marker
      tremendously!
      
      Thanks to Peter Zijlstra and Thomas Gleixner for pointing out the
      use of get_user_pages_fast() and kmap_atomic().
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Warn on output if the function tracer was found corrupted · e0a413f6
      Steven Rostedt authored
      
      
      As the function tracer is very intrusive, lots of self checks are
      performed on the tracer and if something is found to be strange
      it will shut itself down keeping it from corrupting the rest of the
      kernel. This shutdown may still allow functions to be traced, as the
      tracing only stops new modifications from happening. Trying to stop
      the function tracer itself can cause more harm as it requires code
      modification.
      
      Although a WARN_ON() is executed, a user may not notice it. To help
      the user see that something isn't right with the tracing of the system
      a big warning is added to the output of the tracer that lets the user
      know that their data may be incomplete.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  5. 19 Sep, 2011 1 commit
  6. 13 Sep, 2011 1 commit
  7. 30 Aug, 2011 2 commits
  8. 14 Jul, 2011 1 commit
    • tracing: Have dynamic size event stack traces · 4a9bd3f1
      Steven Rostedt authored
      
      
      Currently the stack trace per event in ftrace is only 8 frames.
      This can be quite limiting and sometimes useless, especially when
      the "ignore frames" count is wrong and we also use up stack frames
      for the event processing itself.

      Change this to be dynamic by adding a percpu buffer that we can
      write a large stack trace into and then copy into the ring buffer.

      Interrupts and NMIs that come in while another event is being
      processed will only get to use the 8 frame stack. That should be
      enough, as the task that was interrupted will have the full stack
      trace anyway.
      Requested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
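The scheme described in this commit can be modelled in a few lines of Python. This is only a userspace model and not the kernel code: the class name `StackTraceSlot`, the `large_frames` default of 64 and the `nested` callback are all hypothetical. It shows one idea only, that the first trace in flight on a CPU gets the large per-CPU buffer while a nested trace (standing in for an interrupt or NMI) falls back to 8 frames.

```python
SMALL_FRAMES = 8  # the fixed fallback depth named in the commit message


class StackTraceSlot:
    """Models one CPU's large scratch buffer plus a nesting counter."""

    def __init__(self, large_frames=64):  # 64 is an arbitrary illustrative size
        self.large_frames = large_frames
        self.reserve = 0  # number of stack traces currently in flight

    def capture(self, frames, nested=None):
        self.reserve += 1
        try:
            # Only the outermost capture may use the large buffer.
            limit = self.large_frames if self.reserve == 1 else SMALL_FRAMES
            captured = list(frames[:limit])
            if nested is not None:
                nested(self)  # simulates an interrupt arriving mid-capture
            return captured
        finally:
            self.reserve -= 1
```

With a 20-frame stack, the outer capture keeps all 20 frames while a capture triggered from the `nested` callback keeps only the first 8.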
  9. 15 Jun, 2011 7 commits
    • tracing/kprobes: Fix kprobe-tracer to support stack trace · 1fd8df2c
      Masami Hiramatsu authored
      
      
      Fix to support kernel stack trace correctly on kprobe-tracer.
      Since the execution path of kprobe-based dynamic events is different
      from other tracepoint-based events, normal ftrace_trace_stack() doesn't
      work correctly. To fix that, this introduces ftrace_trace_stack_regs()
      which traces stack via pt_regs instead of current stack register.
      
      e.g.
      
       # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
       # echo 1 > /sys/kernel/debug/tracing/options/stacktrace
       # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
       # head -n 20 /sys/kernel/debug/tracing/trace
                  bash-2968  [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
                  bash-2968  [000] 10297.050247: <stack trace>
       => schedule_timeout
       => n_tty_read
       => tty_read
       => vfs_read
       => sys_read
       => system_call_fastpath
           kworker/0:1-2940  [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
           kworker/0:1-2940  [000] 10297.050266: <stack trace>
       => worker_thread
       => kthread
       => kernel_thread_helper
                  sshd-1132  [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
                  sshd-1132  [000] 10297.050365: <stack trace>
       => sysret_careful
      
      Note: Even with this fix, the first entry will be skipped
      if the probe is put on the function entry area before
      the frame pointer is set up (usually, that is 4 bytes
      (push %bp; mov %sp %bp) on x86), because the stack unwinder
      depends on the frame pointer.
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
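When post-processing output like the example in this commit, it helps to attach each `=> function` line to the `<stack trace>` entry that precedes it. The Python sketch below does that for text in the format shown. The function name `group_stack_traces` is hypothetical, and the sketch assumes every frame line starts with `=> ` after leading whitespace is stripped.

```python
def group_stack_traces(lines):
    """Return a list of (header_line, [frames]) for each '<stack trace>' entry."""
    traces = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.endswith("<stack trace>"):
            current = (line, [])
            traces.append(current)
        elif line.startswith("=> ") and current is not None:
            current[1].append(line[3:])
        else:
            # Any other line (for example the probe event itself) ends the stack.
            current = None
    return traces
```

Fed the `head -n 20` output from the commit message, the first group would hold the six frames from `schedule_timeout` through `system_call_fastpath`.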
    • tracing: Convert to kstrtoul_from_user · 22fe9b54
      Peter Huewe authored
      
      
      This patch replaces the code for getting an unsigned long from a
      userspace buffer with a simple call to kstrtoul_from_user().
      This makes it easier to read and less error prone.
      Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
      Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing, function: Fix trace header to follow context-info option · f56e7f8e
      Jiri Olsa authored
      
      
      The header display of function tracer does not follow
      the context-info option, so field names are displayed even
      if this option is off.
      
      Added check for TRACE_ITER_CONTEXT_INFO trace_flags.
      
      With the following commands:
      	# echo function > ./current_tracer
      	# echo 0 > options/context-info
      	# cat trace
      
      This is what it looked like before:
      # tracer: function
      #
      #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
      #              | |       |          |         |
      add_preempt_count <-schedule
      rcu_note_context_switch <-schedule
      ...
      
      This is what it looks like now:
      # tracer: function
      #
      _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
      ...
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-4-git-send-email-jolsa@redhat.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add disable_on_free option · cf30cf67
      Steven Rostedt authored
      
      
      Add a trace option to disable tracing on free. When this option is
      set, a write into the free_buffer file will not only shrink the
      ring buffer down to zero, but it will also disable tracing.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add a proc file to stop tracing and free buffer · 4f271a2a
      Vaibhav Nagarnaik authored
      
      
      The proc file entry buffer_size_kb is used to set the size of tracing
      buffer. The memory to expand the buffer size is kernel memory. Consider
      a use case where tracing is handled by a user space utility, which acts
      as a gate keeper for tracing requests. In an OOM condition, tracing is
      considered a low priority task and if the utility gets killed the ring
      buffer memory cannot be released back to the kernel.
      
      This patch adds a proc file called "free_buffer" whose purpose is to
      stop tracing and free up the ring buffer when it is closed.
      
      The user space process can then set the desired size in buffer_size_kb
      file and open the fd to the "free_buffer" file. Under OOM condition, if
      the process gets killed, the kernel closes the file descriptor. The
      release handler stops the tracing and releases the kernel memory
      automatically.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
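The workflow this commit describes for a gatekeeper utility might look roughly like the following Python sketch. The class name `TraceBufferGuard` is hypothetical, and the caller supplies the tracing directory (for example `/sys/kernel/debug/tracing`, the path used elsewhere on this page). The sketch assumes that holding an open descriptor on `free_buffer` is all that is needed, as the commit message states.

```python
import os


class TraceBufferGuard:
    """Set the ring buffer size, then hold free_buffer open for our lifetime."""

    def __init__(self, tracing_dir, size_kb):
        with open(os.path.join(tracing_dir, "buffer_size_kb"), "w") as size_file:
            size_file.write(str(size_kb))
        # Keep this descriptor open. If the process is killed (for example by
        # the OOM killer), the kernel closes it and the release handler runs.
        self.fd = os.open(os.path.join(tracing_dir, "free_buffer"), os.O_WRONLY)

    def close(self):
        """Explicit close; per the commit, this also stops tracing and frees memory."""
        if self.fd is not None:
            os.close(self.fd)
            self.fd = None
```

The point of the design is that cleanup is tied to the file descriptor rather than to any exit path in the utility, so it still happens when the process never gets a chance to run its own teardown code.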
    • tracing: Use NUMA allocation for per-cpu ring buffer pages · 7ea59064
      Vaibhav Nagarnaik authored
      
      
      The tracing ring buffer is a group of per-cpu ring buffers where
      allocation and logging is done on a per-cpu basis. The events that are
      generated on a particular CPU are logged in the corresponding buffer.
      This is to provide wait-free writes between CPUs and good NUMA node
      locality while accessing the ring buffer.
      
      However, the allocation routines consider NUMA locality only for buffer
      page metadata and not for the actual buffer page. This causes the pages
      to be allocated on the NUMA node local to the CPU where the allocation
      routine is running at the time.
      
      This patch fixes the problem by using a NUMA node specific allocation
      routine so that the pages are allocated from a NUMA node local to the
      logging CPU.
      
      I tested with the getuid_microbench from autotest. It is a simple binary
      that calls getuid() in a loop and measures the average time for the
      syscall to complete. The following command was used to test:
      $ getuid_microbench 1000000
      
      I compared the numbers on kernels with and without this patch and
      found that logging latency decreases by 30-50 ns/call.
      tracing with non-NUMA allocation - 569 ns/call
      tracing with NUMA allocation     - 512 ns/call
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Schedule a delayed work to call wakeup() · e7e2ee89
      Vaibhav Nagarnaik authored
      
      
      When concurrent processes use syscall tracing, the wakeup() that is
      called in the event commit function causes contention on the spin lock
      of the waitqueue. I enabled the sys_enter_getuid and sys_exit_getuid
      tracepoints, and by running getuid_microbench from autotest in parallel
      I found that the contention causes an exponential latency increase in
      the tracing path.
      
      The autotest binary getuid_microbench calls getuid() in a tight loop for
      the given number of iterations and measures the average time required to
      complete a single invocation of syscall.
      
      The patch schedules a delayed work after 2 ms once an event commit calls
      to wake up the trace wait_queue. This removes the delay caused by
      contention on spin lock in wakeup() and amortizes the wakeup() calls
      scheduled over the 2 ms period.
      
      In the following example, the script enables the sys_enter_getuid and
      sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
      with the given number of processes. The output clearly shows the latency
      increase caused by contentions.
      
      $ ~/getuid.sh 1
      1000000 calls in 0.720974253 s (720.974253 ns/call)
      
      $ ~/getuid.sh 2
      1000000 calls in 1.166457554 s (1166.457554 ns/call)
      1000000 calls in 1.168933765 s (1168.933765 ns/call)
      
      $ ~/getuid.sh 3
      1000000 calls in 1.783827516 s (1783.827516 ns/call)
      1000000 calls in 1.795553270 s (1795.553270 ns/call)
      1000000 calls in 1.796493376 s (1796.493376 ns/call)
      
      $ ~/getuid.sh 4
      1000000 calls in 4.483041796 s (4483.041796 ns/call)
      1000000 calls in 4.484165388 s (4484.165388 ns/call)
      1000000 calls in 4.484850762 s (4484.850762 ns/call)
      1000000 calls in 4.485643576 s (4485.643576 ns/call)
      
      $ ~/getuid.sh 5
      1000000 calls in 6.497521653 s (6497.521653 ns/call)
      1000000 calls in 6.502000236 s (6502.000236 ns/call)
      1000000 calls in 6.501709115 s (6501.709115 ns/call)
      1000000 calls in 6.502124100 s (6502.124100 ns/call)
      1000000 calls in 6.502936358 s (6502.936358 ns/call)
      
      After the patch, the latencies scale better.
      1000000 calls in 0.728720455 s (728.720455 ns/call)
      
      1000000 calls in 0.842782857 s (842.782857 ns/call)
      1000000 calls in 0.883803135 s (883.803135 ns/call)
      
      1000000 calls in 0.902077764 s (902.077764 ns/call)
      1000000 calls in 0.902838202 s (902.838202 ns/call)
      1000000 calls in 0.908896885 s (908.896885 ns/call)
      
      1000000 calls in 0.932523515 s (932.523515 ns/call)
      1000000 calls in 0.958009672 s (958.009672 ns/call)
      1000000 calls in 0.986188020 s (986.188020 ns/call)
      1000000 calls in 0.989771102 s (989.771102 ns/call)
      
      1000000 calls in 0.933518391 s (933.518391 ns/call)
      1000000 calls in 0.958897947 s (958.897947 ns/call)
      1000000 calls in 1.031038897 s (1031.038897 ns/call)
      1000000 calls in 1.089516025 s (1089.516025 ns/call)
      1000000 calls in 1.141998347 s (1141.998347 ns/call)
      Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
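The amortization idea in this commit can be illustrated with a small Python model. It is not the kernel's delayed-work API: the class name `CoalescedWakeup` and its methods are hypothetical, and time is passed in explicitly as milliseconds. It shows only that many event commits inside one 2 ms window result in a single wakeup.

```python
class CoalescedWakeup:
    """Models 'schedule one delayed wakeup, ignore commits while it is pending'."""

    def __init__(self, delay_ms=2.0):
        self.delay_ms = delay_ms
        self.pending_at = None  # time at which the queued wakeup fires
        self.wakeups = 0

    def _run_due(self, now_ms):
        if self.pending_at is not None and now_ms >= self.pending_at:
            self.wakeups += 1
            self.pending_at = None

    def commit(self, now_ms):
        """Called on every event commit; queues a wakeup only if none is pending."""
        self._run_due(now_ms)
        if self.pending_at is None:
            self.pending_at = now_ms + self.delay_ms

    def flush(self, now_ms):
        """Advance the model's clock without committing an event."""
        self._run_due(now_ms)
```

In this model, four commits at 0, 0.5, 1.0 and 1.5 ms produce one wakeup at 2 ms instead of four lock acquisitions on the waitqueue.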
  10. 06 May, 2011 1 commit
  11. 04 Apr, 2011 1 commit
    • tracing: Avoid soft lockup in trace_pipe · ee5e51f5
      Jiri Olsa authored
      
      
      Running the following commands:
      
        # enable the binary option
        echo 1 > ./options/bin
        # disable context info option
        echo 0 > ./options/context-info
        # tracing only events
        echo 1 > ./events/enable
        cat trace_pipe
      
      plus forcing the system to generate many tracing events,
      causes a lockup (in non-preemptive kernels) inside the
      tracing_read_pipe function.

      The issue is also easily reproduced by running the ltp stress test
      (ftrace_stress_test.sh).

      The reasons are:
       - bin/hex/raw output functions for events are set to the
         trace_nop_print function, which prints nothing and
         returns the TRACE_TYPE_HANDLED value
       - the LOST EVENT trace does not handle trace_seq overflow

      Together, these keep the while loop in the tracing_read_pipe
      function from ever breaking.

      The attached patch fixes the handling of the lost event trace, and
      changes trace_nop_print to print minimal info, which is needed
      for correct tracing_read_pipe processing.
      
      v2 changes:
       - omit the cond_resched changes by trace_nop_print changes
       - WARN changed to WARN_ONCE and added info to be able
         to find out the culprit
      
      v3 changes:
       - make more accurate patch comment
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <20110325110518.GC1922@jolsa.brq.redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  12. 31 Mar, 2011 1 commit
  13. 10 Mar, 2011 2 commits
    • tracing: Fix irqoff selftest expanding max buffer · 4a0b1665
      Steven Rostedt authored
      
      
      If the kernel command line declares a tracer "ftrace=sometracer" and
      that tracer is either not defined or is enabled after irqsoff,
      then the irqs off selftest will fail with the following error:
      
      Testing tracer irqsoff:
      ------------[ cut here ]------------
      WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/trace.c:713 update_max_tr_single+0xfa/0x11b()
      Hardware name:
      Modules linked in:
      Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1
      Call Trace:
       [<c0441d9d>] ? warn_slowpath_common+0x65/0x7a
       [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
       [<c0441dc1>] ? warn_slowpath_null+0xf/0x13
       [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
       [<c049e454>] ? stop_critical_timing+0x154/0x204
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049e529>] ? time_hardirqs_on+0x25/0x28
       [<c0468bca>] ? trace_hardirqs_on_caller+0x18/0x12f
       [<c0468cec>] ? trace_hardirqs_on+0xb/0xd
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b6b8>] ? register_tracer+0xf8/0x1a3
       [<c14e93fe>] ? init_irqsoff_tracer+0xd/0x11
       [<c040115e>] ? do_one_initcall+0x71/0x121
       [<c14e93f1>] ? init_irqsoff_tracer+0x0/0x11
       [<c14ce3a9>] ? kernel_init+0x13a/0x1b6
       [<c14ce26f>] ? kernel_init+0x0/0x1b6
       [<c0403842>] ? kernel_thread_helper+0x6/0x10
      ---[ end trace e93713a9d40cd06c ]---
      .. no entries found ..FAILED!
      
      What happens is the "ftrace=..." will expand the ring buffer to its
      default size (from its minimum size) but it will not expand the
      max ring buffer (the ring buffer to store maximum latencies).
      When the irqsoff test runs, it will call the ring buffer swap routine
      that checks if the max ring buffer is the same size as the normal
      ring buffer, and will fail if it is not. This causes the test to fail.
      
      The solution is to expand the max ring buffer before running the self
      test if the max ring buffer is used by that tracer and the normal ring
      buffer is expanded. The max ring buffer should be shrunk again after
      the test is done to save space.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Remove lock_depth from event entry · e6e1e259
      Steven Rostedt authored
      
      
      The lock_depth field in the event headers was added as a temporary
      data point to help in removing the BKL. Now that the BKL has pretty
      much been removed, we can remove this field.

      This in turn shrinks the header from 12 bytes to 8 bytes,
      removing the 4 bytes of padding that gcc would insert if the first
      field in the data payload was 8 bytes in size.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
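The size arithmetic in this commit can be checked from userspace with Python's `ctypes`, which applies the platform's C layout rules. The field list is an assumption about what the event header looked like at the time (a 2-byte type, two 1-byte fields, a 4-byte pid and the 4-byte lock_depth). The class names and the `payload_offset` helper are hypothetical.

```python
import ctypes


class EntryWithLockDepth(ctypes.Structure):
    # Assumed pre-patch header layout: 2 + 1 + 1 + 4 + 4 = 12 bytes.
    _fields_ = [
        ("type", ctypes.c_ushort),
        ("flags", ctypes.c_ubyte),
        ("preempt_count", ctypes.c_ubyte),
        ("pid", ctypes.c_int),
        ("lock_depth", ctypes.c_int),
    ]


class EntryWithoutLockDepth(ctypes.Structure):
    # Same header with lock_depth removed: 8 bytes.
    _fields_ = [
        ("type", ctypes.c_ushort),
        ("flags", ctypes.c_ubyte),
        ("preempt_count", ctypes.c_ubyte),
        ("pid", ctypes.c_int),
    ]


def payload_offset(header_type):
    """Offset of an 8-byte first payload field placed after the given header."""

    class Event(ctypes.Structure):
        _fields_ = [("ent", header_type), ("payload", ctypes.c_uint64)]

    return Event.payload.offset
```

On an ABI that aligns 64-bit integers to 8 bytes, `payload_offset(EntryWithLockDepth)` is 16, which is the 12-byte header plus the 4 bytes of padding the commit mentions. With the 8-byte header, the payload starts at offset 8 with no padding.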
  14. 09 Mar, 2011 1 commit
    • tracing: Add an 'overwrite' trace_option. · 750912fa
      David Sharp authored
      
      
      Add an "overwrite" trace_option for ftrace to control whether the buffer should
      be overwritten on overflow or not. The default remains to overwrite old events
      when the buffer is full. This patch adds the option to instead discard newest
      events when the buffer is full. This is useful to get a snapshot of traces just
      after enabling traces. Dropping the current event is also a simpler code path.
      Signed-off-by: David Sharp <dhsharp@google.com>
      LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
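The difference between the two overflow policies can be seen in a toy Python model. It is only an illustration of the semantics: the real ring buffer works on pages per CPU, and the function name `record` and its parameters are hypothetical.

```python
from collections import deque


def record(events, capacity, overwrite=True):
    """Return what a buffer of 'capacity' slots holds after all events arrive."""
    if overwrite:
        # Default policy: the oldest events are dropped to make room.
        return list(deque(events, maxlen=capacity))
    # overwrite off: once the buffer is full, new events are discarded.
    return list(events)[:capacity]
```

For six events numbered 0 to 5 and four slots, the default keeps the last four (2 to 5), while the non-overwrite mode keeps the first four (0 to 3). That is why the commit describes it as useful for a snapshot taken just after tracing is enabled.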
  15. 08 Feb, 2011 1 commit
  16. 08 Jan, 2011 1 commit
  17. 30 Nov, 2010 1 commit
  18. 17 Nov, 2010 1 commit
  19. 13 Nov, 2010 1 commit
    • tracing: Fix recursive user stack trace · 91e86e56
      Steven Rostedt authored
      
      
      The user stack trace can fault when examining the trace, which
      would call the do_page_fault handler, which would trace again,
      which would do the user stack trace, which would fault and call
      do_page_fault again ...

      This causes unbounded recursion, so we need a recursion
      detector here.
      
      [ Resubmitted by Jiri Olsa ]
      
      [ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ]
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <1289390172-9730-3-git-send-email-jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
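A recursion detector of the kind this commit calls for can be sketched in Python as follows. This is a userspace analogy only: `threading.local()` stands in for the per-CPU variable the kernel would use, and the names `user_stack_trace` and `in_user_stack_trace` are hypothetical.

```python
import threading

_state = threading.local()  # stand-in for per-CPU state


def user_stack_trace(capture):
    """Run capture() unless a user stack trace is already in progress here."""
    if getattr(_state, "in_user_stack_trace", False):
        return None  # re-entered (for example via a fault handler): bail out
    _state.in_user_stack_trace = True
    try:
        return capture()
    finally:
        _state.in_user_stack_trace = False
```

If `capture` itself ends up calling `user_stack_trace` again, the way a page fault taken during the walk would, the inner call returns immediately instead of recursing.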
  20. 21 Oct, 2010 1 commit
  21. 18 Oct, 2010 1 commit
  22. 13 Aug, 2010 1 commit
    • tracing: Sanitize value returned from write(trace_marker, "...", len) · 1aa54bca
      Marcin Slusarz authored
      
      
      When userspace code writes a non-newline-terminated string to the
      trace_marker file, the write handler appends a newline and returns the
      number of bytes written to the trace buffer, so
      write(fd, "abc", 3) will return 4.
      
      That's unexpected and unfortunately it confuses glibc's fprintf function.
      
      Example:
      #include <stdio.h>

      int main(void) {
        fprintf(stderr, "abc");
        return 0;
      }
      
      $ gcc test.c -o test
      $ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
      $ ./test 2>/sys/kernel/debug/tracing/trace_marker
      
      results in an infinite loop:
      write(fd, "abc", 3) = 4
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      (...)
      
      ...and kernel trace buffer full of empty markers.
      
      Fix it by sanitizing the write return value.
      Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
      LKML-Reference: <20100727231801.GB2826@joi.lan>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
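The before and after behaviour can be captured in a small Python model of the write handler's return value. It is not the kernel code: the function name `marker_write` and the `sanitize` flag are hypothetical, and the model covers only the newline-append case described in this commit.

```python
def marker_write(data, sanitize=True):
    """Return (bytes_stored, return_value) for a modelled trace_marker write."""
    # The handler stores the data with a trailing newline appended if missing.
    stored = data if data.endswith(b"\n") else data + b"\n"
    if sanitize:
        # Fixed behaviour: report only the bytes the caller handed in.
        return stored, len(data)
    # Old behaviour: report the bytes placed in the trace buffer.
    return stored, len(stored)
```

With `sanitize=False`, writing `b"abc"` reports 4 bytes for a 3-byte request, which is the value that confused glibc in the example. With the fix the caller sees 3, while the buffer still holds `b"abc\n"`.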
  23. 05 Aug, 2010 1 commit
  24. 22 Jul, 2010 1 commit
  25. 21 Jul, 2010 2 commits
    • tracing: Shrink max latency ringbuffer if unnecessary · ef710e10
      KOSAKI Motohiro authored
      
      
      Documentation/trace/ftrace.txt says
      
        buffer_size_kb:
      
              This sets or displays the number of kilobytes each CPU
              buffer can hold. The tracer buffers are the same size
              for each CPU. The displayed number is the size of the
              CPU buffer and not total size of all buffers. The
              trace buffers are allocated in pages (blocks of memory
              that the kernel uses for allocation, usually 4 KB in size).
              If the last page allocated has room for more bytes
              than requested, the rest of the page will be used,
              making the actual allocation bigger than requested.
              ( Note, the size may not be a multiple of the page size
                due to buffer management overhead. )
      
              This can only be updated when the current_tracer
              is set to "nop".
      
      But this is incorrect. Currently, total memory consumption is
      'buffer_size_kb x CPUs x 2'.

      Why the factor of two? Because ftrace implicitly allocates a
      buffer for max latency too.

      That hurts when an admin wants to use a large buffer (for example,
      for full logging and detailed analysis). If the admin has a 24 CPU
      machine and writes 200MB to buffer_size_kb, the system consumes
      ~10GB of memory (200MB x 24 x 2). Wasting 5GB of memory is usually
      unacceptable.

      Fortunately, almost all users don't use the max latency feature,
      and the max latency buffer can be disabled easily.

      This patch shrinks the max latency buffer when it is not needed.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      LKML-Reference: <20100701104554.DA2D.A69D9226@jp.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
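The memory arithmetic in this commit is easy to reproduce. The Python helper below is a hypothetical convenience, not part of any tool. It treats the shrunken max latency buffer as costing nothing, which ignores whatever minimum size it keeps.

```python
def ring_buffer_total_mb(buffer_size_mb, cpus, max_latency_buffer=True):
    """Total ring buffer memory in MB for a given per-CPU buffer size."""
    # Before the patch, a second full-size copy was always allocated.
    copies = 2 if max_latency_buffer else 1
    return buffer_size_mb * cpus * copies
```

For the 24 CPU, 200MB case in the commit message this gives 9600 MB with the max latency buffer and 4800 MB without it, matching the "~10GB" and "5GB" figures quoted.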
    • tracing: Allow to disable cmdline recording · e870e9a1
      Li Zefan authored
      
      
      We found that even enabling a single trace event that will rarely be
      triggered can add big overhead to context switches.
      
      (lmbench context switch test)
       --------------------------------------------------
        2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
        ctxsw  ctxsw  ctxsw  ctxsw  ctxsw   ctxsw   ctxsw
       ------ ------ ------ ------ ------ ------- -------
         2.19    2.3   2.21   2.56   2.13    2.54    2.07
         2.39   2.51   2.35   2.75   2.27    2.81    2.24
      
      The overhead is 6% ~ 11%.
      
      That is because when a trace event is enabled, 3 tracepoints (sched_switch,
      sched_wakeup, sched_wakeup_new) are activated to map pid to cmdname.

      We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
      that allows cmdline recording to be disabled.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
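The "6% ~ 11%" figure can be rederived from the lmbench table with a short Python calculation. The commit message does not label the two rows, so treating the first row as the baseline and the second as the run with one trace event enabled is an assumption. The names `BASELINE`, `WITH_EVENT` and `overhead_percent` are hypothetical.

```python
# Context switch times copied from the table, in column order.
BASELINE = [2.19, 2.30, 2.21, 2.56, 2.13, 2.54, 2.07]    # assumed: no event enabled
WITH_EVENT = [2.39, 2.51, 2.35, 2.75, 2.27, 2.81, 2.24]  # assumed: one event enabled


def overhead_percent(before, after):
    """Relative slowdown of each column, in percent, rounded to one decimal."""
    return [round((a - b) / b * 100, 1) for b, a in zip(before, after)]
```

Under that reading, every column's slowdown falls between 6% and 11%, which agrees with the range stated in the commit message.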
  26. 20 Jul, 2010 3 commits
  27. 09 Jul, 2010 1 commit