1. 13 Apr, 2013 1 commit
  2. 15 Mar, 2013 1 commit
  3. 30 Jan, 2013 1 commit
    • Hiraku Toyooka's avatar
      tracing: Make a snapshot feature available from userspace · debdd57f
      Hiraku Toyooka authored
      Ftrace has a snapshot feature available from kernel space and
      latency tracers (e.g. irqsoff) are using it. This patch enables
      user applictions to take a snapshot via debugfs.
      
      Add "snapshot" debugfs file in "tracing" directory.
      
        snapshot:
          This is used to take a snapshot and to read the output of the
          snapshot.
      
           # echo 1 > snapshot
      
          This will allocate the spare buffer for snapshot (if it is
          not allocated), and take a snapshot.
      
           # cat snapshot
      
          This will show contents of the snapshot.
      
           # echo 0 > snapshot
      
          This will free the snapshot if it is allocated.
      
          Any other positive values will clear the snapshot contents if
          the snapshot is allocated, or return EINVAL if it is not allocated.
      
      Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia
      
      
      
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: default avatarHiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      [
         Fixed irqsoff selftest and also a conflict with a change
         that fixes the update_max_tr.
      ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      debdd57f
  4. 23 Jan, 2013 4 commits
    • Steven Rostedt's avatar
      ring-buffer: User context bit recursion checking · 567cd4da
      Steven Rostedt authored
      
      
      Using context bit recursion checking, we can help increase the
      performance of the ring buffer.
      
      Before this patch:
      
       # echo function > /debug/tracing/current_tracer
       # for i in `seq 10`; do ./hackbench 50; done
      Time: 10.285
      Time: 10.407
      Time: 10.243
      Time: 10.372
      Time: 10.380
      Time: 10.198
      Time: 10.272
      Time: 10.354
      Time: 10.248
      Time: 10.253
      
      (average: 10.3012)
      
      Now we have:
      
       # echo function > /debug/tracing/current_tracer
       # for i in `seq 10`; do ./hackbench 50; done
      Time: 9.712
      Time: 9.824
      Time: 9.861
      Time: 9.827
      Time: 9.962
      Time: 9.905
      Time: 9.886
      Time: 10.088
      Time: 9.861
      Time: 9.834
      
      (average: 9.876)
      
       a 4% savings!
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      567cd4da
    • Steven Rostedt's avatar
      tracing: Avoid unnecessary multiple recursion checks · edc15caf
      Steven Rostedt authored
      
      
      When function tracing occurs, the following steps are made:
        If arch does not support a ftrace feature:
         call internal function (uses INTERNAL bits) which calls...
        If callback is registered to the "global" list, the list
         function is called and recursion checks the GLOBAL bits.
         then this function calls...
        The function callback, which can use the FTRACE bits to
         check for recursion.
      
      Now if the arch does not suppport a feature, and it calls
      the global list function which calls the ftrace callback
      all three of these steps will do a recursion protection.
      There's no reason to do one if the previous caller already
      did. The recursion that we are protecting against will
      go through the same steps again.
      
      To prevent the multiple recursion checks, if a recursion
      bit is set that is higher than the MAX bit of the current
      check, then we know that the check was made by the previous
      caller, and we can skip the current check.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      edc15caf
    • Steven Rostedt's avatar
      tracing: Make the trace recursion bits into enums · e46cbf75
      Steven Rostedt authored
      
      
      Convert the bits into enums which makes the code a little easier
      to maintain.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e46cbf75
    • Steven Rostedt's avatar
      ftrace: Add context level recursion bit checking · c29f122c
      Steven Rostedt authored
      
      
      Currently for recursion checking in the function tracer, ftrace
      tests a task_struct bit to determine if the function tracer had
      recursed or not. If it has, then it will will return without going
      further.
      
      But this leads to races. If an interrupt came in after the bit
      was set, the functions being traced would see that bit set and
      think that the function tracer recursed on itself, and would return.
      
      Instead add a bit for each context (normal, softirq, irq and nmi).
      
      A check of which context the task is in is made before testing the
      associated bit. Now if an interrupt preempts the function tracer
      after the previous context has been set, the interrupt functions
      can still be traced.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      c29f122c
  5. 13 Nov, 2012 1 commit
    • David Sharp's avatar
      tracing: Format non-nanosec times from tsc clock without a decimal point. · 8be0709f
      David Sharp authored
      With the addition of the "tsc" clock, formatting timestamps to look like
      fractional seconds is misleading. Mark clocks as either in nanoseconds or
      not, and format non-nanosecond timestamps as decimal integers.
      
      Tested:
      $ cd /sys/kernel/debug/tracing/
      $ cat trace_clock
      [local] global tsc
      $ echo sched_switch > set_event
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ cat trace
                <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
                 sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
         sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo tsc > trace_clock
      $ cat trace
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ echo 0 > options/latency-format
      $ cat trace
                <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
                 sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
         sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      
      v2:
      Move arch-specific bits out of generic code.
      v4:
      Fix x86_32 build due to 64-bit division.
      
      Google-Bug-Id: 6980623
      Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com
      
      
      
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: default avatarDavid Sharp <dhsharp@google.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      8be0709f
  6. 02 Nov, 2012 2 commits
    • Steven Rostedt's avatar
      tracing: Use irq_work for wake ups and remove *_nowake_*() functions · 0d5c6e1c
      Steven Rostedt authored
      
      
      Have the ring buffer commit function use the irq_work infrastructure to
      wake up any waiters waiting on the ring buffer for new data. The irq_work
      was created for such a purpose, where doing the actual wake up at the
      time of adding data is too dangerous, as an event or function trace may
      be in the midst of the work queue locks and cause deadlocks. The irq_work
      will either delay the action to the next timer interrupt, or trigger an IPI
      to itself forcing an interrupt to do the work (in a safe location).
      
      With irq_work, all ring buffer commits can safely do wakeups, removing
      the need for the ring buffer commit "nowake" variants, which were used
      by events and function tracing. All commits can now safely use the
      normal commit, and the "nowake" variants can be removed.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      0d5c6e1c
    • Steven Rostedt's avatar
      tracing: Remove unused function unregister_tracer() · c7b84eca
      Steven Rostedt authored
      
      
      The function register_tracer() is only used by kernel core code,
      that never needs to remove the tracer. As trace_events have become
      the main way to add new tracing to the kernel, the need to
      unregister a tracer has diminished. Remove the unused function
      unregister_tracer(). If a need arises where we need it, then we
      can always add it back.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      c7b84eca
  7. 31 Oct, 2012 3 commits
    • Steven Rostedt's avatar
      tracing: Cache comms only after an event occurred · 7ffbd48d
      Steven Rostedt authored
      
      
      Whenever an event is registered, the comm of tasks are saved at
      every task switch instead of saving them at every event. But if
      an event isn't executed much, the comm cache will be filled up
      by tasks that did not record the event and you lose out on the comms
      that did.
      
      Here's an example, if you enable the following events:
      
      echo 1 > /debug/tracing/events/kvm/kvm_cr/enable
      echo 1 > /debug/tracing/events/net/net_dev_xmit/enable
      
      Note, there's no kvm running on this machine so the first event will
      never be triggered, but because it is enabled, the storing of comms
      will continue. If we now disable the network event:
      
      echo 0 > /debug/tracing/events/net/net_dev_xmit/enable
      
      and look at the trace:
      
      cat /debug/tracing/trace
                  sshd-2672  [001] ..s2   375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s1   375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s2   375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s1   375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s2   375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s1   375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s2   376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s1   376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                  sshd-2672  [001] ..s2   377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
                  sshd-2672  [001] ..s1   377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
                  sshd-2672  [001] ..s2   377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
                  sshd-2672  [001] ..s1   377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0
      
      We see that process 2672 which triggered the events has the comm "sshd".
      But if we run hackbench for a bit and look again:
      
      cat /debug/tracing/trace
                 <...>-2672  [001] ..s2   375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s1   375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s2   375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s1   375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s2   375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s1   375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s2   376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s1   376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
                 <...>-2672  [001] ..s2   377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
                 <...>-2672  [001] ..s1   377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
                 <...>-2672  [001] ..s2   377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
                 <...>-2672  [001] ..s1   377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0
      
      The stored "sshd" comm has been flushed out and we get a useless "<...>".
      
      But by only storing comms after a trace event occurred, we can run
      hackbench all day and still get the same output.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      7ffbd48d
    • Steven Rostedt's avatar
      tracing: Enable comm recording if trace_printk() is used · 81698831
      Steven Rostedt authored
      
      
      If comm recording is not enabled when trace_printk() is used then
      you just get this type of output:
      
      [ adding trace_printk("hello! %d", irq); in do_IRQ ]
      
                 <...>-2843  [001] d.h.    80.812300: do_IRQ: hello! 14
                 <...>-2734  [002] d.h2    80.824664: do_IRQ: hello! 14
                 <...>-2713  [003] d.h.    80.829971: do_IRQ: hello! 14
                 <...>-2814  [000] d.h.    80.833026: do_IRQ: hello! 14
      
      By enabling the comm recorder when trace_printk is enabled:
      
             hackbench-6715  [001] d.h.   193.233776: do_IRQ: hello! 21
                  sshd-2659  [001] d.h.   193.665862: do_IRQ: hello! 21
                <idle>-0     [001] d.h1   193.665996: do_IRQ: hello! 21
      Suggested-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      81698831
    • Hiraku Toyooka's avatar
      tracing: Change tracer's integer flags to bool · f43c738b
      Hiraku Toyooka authored
      print_max and use_max_tr in struct tracer are "int" variables and
      used like flags. This is wasteful, so change the type to "bool".
      
      Link: http://lkml.kernel.org/r/20121002082710.9807.86393.stgit@falsita
      
      Signed-off-by: default avatarHiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      f43c738b
  8. 24 Sep, 2012 1 commit
  9. 18 Sep, 2012 1 commit
  10. 31 Jul, 2012 1 commit
    • Steven Rostedt's avatar
      ftrace: Add selftest to test function save-regs support · ad97772a
      Steven Rostedt authored
      
      
      Add selftests to test the save-regs functionality of ftrace.
      
      If the arch supports saving regs, then it will make sure that regs is
      at least not NULL in the callback.
      
      If the arch does not support saving regs, it makes sure that the
      registering of the ftrace_ops that requests saving regs fails.
      It then tests the registering of the ftrace_ops succeeds if the
      'IF_SUPPORTED' flag is set. Then it makes sure that the regs passed to
      the function is NULL.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      ad97772a
  11. 28 Jun, 2012 1 commit
    • Steven Rostedt's avatar
      tracing: Remove NR_CPUS array from trace_iterator · 6d158a81
      Steven Rostedt authored
      
      
      Replace the NR_CPUS array of buffer_iter from the trace_iterator
      with an allocated array. This will just create an array of
      possible CPUS instead of the max number specified.
      
      The use of NR_CPUS in that array caused allocation failures for
      machines that were tight on memory. This did not cause any failures
      to the system itself (no crashes), but caused unnecessary failures
      for reading the trace files.
      
      Added a helper function called 'trace_buffer_iter()' that returns
      the buffer_iter item or NULL if it is not defined or the array was
      not allocated. Some routines do not require the array
      (tracing_open_pipe() for one).
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      6d158a81
  12. 07 May, 2012 1 commit
    • Srikar Dronamraju's avatar
      tracing: Provide trace events interface for uprobes · f3f096cf
      Srikar Dronamraju authored
      
      
      Implements trace_event support for uprobes. In its current form
      it can be used to put probes at a specified offset in a file and
      dump the required registers when the code flow reaches the
      probed address.
      
      The following example shows how to dump the instruction pointer
      and %ax a register at the probed text address.  Here we are
      trying to probe zfree in /bin/zsh:
      
       # cd /sys/kernel/debug/tracing/
       # cat /proc/`pgrep  zsh`/maps | grep /bin/zsh | grep r-xp
       00400000-0048a000 r-xp 00000000 08:03 130904 /bin/zsh
       # objdump -T /bin/zsh | grep -w zfree
       0000000000446420 g    DF .text  0000000000000012  Base
       zfree # echo 'p /bin/zsh:0x46420 %ip %ax' > uprobe_events
       # cat uprobe_events
       p:uprobes/p_zsh_0x46420 /bin/zsh:0x0000000000046420
       # echo 1 > events/uprobes/enable
       # sleep 20
       # echo 0 > events/uprobes/enable
       # cat trace
       # tracer: nop
       #
       #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
       #              | |       |          |         |
                    zsh-24842 [006] 258544.995456: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
                    zsh-24842 [007] 258545.000270: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
                    zsh-24842 [002] 258545.043929: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
                    zsh-24842 [004] 258547.046129: p_zsh_0x46420: (0x446420) arg1=446421 arg2=79
      Signed-off-by: default avatarSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Acked-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
      Cc: Linux-mm <linux-mm@kvack.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Anton Arapov <anton@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20120411103043.GB29437@linux.vnet.ibm.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f3f096cf
  13. 24 Apr, 2012 2 commits
    • Vaibhav Nagarnaik's avatar
      ring-buffer: Add per_cpu ring buffer control files · 438ced17
      Vaibhav Nagarnaik authored
      Add a debugfs entry under per_cpu/ folder for each cpu called
      buffer_size_kb to control the ring buffer size for each CPU
      independently.
      
      If the global file buffer_size_kb is used to set size, the individual
      ring buffers will be adjusted to the given size. The buffer_size_kb will
      report the common size to maintain backward compatibility.
      
      If the buffer_size_kb file under the per_cpu/ directory is used to
      change buffer size for a specific CPU, only the size of the respective
      ring buffer is updated. When tracing/buffer_size_kb is read, it reports
      'X' to indicate that sizes of per_cpu ring buffers are not equivalent.
      
      Link: http://lkml.kernel.org/r/1328212844-11889-1-git-send-email-vnagarnaik@google.com
      
      
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Justin Teravest <teravest@google.com>
      Signed-off-by: default avatarVaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      438ced17
    • Steven Rostedt's avatar
      tracing: Add percpu buffers for trace_printk() · 07d777fe
      Steven Rostedt authored
      
      
      Currently, trace_printk() uses a single buffer to write into
      to calculate the size and format needed to save the trace. To
      do this safely in an SMP environment, a spin_lock() is taken
      to only allow one writer at a time to the buffer. But this could
      also affect what is being traced, and add synchronization that
      would not be there otherwise.
      
      Ideally, using percpu buffers would be useful, but since trace_printk()
      is only used in development, having per cpu buffers for something
      never used is a waste of space. Thus, the use of the trace_bprintk()
      format section is changed to be used for static fmts as well as dynamic ones.
      Then at boot up, we can check if the section that holds the trace_printk
      formats is non-empty, and if it does contain something, then we
      know a trace_printk() has been added to the kernel. At this time
      the trace_printk per cpu buffers are allocated. A check is also
      done at module load time in case a module is added that contains a
      trace_printk().
      
      Once the buffers are allocated, they are never freed. If you use
      a trace_printk() then you should know what you are doing.
      
      A buffer is made for each type of context:
      
        normal
        softirq
        irq
        nmi
      
      The context is checked and the appropriate buffer is used.
      This allows for totally lockless usage of trace_printk(),
      and they no longer even disable interrupts.
      Requested-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      07d777fe
  14. 14 Apr, 2012 1 commit
  15. 13 Mar, 2012 1 commit
  16. 22 Feb, 2012 1 commit
    • Steven Rostedt's avatar
      tracing/ring-buffer: Only have tracing_on disable tracing buffers · 499e5470
      Steven Rostedt authored
      
      
      As the ring-buffer code is being used by other facilities in the
      kernel, having tracing_on file disable *all* buffers is not a desired
      affect. It should only disable the ftrace buffers that are being used.
      
      Move the code into the trace.c file and use the buffer disabling
      for tracing_on() and tracing_off(). This way only the ftrace buffers
      will be affected by them and other kernel utilities will not be
      confused to why their output suddenly stopped.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      499e5470
  17. 21 Feb, 2012 5 commits
  18. 04 Jan, 2012 1 commit
  19. 17 Nov, 2011 1 commit
    • Steven Rostedt's avatar
      tracing: Add irq, preempt-count and need resched info to default trace output · 77271ce4
      Steven Rostedt authored
      
      
      People keep asking how to get the preempt count, irq, and need resched info
      and we keep telling them to enable the latency format. Some developers think
      that traces without this info is completely useless, and for a lot of tasks
      it is useless.
      
      The first option was to enable the latency trace as the default format, but
      the header for the latency format is pretty useless for most tracers and
      it also does the timestamp in straight microseconds from the time the trace
      started. This is sometimes more difficult to read as the default trace is
      seconds from the start of boot up.
      
      Latency format:
      
       # tracer: nop
       #
       # nop latency trace v1.1.5 on 3.2.0-rc1-test+
       # --------------------------------------------------------------------
       # latency: 0 us, #159771/64234230, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
       #    -----------------
       #    | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
       #    -----------------
       #
       #                  _------=> CPU#
       #                 / _-----=> irqs-off
       #                | / _----=> need-resched
       #                || / _---=> hardirq/softirq
       #                ||| / _--=> preempt-depth
       #                |||| /     delay
       #  cmd     pid   ||||| time  |   caller
       #     \   /      |||||  \    |   /
       migratio-6       0...2 41778231us+: rcu_note_context_switch <-__schedule
       migratio-6       0...2 41778233us : trace_rcu_utilization <-rcu_note_context_switch
       migratio-6       0...2 41778235us+: rcu_sched_qs <-rcu_note_context_switch
       migratio-6       0d..2 41778236us+: rcu_preempt_qs <-rcu_note_context_switch
       migratio-6       0...2 41778238us : trace_rcu_utilization <-rcu_note_context_switch
       migratio-6       0...2 41778239us+: debug_lockdep_rcu_enabled <-__schedule
      
      default format:
      
       # tracer: nop
       #
       #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
       #              | |       |          |         |
            migration/0-6     [000]    50.025810: rcu_note_context_switch <-__schedule
            migration/0-6     [000]    50.025812: trace_rcu_utilization <-rcu_note_context_switch
            migration/0-6     [000]    50.025813: rcu_sched_qs <-rcu_note_context_switch
            migration/0-6     [000]    50.025815: rcu_preempt_qs <-rcu_note_context_switch
            migration/0-6     [000]    50.025817: trace_rcu_utilization <-rcu_note_context_switch
            migration/0-6     [000]    50.025818: debug_lockdep_rcu_enabled <-__schedule
            migration/0-6     [000]    50.025820: debug_lockdep_rcu_enabled <-__schedule
      
      The latency format header has latency information that is pretty meaningless
      for most tracers. Although some of the header is useful, and we can add that
      later to the default format as well.
      
      What is really useful with the latency format is the irqs-off, need-resched
      hard/softirq context and the preempt count.
      
      This commit adds the option irq-info which is on by default that adds this
      information:
      
       # tracer: nop
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
                 <idle>-0     [000] d..2    49.309305: cpuidle_get_driver <-cpuidle_idle_call
                 <idle>-0     [000] d..2    49.309307: mwait_idle <-cpu_idle
                 <idle>-0     [000] d..2    49.309309: need_resched <-mwait_idle
                 <idle>-0     [000] d..2    49.309310: test_ti_thread_flag <-need_resched
                 <idle>-0     [000] d..2    49.309312: trace_power_start.constprop.13 <-mwait_idle
                 <idle>-0     [000] d..2    49.309313: trace_cpu_idle <-mwait_idle
                 <idle>-0     [000] d..2    49.309315: need_resched <-mwait_idle
      
      If a user wants the old format, they can disable the 'irq-info' option:
      
       # tracer: nop
       #
       #           TASK-PID   CPU#      TIMESTAMP  FUNCTION
       #              | |       |          |         |
                 <idle>-0     [000]     49.309305: cpuidle_get_driver <-cpuidle_idle_call
                 <idle>-0     [000]     49.309307: mwait_idle <-cpu_idle
                 <idle>-0     [000]     49.309309: need_resched <-mwait_idle
                 <idle>-0     [000]     49.309310: test_ti_thread_flag <-need_resched
                 <idle>-0     [000]     49.309312: trace_power_start.constprop.13 <-mwait_idle
                 <idle>-0     [000]     49.309313: trace_cpu_idle <-mwait_idle
                 <idle>-0     [000]     49.309315: need_resched <-mwait_idle
      Requested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      77271ce4
  20. 07 Nov, 2011 1 commit
    • Jiri Olsa's avatar
      tracing/latency: Fix header output for latency tracers · 7e9a49ef
      Jiri Olsa authored
      In case the the graph tracer (CONFIG_FUNCTION_GRAPH_TRACER) or even the
      function tracer (CONFIG_FUNCTION_TRACER) are not set, the latency tracers
      do not display proper latency header.
      
      The involved/fixed latency tracers are:
              wakeup_rt
              wakeup
              preemptirqsoff
              preemptoff
              irqsoff
      
      The patch adds proper handling of tracer configuration options for latency
      tracers, and displaying correct header info accordingly.
      
      * The current output (for wakeup tracer) with both graph and function
        tracers disabled is:
      
        # tracer: wakeup
        #
          <idle>-0       0d.h5    1us+:      0:120:R   + [000]     7:  0:R watchdog/0
          <idle>-0       0d.h5    3us+: ttwu_do_activate.clone.1 <-try_to_wake_up
          ...
      
      * The fixed output is:
      
        # tracer: wakeup
        #
        # wakeup latency trace v1.1.5 on 3.1.0-tip+
        # --------------------------------------------------------------------
        # latency: 55 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
        #    -----------------
        #    | task: migration/0-6 (uid:0 nice:0 policy:1 rt_prio:99)
        #    -----------------
        #
        #                  _------=> CPU#
        #                 / _-----=> irqs-off
        #                | / _----=> need-resched
        #                || / _---=> hardirq/softirq
        #                ||| / _--=> preempt-depth
        #                |||| /     delay
        #  cmd     pid   ||||| time  |   caller
        #     \   /      |||||  \    |   /
             cat-1129    0d..4    1us :   1129:120:R   + [000]     6:  0:R migration/0
             cat-1129    0d..4    2us+: ttwu_do_activate.clone.1 <-try_to_wake_up
      
      * The current output (for wakeup tracer) with only function
        tracer enabled is:
      
        # tracer: wakeup
        #
             cat-1140    0d..4    1us+:   1140:120:R   + [000]     6:  0:R migration/0
             cat-1140    0d..4    2us : ttwu_do_activate.clone.1 <-try_to_wake_up
      
      * The fixed output is:
        # tracer: wakeup
        #
        # wakeup latency trace v1.1.5 on 3.1.0-tip+
        # --------------------------------------------------------------------
        # latency: 207 us, #109/109, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
        #    -----------------
        #    | task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99)
        #    -----------------
        #
        #                  _------=> CPU#
        #                 / _-----=> irqs-off
        #                | / _----=> need-resched
        #                || / _---=> hardirq/softirq
        #                ||| / _--=> preempt-depth
        #                |||| /     delay
        #  cmd     pid   ||||| time  |   caller
        #     \   /      |||||  \    |   /
          <idle>-0       1d.h5    1us+:      0:120:R   + [001]    12:  0:R watchdog/1
          <idle>-0       1d.h5    3us : ttwu_do_activate.clone.1 <-try_to_wake_up
      
      Link: http://lkml.kernel.org/r/20111107150849.GE1807@m.brq.redhat.com
      
      
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      7e9a49ef
  21. 11 Oct, 2011 1 commit
    • Steven Rostedt's avatar
      tracing: Warn on output if the function tracer was found corrupted · e0a413f6
      Steven Rostedt authored
      
      
      As the function tracer is very intrusive, lots of self checks are
      performed on the tracer and if something is found to be strange
      it will shut itself down keeping it from corrupting the rest of the
      kernel. This shutdown may still allow functions to be traced, as the
      tracing only stops new modifications from happening. Trying to stop
      the function tracer itself can cause more harm as it requires code
      modification.
      
      Although a WARN_ON() is executed, a user may not notice it. To help
      the user see that something isn't right with the tracing of the system
      a big warning is added to the output of the tracer that lets the user
      know that their data may be incomplete.
      Reported-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e0a413f6
  22. 19 Aug, 2011 2 commits
  23. 26 Jul, 2011 1 commit
  24. 08 Jul, 2011 1 commit
  25. 07 Jul, 2011 1 commit
    • Steven Rostedt's avatar
      tracing: Fix bug when reading system filters on module removal · e9dbfae5
      Steven Rostedt authored
      
      
      The event system is freed when its nr_events is set to zero. This happens
      when a module created an event system and then later the module is
      removed. Modules may share systems, so the system is allocated when
      it is created and freed when the modules are unloaded and all the
      events under the system are removed (nr_events set to zero).
      
      The problem arises when a task opened the "filter" file for the
      system. If the module is unloaded and it removed the last event for
      that system, the system structure is freed. If the task that opened
      the filter file accesses the "filter" file after the system has
      been freed, the system will access an invalid pointer.
      
      By adding a ref_count, and using it to keep track of what
      is using the event system, we can free it after all users
      are finished with the event system.
      
      Cc: <stable@kernel.org>
      Reported-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e9dbfae5
  26. 15 Jun, 2011 2 commits
    • Masami Hiramatsu's avatar
      tracing/kprobes: Fix kprobe-tracer to support stack trace · 1fd8df2c
      Masami Hiramatsu authored
      
      
      Fix to support kernel stack trace correctly on kprobe-tracer.
      Since the execution path of kprobe-based dynamic events is different
      from other tracepoint-based events, normal ftrace_trace_stack() doesn't
      work correctly. To fix that, this introduces ftrace_trace_stack_regs()
      which traces stack via pt_regs instead of current stack register.
      
      e.g.
      
       # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
       # echo 1 > /sys/kernel/debug/tracing/options/stacktrace
       # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
       # head -n 20 /sys/kernel/debug/tracing/trace
                  bash-2968  [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
                  bash-2968  [000] 10297.050247: <stack trace>
       => schedule_timeout
       => n_tty_read
       => tty_read
       => vfs_read
       => sys_read
       => system_call_fastpath
           kworker/0:1-2940  [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
           kworker/0:1-2940  [000] 10297.050266: <stack trace>
       => worker_thread
       => kthread
       => kernel_thread_helper
                  sshd-1132  [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
                  sshd-1132  [000] 10297.050365: <stack trace>
       => sysret_careful
      
      Note: Even with this fix, the first entry will be skipped
      if the probe is put on the function entry area before
      the frame pointer is set up (usually, that is 4 bytes
       (push %bp; mov %sp %bp) on x86), because stack unwinder
      depends on the frame pointer.
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15
      
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      1fd8df2c
    • Steven Rostedt's avatar
      tracing: Add disable_on_free option · cf30cf67
      Steven Rostedt authored
      
      
      Add a trace option to disable tracing on free. When this option is
      set, a write into the free_buffer file will not only shrink the
      ring buffer down to zero, but it will also disable tracing.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      cf30cf67
  27. 26 May, 2011 1 commit
    • Steven Rostedt's avatar
      ftrace: Add internal recursive checks · b1cff0ad
      Steven Rostedt authored
      Witold reported a reboot caused by the selftests of the dynamic function
      tracer. He sent me a config and I used ktest to do a config_bisect on it
      (as my config did not cause the crash). It pointed out that the problem
      config was CONFIG_PROVE_RCU.
      
      What happened was that if multiple callbacks are attached to the
      function tracer, we iterate a list of callbacks. Because the list is
      managed by synchronize_sched() and preempt_disable, the access to the
      pointers uses rcu_dereference_raw().
      
      When PROVE_RCU is enabled, the rcu_dereference_raw() calls some
      debugging functions, which happen to be traced. The tracing of the debug
      function would then call rcu_dereference_raw() which would then call the
      debug function and then... well you get the idea.
      
      I first wrote two different patches to solve this bug.
      
      1) add a __rcu_dereference_raw() that would not do any checks.
      2) add notrace to the offending debug functions.
      
      Both of these patches worked.
      
      Talking with Paul McKenney on IRC, he suggested to add recursion
      detection instead. This seemed to be a better solution, so I decided to
      implement it. As the task_struct already has a trace_recursion to detect
      recursion in the ring buffer, and that has a very small number it
      allows, I decided to use that same variable to add flags that can detect
      the recursion inside the infrastructure of the function tracer.
      
      I plan to change it so that the task struct bit can be checked in
      mcount, but as that requires changes to all archs, I will hold that off
      to the next merge window.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/1306348063.1465.116.camel@gandalf.stny.rr.com
      
      Reported-by: default avatarWitold Baryluk <baryluk@smp.if.uj.edu.pl>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      b1cff0ad