1. 30 Apr, 2010 2 commits
  2. 09 Dec, 2009 2 commits
    • Johannes Berg's avatar
      tracing: Add full state to trace_seq · d184b31c
      Johannes Berg authored
      The trace_seq buffer might fill up, and right now one needs to check the
      return value of each printf into the buffer to check for that.
      Instead, have the buffer keep track of whether it is full or not, and
      reject more input if it is full or would have overflowed with an input
      that wasn't added.
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt's avatar
      tracing: Buffer the output of seq_file in case of filled buffer · a63ce5b3
      Steven Rostedt authored
      If the seq_read fills the buffer it will call s_start again on the next
      itertation with the same position. This causes a problem with the
      function_graph tracer because it consumes the iteration in order to
      determine leaf functions.
      What happens is that the iterator stores the entry, and the function
      graph plugin will look at the next entry. If that next entry is a return
      of the same function and task, then the function is a leaf and the
      function_graph plugin calls ring_buffer_read which moves the ring buffer
      iterator forward (the trace iterator still points to the function start
      The copying of the trace_seq to the seq_file buffer will fail if the
      seq_file buffer is full. The seq_read will not show this entry.
      The next read by userspace will cause seq_read to again call s_start
      which will reuse the trace iterator entry (the function start entry).
      But the function return entry was already consumed. The function graph
      plugin will think that this entry is a nested function and not a leaf.
      To solve this, the trace code now checks the return status of the
      seq_printf (trace_print_seq). If the writing to the seq_file buffer
      fails, we set a flag in the iterator (leftover) and we do not reset
      the trace_seq buffer. On the next call to s_start, we check the leftover
      flag, and if it is set, we just reuse the trace_seq buffer and do not
      call into the plugin print functions.
      Before this patch:
       2)               |      fput() {
       2)               |        __fput() {
       2)   0.550 us    |          inotify_inode_queue_event();
       2)               |          __fsnotify_parent() {
       2)   0.540 us    |          inotify_dentry_parent_queue_event();
      After the patch:
       2)               |      fput() {
       2)               |        __fput() {
       2)   0.550 us    |          inotify_inode_queue_event();
       2)   0.548 us    |          __fsnotify_parent();
       2)   0.540 us    |          inotify_dentry_parent_queue_event();
        Updated the patch to fix a missing return 0 from the trace_print_seq()
        stub when CONFIG_TRACING is disabled.
      Reported-by: default avatarIngo Molnar <mingo@elte.hu>
      Reported-by: default avatarJiri Olsa <jolsa@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  3. 24 Oct, 2009 1 commit
    • Jiri Olsa's avatar
      tracing: Fix trace_seq_printf() return value · 3e69533b
      Jiri Olsa authored
      trace_seq_printf() return value is a little ambiguous. It
      currently returns the length of the space available in the
      buffer. printf usually returns the amount written. This is not
      adequate here, because:
        trace_seq_printf(s, "");
      is perfectly legal, and returning 0 would indicate that it
      We can always see the amount written by looking at the before
      and after values of s->len. This is not quite the same use as
      printf. We only care if the string was successfully written to
      the buffer or not.
      Make trace_seq_printf() return 0 if the trace oversizes the
      buffer's free space, 1 otherwise.
      Signed-off-by: default avatarJiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091023233646.631787612@goodmis.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  4. 07 Oct, 2009 1 commit
  5. 06 Oct, 2009 1 commit
  6. 11 Sep, 2009 3 commits
    • Steven Rostedt's avatar
      tracing: consolidate code between trace_output.c and trace_function_graph.c · f81c972d
      Steven Rostedt authored
      Both trace_output.c and trace_function_graph.c do basically the same
      thing to handle the printing of the latency-format. This patch moves
      the code into one function that both can use.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt's avatar
      tracing: add lock depth to entries · 637e7e86
      Steven Rostedt authored
      This patch adds the lock depth of the big kernel lock to the generic
      entry header. This way we can see the depth of the lock and help
      in removing the BKL.
       #                  _------=> CPU#
       #                 / _-----=> irqs-off
       #                | / _----=> need-resched
       #                || / _---=> hardirq/softirq
       #                ||| / _--=> preempt-depth
       #                |||| /_--=> lock-depth
       #                |||||/     delay
       #  cmd     pid   |||||| time  |   caller
       #     \   /      ||||||   \   |   /
         <idle>-0       2.N..3 5902255250us+: lock_acquire: read rcu_read_lock
         <idle>-0       2.N..3 5902255253us+: lock_release: rcu_read_lock
         <idle>-0       2dN..3 5902255257us+: lock_acquire: xtime_lock
         <idle>-0       2dN..4 5902255259us : lock_acquire: clocksource_lock
         <idle>-0       2dN..4 5902255261us+: lock_release: clocksource_lock
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt's avatar
      tracing: move tgid out of generic entry and into userstack · 48659d31
      Steven Rostedt authored
      The userstack trace required the recording of the tgid entry.
      Unfortunately, it was added to the generic entry where it wasted
      4 bytes of every entry and was only used by one entry.
      This patch moves it out of the generic field and moves it into the
      only user (userstack_entry).
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  7. 02 Jul, 2009 1 commit
  8. 09 Jun, 2009 2 commits
  9. 03 Jun, 2009 4 commits
    • Steven Rostedt's avatar
      tracing: add annotation to what type of stack trace is recorded · 563af16c
      Steven Rostedt authored
      The current method of printing out a stack trace is to add a new line
      and print out the trace:
          yum-updatesd-3120  [002]   573.691303:
       => do_softirq
       => irq_exit
       => smp_apic_timer_interrupt
       => apic_timer_interrupt
      This looks a bit awkward, and if we have both stack and user stack traces
      running, it would be nice to have a title to tell them apart, although
      it is easy to tell by the output.
      This patch adds an annotation to the start of the stack traces:
                  init-1     [003]   929.304979: <stack trace>
       => user_path_at
       => vfs_fstatat
       => vfs_stat
       => sys_newstat
       => system_call_fastpath
                   cat-3459  [002]  1016.824040: <user stack trace>
       =>  <0000003aae6c0250>
       =>  <00007ffff4b06ae4>
       =>  <69636172742f6775>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Whitehouse's avatar
      tracing: fix multiple use of __print_flags and __print_symbolic · 56d8bd3f
      Steven Whitehouse authored
      Here is an updated patch to include the extra call to
      trace_seq_init() as requested. This is vs. the latest
      -tip tree and fixes the use of multiple __print_flags
      and __print_symbolic in a single tracer. Also tested
      to ensure its working now:
      mount.gfs2-2534  [000]   235.850587: gfs2_glock_queue: 8.7 glock 1:2 dequeue PR
      mount.gfs2-2534  [000]   235.850591: gfs2_demote_rq: 8.7 glock 1:0 demote EX to NL flags:DI
      mount.gfs2-2534  [000]   235.850591: gfs2_glock_queue: 8.7 glock 1:0 dequeue EX
      glock_workqueue-2529  [000]   235.850666: gfs2_glock_state_change: 8.7 glock 1:0 state EX => NL tgt:NL dmt:NL flags:lDpI
      glock_workqueue-2529  [000]   235.850672: gfs2_glock_put: 8.7 glock 1:0 state NL => IV flags:I
      Signed-off-by: default avatarSteven Whitehouse <swhiteho@redhat.com>
      LKML-Reference: <1244037123.29604.603.camel@localhost.localdomain>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • walimis's avatar
      tracing/events: fix output format of user stack · 048dc50c
      walimis authored
      According to "events/ftrace/user_stack/format", fix the output of
      user stack.
      before fix:
        sh-1073  [000]    31.137561:  <b7f274fe> <-  <0804e33c> <-  <080835c1>
      after fix:
        sh-1072  [000]    37.039329:
       =>  <b7f8a4fe>
       =>  <0804e33c>
       =>  <080835c1>
      Signed-off-by: default avatarwalimis <walimisdev@gmail.com>
      LKML-Reference: <1244016090-7814-3-git-send-email-walimisdev@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • walimis's avatar
      tracing/events: fix output format of kernel stack · f11b3f4e
      walimis authored
      According to "events/ftrace/kernel_stack/format", output format of
      kernel stack should use "=>" instead of "<=".
      The second problem is that we shouldn't skip the first entry in the stack,
      although it seems to be duplicated when used in the "function" tracer,
      but events also use it. If we skip the first one, we will drop the topmost
      entry of the stack.
      The last problem is that if the last entry is ULONG_MAX(0xffffffff), we should
      drop it, otherwise it will print a NULL name line.
      before fix:
            sh-1072  [000]   26.957239: sched_process_fork: parent sh:1072 child sh:1073
            sh-1072  [000]   26.957262:
       <= syscall_call
            sh-1072  [000]   26.957744: sched_switch: task sh:1072 [120] (R) ==> sh:1073 [120]
            sh-1072  [000]   26.957752:
       <= preempt_schedule
       <= wake_up_new_task
       <= do_fork
       <= sys_clone
       <= syscall_call
      After fix:
            sh-1075  [000]    39.791848: sched_process_fork: parent sh:1075  child sh:1076
            sh-1075  [000]    39.791871:
       => sys_clone
       => syscall_call
            sh-1075  [000]    39.792713: sched_switch: task sh:1075 [120] (R) ==> sh:1076 [120]
            sh-1075  [000]    39.792722:
       => schedule
       => preempt_schedule
       => wake_up_new_task
       => do_fork
       => sys_clone
       => syscall_call
      Signed-off-by: default avatarwalimis <walimisdev@gmail.com>
      LKML-Reference: <1244016090-7814-2-git-send-email-walimisdev@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  10. 02 Jun, 2009 1 commit
  11. 26 May, 2009 2 commits
    • Steven Rostedt's avatar
      tracing: add __print_symbolic to trace events · 0f4fc29d
      Steven Rostedt authored
      This patch adds __print_symbolic which is similar to __print_flags but
      works for an enumeration type instead. That is, there is only a one to one
      mapping between the values and the symbols. When a match is made, then
      it is printed, otherwise the hex value is outputed.
      [ Impact: add interface for showing symbol names in events ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
    • Steven Rostedt's avatar
      tracing: add __print_flags for events · be74b73a
      Steven Rostedt authored
      Developers have been asking for the ability in the ftrace event tracer
      to display names of bits in a flags variable.
      Instead of printing out c2, it would be easier to read FOO|BAR|GOO,
      assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7.
      Some examples where this would be useful are the state flags in a context
      switch, kmalloc flags, and even permision flags in accessing files.
        v2 changes include:
        Frederic Weisbecker's idea of using a mask instead of bits,
        thus we can output GFP_KERNEL instead of GPF_WAIT|GFP_IO|GFP_FS.
        Li Zefan's idea of allowing the caller of __print_flags to add their
        own delimiter (or no delimiter) where we can get for file permissions
        rwx instead of r|w|x.
        v3 changes:
         Christoph Hellwig's idea of using an array instead of va_args.
      [ Impact: better displaying of flags in trace output ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
  12. 25 May, 2009 1 commit
    • Lai Jiangshan's avatar
      tracing: add trace_event_read_lock() · 4f535968
      Lai Jiangshan authored
      I found that there is nothing to protect event_hash in
      ftrace_find_event(). Rcu protects the event hashlist
      but not the event itself while we use it after its extraction
      through ftrace_find_event().
      This lack of a proper locking in this spot opens a race
      window between any event dereferencing and module removal.
      --Task A--
      print_trace_line(trace) {
        event = find_ftrace_event(trace)
      --Task B--
      trace_module_remove_events(mod) {
        list_trace_events_module(ev, mod) {
          unregister_ftrace_event(ev->event) {
      |--> module removed, the event has been dropped
      --Task A--
        event->print(trace); // Dereferencing freed memory
      If the event retrieved belongs to a module and this module
      is concurrently removed, we may end up dereferencing a data
      from a freed module.
      RCU could solve this, but it would add latency to the kernel and
      forbid tracers output callbacks to call any sleepable code.
      So this fix converts 'trace_event_mutex' to a read/write semaphore,
      and adds trace_event_read_lock() to protect ftrace_find_event().
      [ Impact: fix possible freed memory dereference in ftrace ]
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <4A114806.7090302@cn.fujitsu.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
  13. 15 May, 2009 1 commit
  14. 06 May, 2009 1 commit
  15. 25 Apr, 2009 1 commit
    • Steven Rostedt's avatar
      tracing/events: reuse trace event ids after overflow · 060fa5c8
      Steven Rostedt authored
      With modules being able to add trace events, and the max trace event
      counter is 16 bits (65536) we can overflow the counter easily
      with a simple while loop adding and removing modules that contain
      trace events.
      This patch links together the registered trace events and on overflow
      searches for available trace event ids. It will still fail if
      over 65536 events are registered, but considering that a typical
      kernel only has 22000 functions, 65000 events should be sufficient.
      Reported-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  16. 24 Apr, 2009 1 commit
    • Steven Rostedt's avatar
      tracing: increase size of number of possible events · 89ec0dee
      Steven Rostedt authored
      With the new event tracing registration, we must increase the number
      of events that can be registered. Currently the type field is only
      one byte, which leaves us only 256 possible events.
      Since we do not save the CPU number in the tracer anymore (it is determined
      by the per cpu ring buffer that is used) we have an extra byte to use.
      This patch increases the size of type from 1 byte (256 events) to
      2 bytes (65,536 events).
      It also adds a WARN_ON_ONCE if we exceed that limit.
      [ Impact: allow more than 255 events ]
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
  17. 14 Apr, 2009 1 commit
  18. 07 Apr, 2009 1 commit
    • Lai Jiangshan's avatar
      tracing: fix incorrect return type of ns2usecs() · cf8e3474
      Lai Jiangshan authored
      Impact: fix time output bug in 32bits system
      ns2usecs() returns 'long', it's incorrect.
      (In i386)
                <idle>-0     [000]   521.442100: _spin_lock <-tick_do_update_jiffies64
                <idle>-0     [000]   521.442101: do_timer <-tick_do_update_jiffies64
                <idle>-0     [000]   521.442102: update_wall_time <-do_timer
                <idle>-0     [000]   521.442102: update_xtime_cache <-update_wall_time
      (It always print the time less than 2200 seconds besides ...)
      Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds)
                <idle>-0     [001] 4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq
                <idle>-0     [001] 4154502640.134760: _local_bh_enable <-__do_softirq
                <idle>-0     [001] 4154502640.134761: idle_cpu <-irq_exit
      (very large value)
      Because 'long' is a signed type and it is 32bits in i386.
      Changes in v2:
      return 'unsigned long long' instead of 'cycle_t'
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <49D05D10.4030009@cn.fujitsu.com>
      Reported-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  19. 31 Mar, 2009 2 commits
  20. 25 Mar, 2009 1 commit
    • Steven Rostedt's avatar
      tracing: adding function timings to function profiler · 0706f1c4
      Steven Rostedt authored
      If the function graph trace is enabled, the function profiler will
      use it to take the timing of the functions.
       cat /debug/tracing/trace_stat/functions
        Function                               Hit    Time
        --------                               ---    ----
        mwait_idle                             127    183028.4 us
        schedule                                26    151997.7 us
        __schedule                              31    151975.1 us
        sys_wait4                                2    74080.53 us
        do_wait                                  2    74077.80 us
        sys_newlstat                           138    39929.16 us
        do_path_lookup                         179    39845.79 us
        vfs_lstat_fd                           138    39761.97 us
        user_path_at                           153    39469.58 us
        path_walk                              179    39435.76 us
        __link_path_walk                       189    39143.73 us
      Note the times are skewed due to the function graph tracer not taking
      into account schedules.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
  21. 20 Mar, 2009 1 commit
    • Peter Zijlstra's avatar
      ftrace: ensure every event gets an id · 28bea271
      Peter Zijlstra authored
      Impact: widen user-space visibe event IDs to all events
      Previously only TRACE_EVENT events got ids, because only they
      generated raw output which needs to be demuxed from the trace.
      In order to provide a unique ID for each event, register everybody,
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20090319194233.464914218@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  22. 19 Mar, 2009 1 commit
  23. 17 Mar, 2009 1 commit
    • Steven Rostedt's avatar
      tracing: protect reader of cmdline output · 4ca53085
      Steven Rostedt authored
      Impact: fix to one cause of incorrect comm outputs in trace
      The spinlock only protected the creation of a comm <=> pid pair.
      But it was possible that a reader could look up a pid, and get the
      wrong comm because it had no locking.
      This also required changing trace_find_cmdline to copy the comm cache
      and not just send back a pointer to it.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
  24. 13 Mar, 2009 1 commit
    • Frederic Weisbecker's avatar
      tracing/core: bring back raw trace_printk for dynamic formats strings · 48ead020
      Frederic Weisbecker authored
      Impact: fix callsites with dynamic format strings
      Since its new binary implementation, trace_printk() internally uses static
      containers for the format strings on each callsites. But the value is
      assigned once at build time, which means that it can't take dynamic
      So this patch unearthes the raw trace_printk implementation for the callers
      that will need trace_printk to be able to carry these dynamic format
      strings. The trace_printk() macro will use the appropriate implementation
      for each callsite. Most of the time however, the binary implementation will
      still be used.
      The other impact of this patch is that mmiotrace_printk() will use the old
      implementation because it calls the low level trace_vprintk and we can't
      guess here whether the format passed in it is dynamic or not.
      Some parts of this patch have been written by Steven Rostedt (most notably
      the part that chooses the appropriate implementation for each callsites).
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
  25. 10 Mar, 2009 1 commit
  26. 06 Mar, 2009 2 commits
    • Frederic Weisbecker's avatar
      tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() · 769b0441
      Frederic Weisbecker authored
      Impact: faster and lighter tracing
      Now that we have trace_bprintk() which is faster and consume lesser
      memory than trace_printk() and has the same purpose, we can now drop
      the old implementation in favour of the binary one from trace_bprintk(),
      which means we move all the implementation of trace_bprintk() to
      trace_printk(), so the Api doesn't change except that we must now use
      trace_seq_bprintk() to print the TRACE_PRINT entries.
      Some changes result of this:
      - Previously, trace_bprintk depended of a single tracer and couldn't
        work without. This tracer has been dropped and the whole implementation
        of trace_printk() (like the module formats management) is now integrated
        in the tracing core (comes with CONFIG_TRACING), though we keep the file
        trace_printk (previously trace_bprintk.c) where we can find the module
        management. Thus we don't overflow trace.c
      - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.
      - change a bit trace_printk/trace_vprintk macros to support non-builtin formats
        constants, and fix 'const' qualifiers warnings. But this is all transparent for
      - etc...
      - Rebase against last changes
      - Fix mispell on the changelog
      - Rebase against last changes (moving trace_printk() to kernel.h)
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
    • Lai Jiangshan's avatar
      tracing: infrastructure for supporting binary record · 1427cdf0
      Lai Jiangshan authored
      Impact: save on memory for tracing
      Current tracers are typically using a struct(like struct ftrace_entry,
      struct ctx_switch_entry, struct special_entr etc...)to record a binary
      event. These structs can only record a their own kind of events.
      A new kind of tracer need a new struct and a lot of code too handle it.
      So we need a generic binary record for events. This infrastructure
      is for this purpose.
      [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
      tracing as reported by Steven Rostedt]
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  27. 05 Mar, 2009 1 commit
  28. 08 Feb, 2009 1 commit
    • Steven Rostedt's avatar
      trace: remove deprecated entry->cpu · 1830b52d
      Steven Rostedt authored
      Impact: fix to prevent developers from using entry->cpu
      With the new ring buffer infrastructure, the cpu for the entry is
      implicit with which CPU buffer it is on.
      The original code use to record the current cpu into the generic
      entry header, which can be retrieved by entry->cpu. When the
      ring buffer was introduced, the users were convert to use the
      the cpu number of which cpu ring buffer was in use (this was passed
      to the tracers by the iterator: iter->cpu).
      Unfortunately, the cpu item in the entry structure was never removed.
      This allowed for developers to use it instead of the proper iter->cpu,
      unknowingly, using an uninitialized variable. This was not the fault
      of the developers, since it would seem like the logical place to
      retrieve the cpu identifier.
      This patch removes the cpu item from the entry structure and fixes
      all the users that should have been using iter->cpu.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
  29. 05 Feb, 2009 1 commit