1. 15 Mar, 2013 40 commits
    • tracing: Add function-trace option to disable function tracing of latency tracers · 328df475
      Steven Rostedt (Red Hat) authored
      
      
      Currently, the only way to stop the latency tracers from doing function
      tracing is to fully disable the function tracer from the proc file
      system:
      
        echo 0 > /proc/sys/kernel/ftrace_enabled
      
      This is a big hammer approach as it disables function tracing for
      all users. This includes kprobes, perf, stack tracer, etc.
      
      Instead, create a function-trace option that the latency tracers can
      check to determine whether they should enable function tracing.
      This option can be set or cleared even while the tracer is active
      and the tracers will disable or enable function tracing depending
      on how the option was set.
      
      Instead of using the proc file, disable latency function tracing with
      
        echo 0 > /debug/tracing/options/function-trace
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Remove most or all of stack tracer stack size from stack_max_size · 4df29712
      Steven Rostedt (Red Hat) authored
      
      
      Currently, the depth reported in the stack tracer stack_trace file
      does not match the stack_max_size file. This is because the stack_max_size
      includes the overhead of stack tracer itself while the depth does not.
      
      The first time a max is triggered, a calculation is now performed that
      figures out the overhead of the stack tracer and subtracts it from
      the stack_max_size variable. The overhead is stored and is subtracted
      from the reported stack size for comparing for a new max.
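
      The bookkeeping described above can be sketched in userspace as
      follows. This is only a sketch: the struct, the function name and the
      idea of passing the measured overhead in as a parameter are
      assumptions made for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state; these names are not the kernel's. */
struct demo_stack_max {
	unsigned long max_size;        /* reported max, tracer overhead excluded */
	unsigned long tracer_overhead; /* cached when the first max is hit */
	bool overhead_known;
};

/*
 * raw_size: stack usage as measured from inside the tracer, so it
 * includes the tracer's own frames.
 * measured_overhead: the tracer's own usage; only consulted the first
 * time a max is hit, after that the cached value is reused.
 * Returns true when a new max was recorded.
 */
static bool demo_check_stack(struct demo_stack_max *s,
			     unsigned long raw_size,
			     unsigned long measured_overhead)
{
	unsigned long size;

	assert(s);
	if (!s->overhead_known) {
		if (raw_size <= s->max_size)
			return false;
		s->tracer_overhead = measured_overhead;
		s->overhead_known = true;
	}

	/* Compare against the max with the tracer's share removed. */
	size = raw_size > s->tracer_overhead ?
	       raw_size - s->tracer_overhead : 0;
	if (size <= s->max_size)
		return false;

	s->max_size = size;
	return true;
}
```

      With this shape, the stored max and the per-function depths are both
      free of the tracer's own frames, so the two files can agree.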
      
      Now the stack_max_size corresponds to the reported depth:
      
       # cat stack_max_size
      4640
      
       # cat stack_trace
              Depth    Size   Location    (48 entries)
              -----    ----   --------
        0)     4640      32   _raw_spin_lock+0x18/0x24
        1)     4608     112   ____cache_alloc+0xb7/0x22d
        2)     4496      80   kmem_cache_alloc+0x63/0x12f
        3)     4416      16   mempool_alloc_slab+0x15/0x17
      [...]
      
      While testing against an older gcc on x86 that uses mcount instead
      of fentry, I found that passing in ip + MCOUNT_INSN_SIZE let the
      stack trace show one more function deep which was missing before.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix stack tracer with fentry use · d4ecbfc4
      Steven Rostedt (Red Hat) authored
      
      
      When gcc 4.6 on x86 is used, the function tracer will use the new
      option -mfentry which does a call to "fentry" at every function
      instead of "mcount". The significance of this is that fentry is
      called as the first operation of the function, whereas mcount is
      called after the stack frame has been set up.
      
      This causes the stack tracer to show some bogus results for the size
      of the last function traced, as well as showing "ftrace_call" instead
      of the function. This is due to the stack frame not being set up
      by the function that is about to be traced.
      
       # cat stack_trace
              Depth    Size   Location    (48 entries)
              -----    ----   --------
        0)     4824     216   ftrace_call+0x5/0x2f
        1)     4608     112   ____cache_alloc+0xb7/0x22d
        2)     4496      80   kmem_cache_alloc+0x63/0x12f
      
      The 216 size for ftrace_call includes both the ftrace_call stack
      (which includes the saving of registers it does), as well as the
      stack size of the parent.
      
      To fix this, if CC_USING_FENTRY is defined, then the stack_tracer
      will reserve the first item in stack_dump_trace[] array when
      calling save_stack_trace(), and it will fill it in with the parent ip.
      Then the code will look for the parent pointer on the stack and
      give the real size of the parent's stack pointer:
      
       # cat stack_trace
              Depth    Size   Location    (14 entries)
              -----    ----   --------
        0)     2640      48   update_group_power+0x26/0x187
        1)     2592     224   update_sd_lb_stats+0x2a5/0x4ac
        2)     2368     160   find_busiest_group+0x31/0x1f1
        3)     2208     256   load_balance+0xd9/0x662
      
      I'm Cc'ing stable, although it's not urgent, as it only shows bogus
      size for item #0, the rest of the trace is legit. It should still be
      corrected in previous stable releases.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Use stack of calling function for stack tracer · 87889501
      Steven Rostedt (Red Hat) authored
      
      
      Use the stack of stack_trace_call() instead of check_stack() as
      the test pointer for max stack size. It makes it a bit cleaner
      and a little more accurate.
      
      Adding stable, as a later fix depends on this patch.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add function probe to trigger stack traces · dd42cd3e
      Steven Rostedt (Red Hat) authored
      
      
      Add a function probe that will cause a stack trace to be traced in
      the ring buffer when the given function(s) are called.
      
      format is:
      
       <function>:stacktrace[:<count>]
      
       echo 'schedule:stacktrace' > /debug/tracing/set_ftrace_filter
       cat /debug/tracing/trace_pipe
           kworker/2:0-4329  [002] ...2  2933.558007: <stack trace>
       => kthread
       => ret_from_fork
                <idle>-0     [000] .N.2  2933.558019: <stack trace>
       => rest_init
       => start_kernel
       => x86_64_start_reservations
       => x86_64_start_kernel
           kworker/2:0-4329  [002] ...2  2933.558109: <stack trace>
       => kthread
       => ret_from_fork
      [...]
      
      This can be set to only trace a specific number of times:
      
       echo 'schedule:stacktrace:3' > /debug/tracing/set_ftrace_filter
       cat /debug/tracing/trace_pipe
                 <...>-58    [003] ...2   841.801694: <stack trace>
       => kthread
       => ret_from_fork
                <idle>-0     [001] .N.2   841.801697: <stack trace>
       => start_secondary
                 <...>-2059  [001] ...2   841.801736: <stack trace>
       => wait_for_common
       => wait_for_completion
       => flush_work
       => tty_flush_to_ldisc
       => input_available_p
       => n_tty_poll
       => tty_poll
       => do_select
       => core_sys_select
       => sys_select
       => system_call_fastpath
      
      To remove these:
      
       echo '!schedule:stacktrace' > /debug/tracing/set_ftrace_filter
       echo '!schedule:stacktrace:0' > /debug/tracing/set_ftrace_filter
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add skip argument to trace_dump_stack() · c142be8e
      Steven Rostedt (Red Hat) authored
      
      
      Although trace_dump_stack() already skips three functions in
      the call to stack trace, which gets the stack trace to start
      at the caller of the function, the caller may want to skip some
      more too (as it may have helper functions).
      
      Add a skip argument to the trace_dump_stack() that lets the caller
      skip back tracing functions that it doesn't care about.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add function probe triggers to enable/disable events · 3cd715de
      Steven Rostedt (Red Hat) authored
      
      
      Add triggers to the function tracer that let an event be enabled or
      disabled when a function is called:
      
      format is:
      
       <function>:enable_event:<system>:<event>[:<count>]
       <function>:disable_event:<system>:<event>[:<count>]
      
       echo 'schedule:enable_event:sched:sched_switch' > /debug/tracing/set_ftrace_filter
      
      Every time schedule is called, it will enable the sched_switch event.
      
       echo 'schedule:disable_event:sched:sched_switch:2' > /debug/tracing/set_ftrace_filter
      
      The first two times schedule is called while the sched_switch
      event is enabled, it will disable it. It will not count for a time
      that the event is already disabled (or enabled for enable_event).
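
      The counting rule above (a hit only counts when it actually changes
      the event's state) can be modelled with a small userspace sketch.
      The types and names are hypothetical, and a negative count standing
      for "no limit" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_event {
	bool enabled;
};

/* Hypothetical trigger state; count < 0 means "no limit" here. */
struct demo_trigger {
	struct demo_event *event;
	bool enable;	/* true: enable_event, false: disable_event */
	long count;
};

/*
 * Called each time the probed function is hit.
 * Returns true if this hit changed the event's state.
 */
static bool demo_trigger_fire(struct demo_trigger *t)
{
	assert(t && t->event);

	if (t->count == 0)
		return false;	/* budget used up */
	if (t->event->enabled == t->enable)
		return false;	/* already in that state: does not count */

	t->event->enabled = t->enable;
	if (t->count > 0)
		t->count--;
	return true;
}
```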
      
      [ fixed return without mutex_unlock() - thanks to Dan Carpenter and smatch ]
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add a way to soft disable trace events · 417944c4
      Steven Rostedt (Red Hat) authored
      
      
      In order to let triggers enable or disable events, we need a 'soft'
      method for doing so. For example, if a function probe is added that
      lets a user enable or disable events when a function is called, that
      change must be done without taking locks or a mutex, and definitely
      it can't sleep. But the full enabling of a tracepoint is expensive.
      
      By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated
      without the protection of a mutex (using set/clear_bit()), this soft
      disable flag can be used to allow critical sections to enable or disable
      events from being traced (after the event has been placed into "SOFT_MODE").
      
      Some caveats though: The comm recorder (to map pids with a comm) can not
      be soft disabled (yet). If you disable an event with a "soft"
      disable and wait a while before reading the trace, the comm cache may be
      replaced and you'll get a bunch of <...> for comms in the trace.
      
      Reading the "enable" file for an event that is disabled will now give
      you "0*" where the '*' denotes that the tracepoint is still active but
      the event itself is "disabled".
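
      A rough userspace model of the flag handling, using C11 atomics in
      place of the kernel's set/clear_bit(). The flag names only echo this
      message, and the exact rules for when "*" is shown are an assumption
      based on the description above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag bits for the sketch. */
#define DEMO_FL_ENABLED		(1u << 0) /* tracepoint is active */
#define DEMO_FL_SOFT_MODE	(1u << 1) /* event may be soft toggled */
#define DEMO_FL_SOFT_DISABLED	(1u << 2) /* tracepoint fires, record dropped */

struct demo_event_file {
	atomic_uint flags;
};

/* Lock-free, so usable where sleeping or taking a mutex is not allowed. */
static int demo_soft_disable(struct demo_event_file *f)
{
	if (!(atomic_load(&f->flags) & DEMO_FL_SOFT_MODE))
		return -1;	/* event was never placed into soft mode */
	atomic_fetch_or(&f->flags, DEMO_FL_SOFT_DISABLED);
	return 0;
}

static void demo_soft_enable(struct demo_event_file *f)
{
	atomic_fetch_and(&f->flags, ~DEMO_FL_SOFT_DISABLED);
}

/* Hot path: should this hit of the tracepoint be recorded? */
static int demo_should_record(struct demo_event_file *f)
{
	unsigned int fl = atomic_load(&f->flags);

	return (fl & DEMO_FL_ENABLED) && !(fl & DEMO_FL_SOFT_DISABLED);
}

/* Formats what a read of the "enable" file could look like. */
static void demo_enable_show(struct demo_event_file *f, char *buf, size_t len)
{
	unsigned int fl = atomic_load(&f->flags);
	int soft_off = (fl & DEMO_FL_ENABLED) && (fl & DEMO_FL_SOFT_DISABLED);

	assert(buf && len >= 3);
	snprintf(buf, len, "%s%s", demo_should_record(f) ? "1" : "0",
		 soft_off ? "*" : "");
}
```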
      
      [ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ]
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Use manual free after synchronize_sched() not call_rcu_sched() · 7818b388
      Steven Rostedt (Red Hat) authored
      
      
      The entries to the probe hash must be freed after a synchronize_sched()
      after the entry has been removed from the hash.
      
      As the entries are registered with ops that may have their own callbacks,
      and these callbacks may sleep, we can not use call_rcu_sched() because
      the rcu callbacks registered with that are called from a softirq context.
      
      Instead of using call_rcu_sched(), manually save the entries on a free_list
      and at the end of the loop that removes the entries, do a synchronize_sched()
      and then go through the free_list, freeing the entries.
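
      The unlink, wait once, then free pattern can be sketched in
      userspace like this. demo_synchronize() is a stand-in that only
      counts calls, and all names are hypothetical. The separate
      free_next link is a choice of the sketch, so that anyone still
      walking the live list through "next" is not disturbed.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_entry {
	int key;
	struct demo_entry *next;	/* live list link */
	struct demo_entry *free_next;	/* link used only while parked */
};

static int demo_sync_calls;

/* Stand-in for synchronize_sched(); here it only counts calls. */
static void demo_synchronize(void)
{
	demo_sync_calls++;
}

static struct demo_entry *demo_push(struct demo_entry *head, int key)
{
	struct demo_entry *e = malloc(sizeof(*e));

	assert(e);
	e->key = key;
	e->next = head;
	e->free_next = NULL;
	return e;
}

/* Remove every entry matching key. Returns how many were freed. */
static int demo_remove_matching(struct demo_entry **head, int key)
{
	struct demo_entry *free_list = NULL;
	struct demo_entry **pp = head;
	int freed = 0;

	assert(head);
	while (*pp) {
		struct demo_entry *e = *pp;

		if (e->key == key) {
			*pp = e->next;		 /* unlink from the live list */
			e->free_next = free_list; /* park it, do not free yet */
			free_list = e;
		} else {
			pp = &e->next;
		}
	}
	if (!free_list)
		return 0;

	demo_synchronize();	/* one wait covers the whole batch */

	while (free_list) {
		struct demo_entry *e = free_list;

		free_list = e->free_next;
		free(e);	/* a per-entry callback that sleeps is fine here */
		freed++;
	}
	return freed;
}
```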
      
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Clean up function probe methods · e67efb93
      Steven Rostedt (Red Hat) authored
      
      
      When a function probe is created, a "callback" method is called for
      each function that the probe is attached to. On release of the probe,
      the "free" method is called for each function entry.
      
      First, "callback" is a confusing name and does not really match what
      it does. Callback sounds like it will be called when the probe
      triggers. But that's not the case. This is really an "init" function,
      so let's rename it as such.
      
      Secondly, both "init" and "free" do not pass enough information back
      to the handlers. Pass back the ops, ip and data for each time the
      method is called. We have the information, might as well use it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix comments for ftrace_event_file/call flags · 57d01ad0
      Steven Rostedt (Red Hat) authored
      
      
      Most of the flags for the struct ftrace_event_file were moved over
      to the flags of the struct ftrace_event_call, but the comments were
      never updated.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add snapshot trigger to function probes · 77fd5c15
      Steven Rostedt (Red Hat) authored
      
      
       echo 'schedule:snapshot:1' > /debug/tracing/set_ftrace_filter
      
      This will cause the scheduler to trigger a snapshot the next time
      it's called (you can use any function that's not called by NMI).
      
      Even though it triggers only once, you still need to remove it with:
      
       echo '!schedule:snapshot:0' > /debug/tracing/set_ftrace_filter
      
      The :1 can be left off for the first command:
      
       echo 'schedule:snapshot' > /debug/tracing/set_ftrace_filter
      
      But this will cause all calls to schedule to trigger a snapshot.
      This must be removed without the ':0'
      
       echo '!schedule:snapshot' > /debug/tracing/set_ftrace_filter
      
      As adding a "count" is a different operation (internally).
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add alloc/free_snapshot() to replace duplicate code · 3209cff4
      Steven Rostedt (Red Hat) authored
      
      
      Add alloc_snapshot() and free_snapshot() to allocate and free the
      snapshot buffer respectively, and use these to remove duplicate
      code.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Fix function probe to only enable needed functions · e1df4cb6
      Steven Rostedt (Red Hat) authored
      
      
      Currently the function probe enables all functions and runs a "hash"
      against every function call to see if it should call a probe. This
      is extremely wasteful.
      
      Note, a probe is something like:
      
        echo schedule:traceoff > /debug/tracing/set_ftrace_filter
      
      When schedule is called, the probe will disable tracing. But currently,
      it has a call back for *all* functions, and checks to see if the
      called function is the probe that is needed.
      
      The probe function was created before ftrace was rewritten to
      allow for more than one "op" to be registered by the function tracer.
      When probes were created, it couldn't limit the functions without also
      limiting normal function calls. But now we can, it's about time
      to update the probe code.
      
      Todo, have separate ops for different entries. That is, assign
      a ftrace_ops per probe, instead of one op for all probes. But
      as there's not many probes assigned, this may not be that urgent.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Separate unlimited probes from count limited probes · 8380d248
      Steven Rostedt (Red Hat) authored
      
      
      The function tracing probes that trigger traceon or traceoff can be
      set to unlimited, or given a count of # of times to execute.
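
      The two kinds of probe can be modelled as two separate callbacks,
      chosen once at registration time. This is a userspace sketch with
      hypothetical names. Only the split between an unlimited and a
      count-limited callback is taken from this message.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_tracing_on = true;

typedef void (*demo_probe_fn)(unsigned long ip, long *remaining);

struct demo_probe {
	demo_probe_fn func;
	long remaining;	/* only meaningful for the count-limited variant */
};

/* Unlimited variant: no bookkeeping at all on the hot path. */
static void demo_traceoff(unsigned long ip, long *remaining)
{
	(void)ip;
	(void)remaining;
	demo_tracing_on = false;
}

/* Count-limited variant: stops acting once the budget is used up. */
static void demo_traceoff_count(unsigned long ip, long *remaining)
{
	(void)ip;
	assert(remaining);
	if (*remaining == 0)
		return;
	(*remaining)--;
	demo_tracing_on = false;
}

/* Pick the callback once, instead of testing for a count on every hit. */
static struct demo_probe demo_make_traceoff(bool has_count, long count)
{
	struct demo_probe p;

	p.func = has_count ? demo_traceoff_count : demo_traceoff;
	p.remaining = has_count ? count : -1;
	return p;
}

static void demo_hit(struct demo_probe *p, unsigned long ip)
{
	p->func(ip, &p->remaining);
}
```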
      
      By separating these two types of probes, we can then use the dynamic
      ftrace function filtering directly, and remove the brute force
      "check if this function called is my probe" routines in ftrace.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Consolidate ftrace_trace_onoff_unreg() into callback · 8b8fa62c
      Steven Rostedt (Red Hat) authored
      
      
      The only thing ftrace_trace_onoff_unreg() does is to do a strcmp()
      against the cmd parameter to determine what op to unregister. But
      this compare is also done after the location that this function is
      called (and returns). By moving the check for '!' to unregister after
      the strcmp(), the callback function itself can just do the unregister
      and we can get rid of the helper function.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Consolidate updating of count for traceon/off · 1c317143
      Steven Rostedt (Red Hat) authored
      
      
      Remove some duplicate code and replace it with a helper function.
      This makes the code a bit cleaner.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Let tracing_snapshot() be used by modules but not NMI · 1b22e382
      Steven Rostedt (Red Hat) authored
      
      
      Add EXPORT_SYMBOL_GPL() to let the tracing_snapshot() functions be
      called from modules.
      
      Also add a test to see if the snapshot was called from NMI context
      and just warn in the tracing buffer if so, and return.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add internal ftrace trace_puts() for ftrace to use · ca268da6
      Steven Rostedt (Red Hat) authored
      
      
      There are a few places where ftrace uses trace_printk() for internal
      use, but this requires context (normal, softirq, irq, NMI) buffers
      to keep things lockless. But trace_puts() does not, as it can
      write the string directly into the ring buffer. Make an internal helper
      for trace_puts() and have the internal functions use that.
      
      This way the extra context buffers are not used.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Optimize trace_printk() with one arg to use trace_puts() · 9d3c752c
      Steven Rostedt (Red Hat) authored
      
      
      Although trace_printk() is extremely fast, especially when it uses
      trace_bprintk() (writes args straight to buffer instead of inserting
      into string), it still has the overhead of calling one of the
      sprintf()-style functions, which need to scan the fmt string to
      determine what args, if any, it has.
      
      This is a waste of precious CPU cycles if the printk format has no
      args but a single constant string. It is better to use trace_puts()
      which does not have the overhead of the fmt scanning.
      
      But wouldn't it be nice if the developer didn't have to think about
      such things, and the compiler would just do it for them?
      
        trace_printk("this string has no args\n");
        [...]
        trace_printk("this string does %p %d\n", foo, bar);
      
      As tracing is critical to have the least amount of overhead,
      especially when dealing with race conditions, and you want to
      eliminate any "Heisenbugs", you want the trace_printk() to use the
      fastest possible means of tracing.
      
      Currently the macro magic determines if it will use trace_bprintk()
      or, if the fmt is a dynamic string (a variable), fall
      back to the slow trace_printk() method that does a full snprintf()
      before copying it into the buffer, whereas trace_bprintk() only
      copies the pointer to the fmt and the args into the buffer.
      
      Well, now there's a way to spend some more Hogwarts cash and come
      up with new fancy macro magic.
      
        #define trace_printk(fmt, ...)			\
        do {							\
      	char _______STR[] = __stringify((__VA_ARGS__));	\
      	if (sizeof(_______STR) > 3)			\
      		do_trace_printk(fmt, ##__VA_ARGS__);	\
      	else						\
      		trace_puts(fmt);			\
        } while (0)
      
      The above needs a bit of explaining (both here and in the comments).
      
      By stringifying the __VA_ARGS__, we can, at compile time, determine
      the number of args that are being passed to trace_printk(). The extra
      parentheses are required, otherwise the compiler complains about
      too many parameters for __stringify if there is more than one arg.
      
      When there are no args, the __stringify((__VA_ARGS__)) converts into
      "()\0", a string of 3 characters. Anything else, will be a string
      containing more than 3 characters. Now we assign that string to a
      dynamic char array, and then take the sizeof() of that array.
      If it is greater than 3 characters, we know trace_printk() has args
      and we need to do the full "do_trace_printk()" on them, otherwise
      it was only passed a single arg and we can optimize to use trace_puts().
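
      The trick can be tried outside the kernel with a small userspace
      sketch. The demo_* names and the call counters are hypothetical
      stand-ins for do_trace_printk() and trace_puts(). Like the macro
      above, it relies on the GNU ##__VA_ARGS__ extension (gcc and clang
      accept it by default).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Two-level stringify, so the argument is expanded before the '#'. */
#define DEMO_STRINGIFY_1(...)	#__VA_ARGS__
#define DEMO_STRINGIFY(...)	DEMO_STRINGIFY_1(__VA_ARGS__)

static int demo_puts_calls;
static int demo_printf_calls;

/* Stand-in for trace_puts(): no format parsing at all. */
static void demo_puts(const char *str)
{
	assert(str);
	demo_puts_calls++;
	fputs(str, stdout);
}

/* Stand-in for do_trace_printk(): full format processing. */
static void demo_printf(const char *fmt, ...)
{
	va_list ap;

	demo_printf_calls++;
	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
}

/*
 * With no variadic args, DEMO_STRINGIFY((__VA_ARGS__)) is "()", whose
 * sizeof is 3 once the terminating NUL is counted. Anything longer
 * means at least one arg was passed.
 */
#define demo_trace(fmt, ...)					\
do {								\
	char demo_str[] = DEMO_STRINGIFY((__VA_ARGS__));	\
	if (sizeof(demo_str) > 3)				\
		demo_printf(fmt, ##__VA_ARGS__);		\
	else							\
		demo_puts(fmt);					\
} while (0)
```

      Because sizeof() of the array is a compile-time constant, an
      optimizing compiler can drop the branch that is not taken, which is
      what makes the check free at run time.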
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven "The King of Nasty Macros!" Rostedt <rostedt@goodmis.org>
    • tracing: Add trace_puts() for even faster trace_printk() tracing · 09ae7234
      Steven Rostedt (Red Hat) authored
      
      
      The trace_printk() is extremely fast and is very handy as it can be
      used in any context (including NMIs!). But it still requires scanning
      the fmt string for parsing the args. Even the trace_bprintk() requires
      a scan to know what args will be saved, although it doesn't copy the
      format string itself.
      
      Several times trace_printk() has no args, and wastes cpu cycles scanning
      the fmt string.
      
      Adding trace_puts() allows the developer to use an even faster
      tracing method that only saves the pointer to the string in the
      ring buffer without doing any format parsing at all. This will
      help remove even more of the "Heisenbug" effect, when debugging.
      
      Also fixed up the F_printk()s for the ftrace internal bprint and print events.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix the branch tracer that broke with buffer change · 153e8ed9
      Steven Rostedt (Red Hat) authored
      
      
      The change to add the trace_buffer struct to have the trace array
      have both the main buffer and max buffer broke the branch tracer
      because the change did not update that code. As the branch tracer
      adds a significant amount of overhead, and must be selected via
      a selection (not an allyesconfig) it was missed in testing.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add alloc_snapshot kernel command line parameter · 55034cd6
      Steven Rostedt (Red Hat) authored
      
      
      If debugging the kernel, and the developer wants to use
      tracing_snapshot() in places where tracing_snapshot_alloc() may
      be difficult (or more likely, the developer is lazy and doesn't
      want to bother with tracing_snapshot_alloc() at all), then adding
      
        alloc_snapshot
      
      to the kernel command line parameter will tell ftrace to allocate
      the snapshot buffer (if configured) when it allocates the main
      tracing buffer.
      
      I also noticed that ring_buffer_expanded and tracing_selftest_disabled
      had inconsistent use of boolean "true" and "false" with "0" and "1".
      I cleaned that up too.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Move the tracing selftest code into its own function · f4e781c0
      Steven Rostedt (Red Hat) authored
      
      
      Move the tracing startup selftest code into its own function and
      when not enabled, always have that function succeed.
      
      This makes the register_tracer() function much more readable.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Do not use schedule_work_on() for current CPU · f5eb5588
      Steven Rostedt (Red Hat) authored
      
      
      Ring buffer updates, when done while the ring buffer is active,
      need to be completed on the CPU that is used for the ring buffer
      per_cpu buffer. To accomplish this, schedule_work_on() is used to
      schedule work on the given CPU.
      
      Now there's no reason to use schedule_work_on() if the process
      doing the update happens to be on the CPU that it is processing.
      It has already filled the requirement. Instead, just do the work
      and continue.
      
      This is needed for tracing_snapshot_alloc() where it may be called
      really early in boot, where the work queues have not been set up yet.
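
      The "just do the work" shortcut can be sketched in userspace as
      below. demo_current_cpu and demo_queue_on() are hypothetical
      stand-ins for the current CPU id and schedule_work_on(). Real code
      would also have to make sure the task cannot migrate to another CPU
      during the direct call.

```c
#include <assert.h>

typedef void (*demo_work_fn)(void *arg);

static int demo_current_cpu;	/* stand-in for the CPU we are running on */
static int demo_queued;		/* times work was handed to another CPU */

/* Stand-in for schedule_work_on(); runs the work inline for the demo. */
static void demo_queue_on(int cpu, demo_work_fn fn, void *arg)
{
	(void)cpu;
	demo_queued++;
	fn(arg);
}

/* Example work item: bumps the int that arg points to. */
static void demo_bump(void *arg)
{
	(*(int *)arg)++;
}

/* Run fn for the given CPU's buffer, skipping the queue when possible. */
static void demo_update_cpu(int cpu, demo_work_fn fn, void *arg)
{
	assert(fn);
	if (cpu == demo_current_cpu) {
		fn(arg);	/* already on the right CPU: no workqueue needed */
		return;
	}
	demo_queue_on(cpu, fn, arg);
}
```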
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add internal tracing_snapshot() functions · ad909e21
      Steven Rostedt (Red Hat) authored
      
      
      The new snapshot feature is quite handy. It's a way for the user
      to take advantage of the spare buffer that, until then, only
      the latency tracers used to "snapshot" the buffer when it hit
      a max latency. Now users can trigger a "snapshot" manually when
      some condition is hit in a program. But a snapshot currently can
      not be triggered by a condition inside the kernel.
      
      With the addition of tracing_snapshot() and tracing_snapshot_alloc(),
      snapshots can now be taken when a condition is hit, and the
      developer wants to snapshot the case without stopping the trace.
      
      Note, any snapshot will overwrite the old one, so take care
      in how this is done.
      
      These new functions are to be used like tracing_on(), tracing_off()
      and trace_printk() are. That is, they should never be called
      in the mainline Linux kernel. They are solely for the purpose
      of debugging.
      
      The tracing_snapshot() will not allocate a buffer, but it is
      safe to be called from any context (except NMIs). But if a
      snapshot buffer isn't allocated when it is called, it will write
      to the live buffer, complaining about the lack of a snapshot
      buffer, and then stop tracing (giving you the "permanent snapshot").
      
      tracing_snapshot_alloc() will allocate the snapshot buffer if
      it was not already allocated and then take the snapshot. This routine
      *may sleep*, and must be called from context that can sleep.
      The allocation is done with GFP_KERNEL and not atomic.
      
      If you need a snapshot in an atomic context, say in early boot,
      then it is best to call the tracing_snapshot_alloc() before then,
      where it will allocate the buffer, and then you can use the
      tracing_snapshot() anywhere you want and still get snapshots.
      
      Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Prevent deleting instances when they are being read · a695cb58
      Steven Rostedt (Red Hat) authored
      
      
      Add a ref count to the trace_array structure and prevent removal
      of instances that have open descriptors.
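
      A single-threaded sketch of the idea, with hypothetical names.
      Returning -EBUSY is an assumption of the sketch, and real code
      would need a lock around the count.

```c
#include <assert.h>
#include <errno.h>

struct demo_trace_array {
	int ref;	/* number of open descriptors on this instance */
	int removed;
};

static void demo_open(struct demo_trace_array *tr)
{
	assert(tr);
	tr->ref++;
}

static void demo_release(struct demo_trace_array *tr)
{
	assert(tr && tr->ref > 0);
	tr->ref--;
}

/* Refuse to delete an instance that still has readers. */
static int demo_instance_delete(struct demo_trace_array *tr)
{
	assert(tr);
	if (tr->ref)
		return -EBUSY;
	tr->removed = 1;
	return 0;
}
```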
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add per_cpu directory into tracing instances · 121aaee7
      Steven Rostedt (Red Hat) authored
      
      
      Add the per_cpu directory to the created tracing instances:
      
        cd /sys/kernel/debug/tracing/instances
        mkdir foo
        ls foo/per_cpu/cpu0
      buffer_size_kb	snapshot_raw  trace	  trace_pipe_raw
      snapshot	stats	      trace_pipe
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add snapshot feature to instances · ce9bae55
      Steven Rostedt (Red Hat) authored
      
      
      Add the "snapshot" file to the multi-buffer instances.
      
        cd /sys/kernel/debug/tracing/instances
        mkdir foo
        ls foo
      buffer_size_kb  buffer_total_size_kb  events  free_buffer  set_event
      snapshot  trace  trace_clock  trace_marker  trace_options  trace_pipe
      tracing_on
        cat foo/snapshot
       # tracer: nop
       #
       #
       # * Snapshot is freed *
       #
       # Snapshot commands:
       # echo 0 > snapshot : Clears and frees snapshot buffer
       # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
       #                      Takes a snapshot of the main buffer.
       # echo 2 > snapshot : Clears snapshot buffer (but does not allocate)
       #                      (Doesn't have to be '2' works with any number that
       #                       is not a '0' or '1')
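
      The three commands listed in that help text amount to a small
      state machine. The sketch below models it in userspace with
      hypothetical names. The snapshot is modelled as a plain copy of an
      event count for brevity, which is not how the ring buffer does it.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_snapshot {
	bool allocated;
	int main_events;	/* stand-in for the live buffer contents */
	int snap_events;	/* stand-in for the snapshot contents */
};

/* Model of writing a number into the "snapshot" file. */
static void demo_snapshot_write(struct demo_snapshot *s, long val)
{
	assert(s);
	switch (val) {
	case 0:		/* clear and free the snapshot buffer */
		s->snap_events = 0;
		s->allocated = false;
		break;
	case 1:		/* allocate if needed, then take a snapshot */
		s->allocated = true;
		s->snap_events = s->main_events;
		break;
	default:	/* clear only, never allocates */
		if (s->allocated)
			s->snap_events = 0;
		break;
	}
}
```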
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Consolidate buffer allocation code · 737223fb
      Steven Rostedt (Red Hat) authored
      
      
      There's a bit of duplicate code in creating the trace buffers for
      the normal trace buffer and the max trace buffer among the instances
      and the main global_trace. This code can be consolidated and cleaned
      up a bit making the code cleaner and more readable as well as less
      duplication.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Have trace_array keep track if snapshot buffer is allocated · 45ad21ca
      Steven Rostedt (Red Hat) authored
      
      
      The snapshot buffer belongs to the trace array not the tracer that is
      running. The trace array should be the data structure that keeps track
      of whether or not the snapshot buffer is allocated, not the tracer
      descriptor. Having the trace array keep track of it makes modifications
      so much easier.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add snapshot_raw to extract the raw data from snapshot · 6de58e62
      Steven Rostedt (Red Hat) authored
      
      
      Add a 'snapshot_raw' per_cpu file that allows tools to read the raw
      binary data of the snapshot buffer.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      6de58e62
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add config option to allow snapshot to swap per cpu · 0b85ffc2
      Steven Rostedt (Red Hat) authored
      
      
      When the preempt or irq latency tracers are enabled, they require
      the ring buffer to be able to swap the per cpu sub buffers between
      two main buffers. This adds a slight overhead to tracing as the
      trace recording needs to perform some checks to synchronize
      between recording and swaps that might be happening on other CPUs.
      
      The config RING_BUFFER_ALLOW_SWAP is set when a user of the ring
      buffer needs the "swap cpu" feature; otherwise the extra checks
      are left out and add nothing to the tracing overhead.
      
      The snapshot feature will swap per CPU if the RING_BUFFER_ALLOW_SWAP
      config is set. But that only gets set by things like OPROFILE
      and the irqs and preempt latency tracers.
      
      This config is added to let the user decide to include this feature
      with the snapshot, regardless of whether another user of the ring
      buffer sets RING_BUFFER_ALLOW_SWAP.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      0b85ffc2
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add snapshot in the per_cpu trace directories · f1affcaa
      Steven Rostedt (Red Hat) authored
      
      
      Add the snapshot file to the per_cpu tracing directories so that the
      snapshot can be read for an individual cpu. This also allows an
      individual cpu to be cleared from the snapshot buffer.
      
      If the kernel allows it (CONFIG_RING_BUFFER_ALLOW_SWAP is set), then
      echoing '1' into one of the per_cpu snapshot files will swap only
      that cpu's buffer instead of the entire buffer.
      
      Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f1affcaa
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Consolidate max_tr into main trace_array structure · 12883efb
      Steven Rostedt (Red Hat) authored
      
      
      Currently, the latency tracers and the snapshot feature work by
      having a separate trace_array called "max_tr" that holds the
      snapshot buffer. For latency tracers, the running buffer is swapped
      with this snapshot buffer to save the current max latency.
      
      The only items the max_tr really needs are a copy of the buffer
      itself, the per_cpu data pointers, the time_start timestamp that states
      when the max latency was triggered, and the cpu that the max latency
      was triggered on. All other fields in trace_array are unused by the
      max_tr, making the max_tr mostly bloat.
      
      This change removes the max_tr completely and adds a new structure
      called trace_buffer that holds the buffer pointer, the per_cpu data
      pointers, the time_start timestamp, and the cpu where the latency occurred.
      
      The trace_array now has two trace_buffers, one for the normal trace and
      one for the max trace or snapshot. By doing this, not only do we remove
      the bloat from the max_tr, but the trace instances can now use their
      own snapshot feature, instead of the top level global_trace keeping
      the snapshot feature and latency tracers to itself.
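      The layout described above can be sketched as follows. The field
      types, the "_sketch" names, and the save_max() helper are assumptions
      made for illustration; the real kernel definitions differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the new trace_buffer structure. */
struct trace_buffer_sketch {
    void    *buffer;      /* the ring buffer itself */
    void    *data;        /* per_cpu data pointers (opaque here) */
    uint64_t time_start;  /* when the max latency was triggered */
    int      cpu;         /* cpu where the latency occurred */
};

/* Each trace array carries both buffers, so no separate max_tr is needed. */
struct trace_array_sketch {
    struct trace_buffer_sketch trace_buffer; /* normal trace */
    struct trace_buffer_sketch max_buffer;   /* max trace or snapshot */
};

/*
 * Save a new max latency: exchange the two ring buffers and record
 * when and where the latency occurred. Illustrative helper only.
 */
static void save_max(struct trace_array_sketch *tr, uint64_t now, int cpu)
{
    assert(tr != NULL);

    void *tmp = tr->trace_buffer.buffer;
    tr->trace_buffer.buffer = tr->max_buffer.buffer;
    tr->max_buffer.buffer = tmp;

    tr->max_buffer.time_start = now;
    tr->max_buffer.cpu = cpu;
}
```

      Because both buffers live inside the array, two separate
      trace_array_sketch objects can each take a snapshot without sharing
      any state, which is the property the instances need.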
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      12883efb
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Enable snapshot when any latency tracer is enabled · 22cffc2b
      Steven Rostedt (Red Hat) authored
      
      
      The snapshot utility is extremely useful, and it adds no memory
      overhead when a latency tracer is enabled, because the latency
      tracers use the snapshot underneath. There's no reason to hide the
      snapshot file when a latency tracer has been enabled in the kernel.
      
      If any of the latency tracers (irq, preempt or wakeup) is enabled
      then also select the snapshot facility.
      
      Note, snapshot can be enabled without the latency tracers enabled.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      22cffc2b
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Clear all trace buffers when unloaded module event was used · 873c642f
      Steven Rostedt (Red Hat) authored
      
      
      Currently we do not know what buffer a module event was enabled in.
      On unload, it is safest to clear all buffer instances, not just the
      top level buffer.
      
      Todo: Clear only the buffer that the event was used in. The
      infrastructure is there to do this, but it makes the code a bit
      more complex. Let's get the current code vetted before we add that.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      873c642f
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Only clear trace buffer on module unload if event was traced · 575380da
      Steven Rostedt (Red Hat) authored
      
      
      Currently, when a module with events is unloaded, the trace buffer is
      cleared. This is just a safety net in case the module might have some
      strange callback when its event is outputted. But there's no reason
      to reset the buffer if the module didn't have any of its events traced.
      
      Add a flag called WAS_ENABLED to the event "call" structure. It is set
      whenever the event is enabled and is never cleared. When a
      module gets unloaded, if any of its events have this flag set, then the
      trace buffer will get cleared.
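      A minimal sketch of the sticky-flag idea follows. The bit values,
      the EV_ENABLED bit, and the function names are assumptions for
      illustration only, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for the sketch. */
#define EV_ENABLED      (1u << 0)  /* event is currently enabled */
#define EV_WAS_ENABLED  (1u << 1)  /* sticky: event was enabled at least once */

struct event_call_sketch {
    unsigned int flags;
};

/* Enable or disable an event; WAS_ENABLED is set once and never cleared. */
static void event_set_enabled(struct event_call_sketch *call, bool on)
{
    assert(call != NULL);

    if (on)
        call->flags |= EV_ENABLED | EV_WAS_ENABLED;
    else
        call->flags &= ~EV_ENABLED;  /* EV_WAS_ENABLED intentionally kept */
}

/* On module unload: clear the trace buffer only if some event was ever enabled. */
static bool unload_needs_buffer_clear(const struct event_call_sketch *calls,
                                      size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (calls[i].flags & EV_WAS_ENABLED)
            return true;
    }
    return false;
}
```

      Disabling an event before unloading the module does not avoid the
      clear, since the event may already have written to the buffer.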
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      575380da
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add comment for trace event flag IGNORE_ENABLE · 2a30c11f
      Steven Rostedt (Red Hat) authored
      
      
      All the trace event flags have comments except the IGNORE_ENABLE flag,
      which is set for ftrace internal events that should not be enabled
      via the debugfs "enable" file. That is, setting the top level enable
      file enables all events except these. It used to just check the ftrace
      event call descriptor "reg" field and skip those without it, but now
      some ftrace internal events have a reg field but still need to be
      skipped. The flag was created to ignore those events.
      
      Now document it.
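      For illustration, the skip behaviour described above might look like
      the sketch below. The structure, the bit value, and
      enable_all_events() are hypothetical and are not taken from the
      kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EV_IGNORE_ENABLE (1u << 0)  /* hypothetical bit value */

struct event_sketch {
    const char  *name;     /* placeholder event name */
    unsigned int flags;
    bool         enabled;
};

/*
 * Model of setting a top level "enable" file: enable every event
 * except those flagged IGNORE_ENABLE. Returns how many were enabled.
 */
static size_t enable_all_events(struct event_sketch *evs, size_t n)
{
    size_t count = 0;

    assert(evs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (evs[i].flags & EV_IGNORE_ENABLE)
            continue;  /* internal event: skipped even if it has a reg field */
        evs[i].enabled = true;
        count++;
    }
    return count;
}
```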
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      2a30c11f
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Init waitqueue for blocked readers · f1dc6725
      Steven Rostedt (Red Hat) authored
      
      
      The move of blocked readers to the ring buffer left out the
      init of the wait queue that is used. Tests missed this due to running
      stress tests against the buffers, which didn't allow for any
      readers to end up waiting. Running a simple read and wait triggered
      a bug.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f1dc6725