1. 02 Mar, 2009 8 commits
    • tracing: make CALLER_ADDRx overwriteable · c79a61f5
      Uwe Kleine-Koenig authored
      
      
      The current definition of CALLER_ADDRx isn't suitable for all platforms.
      E.g. for ARM __builtin_return_address(N) doesn't work for N > 0, and
      AFAIK powerpc does not need frame pointers to have a working
      __builtin_return_address.  This patch allows defining the CALLER_ADDRx
      macros in <asm/ftrace.h> and lets these take precedence.
      
      Because <asm/ftrace.h> is now included unconditionally in
      <linux/ftrace.h>, all archs that didn't already have this header get an
      empty one for free.
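      A minimal sketch of the override pattern described above, assuming each
      generic CALLER_ADDRx default is wrapped in its own #ifndef so that a
      definition coming from <asm/ftrace.h> (simulated inline here) wins. The
      guard style, the function names, and the choice of 0UL for the unusable
      depth are illustrative assumptions, not the patch's code;
      __builtin_return_address requires GCC or Clang.

```c
#include <assert.h>

/*
 * Stand-in for a hypothetical <asm/ftrace.h> on an architecture where
 * __builtin_return_address(N) is unusable for N > 0. Assumption: the
 * architecture simply reports 0 for deeper callers.
 */
#define CALLER_ADDR1 0UL

/*
 * Generic fallbacks, as <linux/ftrace.h> might provide them. Each one is
 * guarded by #ifndef, so an architecture definition seen earlier wins.
 */
#ifndef CALLER_ADDR0
# define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
#endif
#ifndef CALLER_ADDR1
# define CALLER_ADDR1 ((unsigned long)__builtin_return_address(1))
#endif

/* noinline keeps a real call frame so CALLER_ADDR0 has a caller to report. */
__attribute__((noinline)) unsigned long my_caller0(void)
{
	return CALLER_ADDR0; /* generic definition: return address in the caller */
}

__attribute__((noinline)) unsigned long my_caller1(void)
{
	return CALLER_ADDR1; /* architecture override: always 0 in this sketch */
}
```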
      Signed-off-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add print format to event trace format files · 96ccd21c
      Steven Rostedt authored
      
      
      This patch adds the internal print format used to print the raw events
      to the event trace point format file.
      
       # cat /debug/tracing/events/sched/sched_switch/format
      name: sched_switch
      ID: 29
      format:
              field:unsigned char type;       offset:0;       size:1;
              field:unsigned char flags;      offset:1;       size:1;
              field:unsigned char preempt_count;      offset:2;       size:1;
              field:int pid;  offset:4;       size:4;
              field:int tgid; offset:8;       size:4;
      
              field:pid_t prev_pid;   offset:12;      size:4;
              field:int prev_prio;    offset:16;      size:4;
              field special:char next_comm[TASK_COMM_LEN];    offset:20;      size:16;
              field:pid_t next_pid;   offset:36;      size:4;
              field:int next_prio;    offset:40;      size:4;
      
      print fmt: "prev %d:%d ==> next %s:%d:%d"
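      The print fmt line lets a user-space tool render a decoded record the
      same way the kernel does. A minimal sketch, assuming the fields have
      already been extracted from the raw record into a hypothetical
      sched_switch_fields struct (with pid_t simplified to int) and that the
      format string has been copied from the file above; the conversions
      consume the fields in the same order as the format file lists them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TASK_COMM_LEN 16 /* matches size:16 for next_comm in the format file */

/* Hypothetical user-space copy of the sched_switch fields, after decoding. */
struct sched_switch_fields {
	int prev_pid;
	int prev_prio;
	char next_comm[TASK_COMM_LEN];
	int next_pid;
	int next_prio;
};

/*
 * Render the fields with the exported print fmt. The arguments are passed
 * in field order, which is the order the format string expects.
 */
int format_sched_switch(char *out, size_t len,
			const struct sched_switch_fields *f)
{
	return snprintf(out, len, "prev %d:%d ==> next %s:%d:%d",
			f->prev_pid, f->prev_prio, f->next_comm,
			f->next_pid, f->next_prio);
}
```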
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add trace name and id to event formats · c5e4e192
      Steven Rostedt authored
      
      
      To be able to identify the trace in the binary format output, the
      id of the trace event (which is dynamically assigned) must also be listed.
      
      This patch adds the name of the trace point as well as the id assigned.
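      Because the ID is assigned dynamically, a tool reading binary output has
      to look it up at run time rather than hard-code it. A sketch of pulling
      the name and ID out of a format file's text, assuming the "name: ..."
      and "ID: ..." line layout shown in the sched_switch example in this log;
      parse_event_ident() and struct event_ident are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the identifying lines of a format file. */
struct event_ident {
	char name[64];
	int id;
};

/*
 * Parse the "name:" and "ID:" lines at the top of a format file.
 * Returns 0 when both were found, -1 otherwise.
 */
int parse_event_ident(const char *text, struct event_ident *ev)
{
	const char *p;

	p = strstr(text, "name: ");
	if (!p || sscanf(p, "name: %63s", ev->name) != 1)
		return -1;
	p = strstr(text, "ID: ");
	if (!p || sscanf(p, "ID: %d", &ev->id) != 1)
		return -1;
	return 0;
}
```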
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add ftrace headers to event format files · 91729ef9
      Steven Rostedt authored
      
      
      This patch adds the ftrace header fields to the event format files:
      
       # cat /debug/tracing/events/sched/sched_switch/format
              field:unsigned char type;       offset:0;       size:1;
              field:unsigned char flags;      offset:1;       size:1;
              field:unsigned char preempt_count;      offset:2;       size:1;
              field:int pid;  offset:4;       size:4;
              field:int tgid; offset:8;       size:4;
      
              field:pid_t prev_pid;   offset:12;      size:4;
              field:int prev_prio;    offset:16;      size:4;
              field special:char next_comm[TASK_COMM_LEN];    offset:20;      size:16;
              field:pid_t next_pid;   offset:36;      size:4;
              field:int next_prio;    offset:40;      size:4;
      
      A blank line is used as a delimiter between the ftrace header and the
      trace point fields.
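      The header fields above can be mirrored by a plain C struct. A sketch,
      assuming a 4-byte int and natural alignment (true on common ABIs, but
      not guaranteed by C), that checks the offsets listed in the format file;
      struct trace_header_mirror is a hypothetical name. The header ends at
      offset 12, which is exactly where prev_pid, the first sched_switch
      field, begins.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical user-space mirror of the common ftrace header that precedes
 * every event's own fields. Names and types follow the format file.
 */
struct trace_header_mirror {
	unsigned char type;          /* offset 0, size 1 */
	unsigned char flags;         /* offset 1, size 1 */
	unsigned char preempt_count; /* offset 2, size 1 */
	int pid;                     /* offset 4, size 4 (after 1 byte of padding) */
	int tgid;                    /* offset 8, size 4 */
};

/* Returns 1 if this compiler lays the struct out as the format file says. */
int header_matches_format(void)
{
	return offsetof(struct trace_header_mirror, type) == 0 &&
	       offsetof(struct trace_header_mirror, flags) == 1 &&
	       offsetof(struct trace_header_mirror, preempt_count) == 2 &&
	       offsetof(struct trace_header_mirror, pid) == 4 &&
	       offsetof(struct trace_header_mirror, tgid) == 8 &&
	       sizeof(int) == 4;
}
```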
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add format file to describe event struct fields · 981d081e
      Steven Rostedt authored
      
      
      This patch adds the "format" file to the trace point event directory.
      This is based on work by Tom Zanussi, in which a file is exported
      to user land so that a user space app can read the binary
      records stored in the ring buffer.
      
       # cat /debug/tracing/events/sched/sched_switch/format
              field:pid_t prev_pid;   offset:12;      size:4;
              field:int prev_prio;    offset:16;      size:4;
              field special:char next_comm[TASK_COMM_LEN];    offset:20;      size:16;
              field:pid_t next_pid;   offset:36;      size:4;
              field:int next_prio;    offset:40;      size:4;
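      A user-space reader can use each field line to locate data in a raw
      record. A minimal sketch, assuming the "field:<decl>; offset:<n>;
      size:<n>;" layout shown above (also accepting "field special:");
      parse_field_line(), read_field(), and struct field_desc are
      hypothetical helpers, and byte order is assumed to match the machine
      that recorded the data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed description of one "field:" line. */
struct field_desc {
	char decl[64];   /* e.g. "pid_t prev_pid" */
	unsigned offset; /* byte offset inside the record */
	unsigned size;   /* size in bytes */
};

/*
 * Parse a line such as "\tfield:pid_t prev_pid;\toffset:12;\tsize:4;".
 * Returns 0 on success, -1 if the line is not a field description.
 */
int parse_field_line(const char *line, struct field_desc *fd)
{
	const char *colon;

	while (*line == ' ' || *line == '\t')
		line++;
	if (strncmp(line, "field", 5) != 0)
		return -1;
	colon = strchr(line, ':'); /* works for "field:" and "field special:" */
	if (!colon)
		return -1;
	/* declaration runs up to the ';', then offset and size follow */
	if (sscanf(colon + 1, "%63[^;]; offset:%u; size:%u;",
		   fd->decl, &fd->offset, &fd->size) != 3)
		return -1;
	return 0;
}

/* Copy one field out of a raw record using the advertised offset and size. */
int read_field(const unsigned char *rec, size_t rec_len,
	       const struct field_desc *fd, void *dst, size_t dst_len)
{
	if ((size_t)fd->offset + fd->size > rec_len || fd->size > dst_len)
		return -1;
	memcpy(dst, rec + fd->offset, fd->size);
	return 0;
}
```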
      
      Idea-from: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: make trace_seq_reset global and rename to trace_seq_init · f9520750
      Steven Rostedt authored
      
      
      Impact: clean up
      
      The trace_seq functions may be used separately, outside of the ftrace
      iterator, and trace_seq_reset is needed for these operations, so this
      patch makes it global.
      
      This patch also renames trace_seq_reset to the more appropriate
      trace_seq_init.
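      A user-space sketch of why a separate init is useful: any caller that
      builds text in its own sequence buffer, outside the iterator, must reset
      it first. struct seq_sketch and its functions are hypothetical stand-ins
      modeled loosely on the kernel's trace_seq (assumed here to be a buffer
      plus a length and a read position); they are not the kernel's API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SEQ_BUF_SIZE 4096 /* assumption: one page */

/* Hypothetical user-space stand-in for a trace_seq-like buffer. */
struct seq_sketch {
	char buffer[SEQ_BUF_SIZE];
	unsigned int len;     /* bytes written so far */
	unsigned int readpos; /* bytes already consumed by a reader */
};

/* Counterpart of trace_seq_init(): make the sequence empty again. */
void seq_sketch_init(struct seq_sketch *s)
{
	s->len = 0;
	s->readpos = 0;
}

/* Append formatted text; returns 0 (and leaves len unchanged) if it won't fit. */
int seq_sketch_printf(struct seq_sketch *s, const char *fmt, ...)
{
	unsigned int room = SEQ_BUF_SIZE - s->len;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(s->buffer + s->len, room, fmt, ap);
	va_end(ap);
	if (n < 0 || (unsigned int)n >= room)
		return 0;
	s->len += n;
	return n;
}
```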
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add protection around modify trace event fields · 11a241a3
      Steven Rostedt authored
      
      
      The trace event objects are currently not protected against
      reentrancy. This patch adds a mutex around the modifications of
      the trace event fields.
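      A user-space sketch of the locking idea, assuming a single global mutex
      serializes every change to an event's field list; pthreads stand in for
      the kernel's mutex API, and struct event_sketch and event_add_field()
      are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_FIELDS 16

/* Hypothetical event object whose field list may be modified concurrently. */
struct event_sketch {
	const char *fields[MAX_FIELDS];
	int nr_fields;
};

/* One lock serializing all field-list modifications. */
static pthread_mutex_t event_fields_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Append a field name; returns 0 on success, -1 if the list is full. */
int event_add_field(struct event_sketch *ev, const char *name)
{
	int ret = -1;

	pthread_mutex_lock(&event_fields_mutex);
	if (ev->nr_fields < MAX_FIELDS) {
		/* the read-modify-write of nr_fields is safe only under the lock */
		ev->fields[ev->nr_fields++] = name;
		ret = 0;
	}
	pthread_mutex_unlock(&event_fields_mutex);
	return ret;
}
```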
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add TRACE_FIELD_SPECIAL to record complex entries · d20e3b03
      Steven Rostedt authored
      
      
      Tom Zanussi pointed out that the simple TRACE_FIELD was not enough to
      record trace data that required memcpy. This patch addresses this issue
      by adding TRACE_FIELD_SPECIAL. The format is similar to TRACE_FIELD
      but looks like this:
      
        TRACE_FIELD_SPECIAL(type_item, item, cmd)
      
      What TRACE_FIELD gave was:
      
        TRACE_FIELD(type, item, assign)
      
      The TRACE_FIELD would be used in declaring a structure:
      
        struct {
      	type	item;
        };
      
      And later assign it via:
      
        entry->item = assign;
      
      What TRACE_FIELD_SPECIAL gives us is:
      
      In the declaration of the structure:
      
        struct {
      	type_item;
        };
      
      And the assignment:
      
        cmd;
      
      This change log explains the one example used in the patch:
      
       TRACE_EVENT_FORMAT(sched_switch,
      	TPPROTO(struct rq *rq, struct task_struct *prev,
      		struct task_struct *next),
      	TPARGS(rq, prev, next),
      	TPFMT("task %s:%d ==> %s:%d",
      	      prev->comm, prev->pid, next->comm, next->pid),
      	TRACE_STRUCT(
      		TRACE_FIELD(pid_t, prev_pid, prev->pid)
      		TRACE_FIELD(int, prev_prio, prev->prio)
      		TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN],
      				    next_comm,
      				    TPCMD(memcpy(TRACE_ENTRY->next_comm,
      						 next->comm,
      						 TASK_COMM_LEN)))
      		TRACE_FIELD(pid_t, next_pid, next->pid)
      		TRACE_FIELD(int, next_prio, next->prio)
      	),
      	TPRAWFMT("prev %d:%d ==> next %s:%d:%d")
      	);
      
       The struct will be created as:
      
        struct {
      	pid_t		prev_pid;
      	int		prev_prio;
      	char next_comm[TASK_COMM_LEN];
      	pid_t		next_pid;
      	int		next_prio;
        };
      
      Note the TRACE_ENTRY in the cmd part of TRACE_FIELD_SPECIAL. TRACE_ENTRY will
      be set by the tracer to point to the structure inside the trace buffer,
      so the generated assignments are:
      
        entry->prev_pid	= prev->pid;
        entry->prev_prio	= prev->prio;
        memcpy(entry->next_comm, next->comm, TASK_COMM_LEN);
        entry->next_pid	= next->pid;
        entry->next_prio	= next->prio;
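      The two expansions (declaration and assignment) can be reproduced with a
      plain "X-macro" in user space. A minimal sketch, assuming a hypothetical
      struct task_stub in place of struct task_struct and int in place of
      pid_t; it shows why TRACE_FIELD_SPECIAL carries a full declaration and a
      command rather than a type and an expression. The macro names and
      layout are illustrative, not the patch's actual macros.

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16

/* Hypothetical stand-in for struct task_struct with only the used fields. */
struct task_stub {
	char comm[TASK_COMM_LEN];
	int pid;
	int prio;
};

/*
 * The field list written once. F is TRACE_FIELD(type, item, assign);
 * S is TRACE_FIELD_SPECIAL(decl, item, cmd).
 */
#define SCHED_SWITCH_FIELDS(F, S)					\
	F(int, prev_pid, prev->pid)					\
	F(int, prev_prio, prev->prio)					\
	S(char next_comm[TASK_COMM_LEN], next_comm,			\
	  memcpy(TRACE_ENTRY->next_comm, next->comm, TASK_COMM_LEN))	\
	F(int, next_pid, next->pid)					\
	F(int, next_prio, next->prio)

/* Expansion 1: declarations. The special field contributes its whole decl. */
#define DECL_FIELD(type, item, assign)	type item;
#define DECL_SPECIAL(decl, item, cmd)	decl;
struct sched_switch_entry {
	SCHED_SWITCH_FIELDS(DECL_FIELD, DECL_SPECIAL)
};

/* Expansion 2: assignments. TRACE_ENTRY names the record being filled in. */
#define TRACE_ENTRY entry
#define ASSIGN_FIELD(type, item, assign)	TRACE_ENTRY->item = (assign);
#define ASSIGN_SPECIAL(decl, item, cmd)		cmd;
void record_sched_switch(struct sched_switch_entry *entry,
			 const struct task_stub *prev,
			 const struct task_stub *next)
{
	SCHED_SWITCH_FIELDS(ASSIGN_FIELD, ASSIGN_SPECIAL)
}
```

      In the patch the command is wrapped in TPCMD(); in this sketch the
      parentheses of the memcpy() call already keep its commas inside a
      single macro argument.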
      Reported-by: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  2. 28 Feb, 2009 10 commits
    • tracing: create the C style tracing for the irq subsystem · f2034f1e
      Steven Rostedt authored
      
      
      This patch utilizes the TRACE_EVENT_FORMAT macro to enable the C style
      faster tracing for the irq subsystem trace points.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: create the C style tracing for the sched subsystem · 62992804
      Steven Rostedt authored
      
      
      This patch utilizes the TRACE_EVENT_FORMAT macro to enable the C style
      faster tracing for the sched subsystem trace points.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add raw fast tracing interface for trace events · fd994989
      Steven Rostedt authored
      
      
      This patch adds the interface to enable the C style trace points.
      The directory /debug/tracing/events/<subsystem>/<event> now
      contains three files:
      
       enable: values 0 or 1 to disable or enable the trace event.
      
       available_types: values 'raw' and 'printf' which indicate the tracing
             types available for the trace point. If a developer does not
             use the TRACE_EVENT_FORMAT macro and just uses the TRACE_FORMAT
             macro, then only 'printf' will be available. This file is
             read only.
      
       type: values 'raw' or 'printf'. This indicates which type of tracing
             is active for that trace point. 'printf' is the default and
             if 'raw' is not available, this file is read only.
      
       # echo raw > /debug/tracing/events/sched/sched_wakeup/type
       # echo 1 > /debug/tracing/events/sched/sched_wakeup/enable
      
       This enables the C style tracing for the sched_wakeup trace point.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add raw trace point recording infrastructure · c32e827b
      Steven Rostedt authored
      
      
      Impact: lower overhead tracing
      
      The current event tracer can automatically pick up trace points
      that are registered with the TRACE_FORMAT macro. But it requires
      a printf format string and parsing. Although this adds the ability
      to get guaranteed information like task names and such, it comes
      at a cost in processing overhead. This processing can add about
      500-1000 nanoseconds of overhead, and in some cases even that is
      considered too much; we want to shave off as much of this overhead
      as possible.
      
      Tom Zanussi recently posted tracing patches to lkml that are based
      on a nice idea about capturing the data via C structs using
      STRUCT_ENTER, STRUCT_EXIT type of macros.
      
      I liked that method very much, but did not like the implementation
      that required a developer to add data/code in several disjoint
      locations.
      
      This patch extends the event_tracer macros to do a "raw C" approach
      similar to Tom Zanussi's. But instead of requiring developers to
      tweak a bunch of code all over the place, they can do it
      all in one macro - preferably placed near the code that it is
      tracing. That makes it much more likely that tracepoints will be
      maintained on an ongoing basis alongside the code they trace.
      
      The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
      a developer may still utilize the lower-level DECLARE_TRACE macros
      if they don't care about getting their traces automatically in the event
      tracer.)
      
      They can also use the existing TRACE_FORMAT if they don't need to code
      the tracepoint in C, but just want to use the convenience of printf.
      
      So if the developer wants to "hardwire" a tracepoint in the fastest
      possible way, and wants to acquire their data via a user space utility
      in a raw binary format, or wants to see it in the trace output but not
      sacrifice any performance, then they can implement the faster but
      more complex TRACE_EVENT_FORMAT macro.
      
      Here's what usage looks like:
      
        TRACE_EVENT_FORMAT(name,
      	TPPROTO(proto),
      	TPARGS(args),
      	TPFMT(fmt, fmt_args),
      	TRACE_STRUCT(
      		TRACE_FIELD(type1, item1, assign1)
      		TRACE_FIELD(type2, item2, assign2)
      			[...]
      	),
      	TPRAWFMT(raw_fmt)
      	);
      
      Note that name, proto, args, and fmt are all identical to what
      TRACE_FORMAT uses.
      
       name: the unique identifier of the trace point
       proto: the prototype that the trace point uses
       args: the args in the prototype
       fmt: printf format to use with the event printf tracer
       fmt_args: the printf arguments to match fmt

       TRACE_STRUCT starts the definition of a structure.
       Each item in the structure is defined with a TRACE_FIELD
      
        TRACE_FIELD(type, item, assign)
      
       type: the C type of item.
       item: the name of the item in the structure
       assign: the value assigned to the item in the trace point callback
      
       raw_fmt is used to pretty print the struct. Its conversions must
        follow the order in which the items are added in TRACE_STRUCT
      
       An example of this would be:
      
       TRACE_EVENT_FORMAT(sched_wakeup,
      	TPPROTO(struct rq *rq, struct task_struct *p, int success),
      	TPARGS(rq, p, success),
      	TPFMT("task %s:%d %s",
      	      p->comm, p->pid, success?"succeeded":"failed"),
      	TRACE_STRUCT(
      		TRACE_FIELD(pid_t, pid, p->pid)
      		TRACE_FIELD(int, success, success)
      	),
      	TPRAWFMT("task %d success=%d")
      	);
      
       This creates a unique struct:
      
       struct {
      	pid_t		pid;
      	int		success;
       };
      
       And the callback would assign these values as:
      
      	entry->pid = p->pid;
      	entry->success = success;
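      The "macro magic" can be approximated in user space by writing the
      TRACE_STRUCT() body once as a list macro and expanding it three times:
      into struct members, into the callback's assignments, and into printf
      arguments. A minimal sketch for the sched_wakeup example, assuming int
      for pid_t and a hypothetical struct task_stub; all macro and function
      names are illustrative. The third expansion shows why raw_fmt has to
      follow the field order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct task_struct (only what is traced). */
struct task_stub {
	int pid;
};

/* TRACE_STRUCT() body for sched_wakeup, written once: F(type, item, assign). */
#define SCHED_WAKEUP_FIELDS(F)		\
	F(int, pid, p->pid)		\
	F(int, success, success)

/* Expansion 1: the record layout. */
#define AS_MEMBER(type, item, assign)	type item;
struct sched_wakeup_entry {
	SCHED_WAKEUP_FIELDS(AS_MEMBER)
};

/* Expansion 2: the callback body that fills the record. */
#define AS_ASSIGN(type, item, assign)	entry->item = (assign);
void sched_wakeup_record(struct sched_wakeup_entry *entry,
			 const struct task_stub *p, int success)
{
	SCHED_WAKEUP_FIELDS(AS_ASSIGN)
}

/* Expansion 3: printf arguments, generated in field order. */
#define AS_ARG(type, item, assign)	, entry->item
int sched_wakeup_print(char *out, size_t len,
		       const struct sched_wakeup_entry *entry)
{
	return snprintf(out, len, "task %d success=%d"
			SCHED_WAKEUP_FIELDS(AS_ARG));
}
```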
      
      The nice part about this is that the creation of the assignment is done
      via macro magic in the event tracer.  Once the TRACE_EVENT_FORMAT is
      created, the developer will then have a faster method to record
      into the ring buffer. They do not need to worry about the tracer itself.
      
      The developer would only need to touch the files in include/trace/*.h
      
      Again, I would like to give special thanks to Tom Zanussi for this
      nice idea.
      
      Idea-from: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add interface to write into current tracer buffer · ef5580d0
      Steven Rostedt authored
      
      
      Right now all tracers must manage their own trace buffers. This was
      to keep tracers independent in case we eventually decide to
      allow each tracer to have its own trace buffer.

      But now we are adding event tracing that writes to the current tracer's
      buffer. This adds an interface to allow events to write to the current
      tracer's buffer without having to manage their own. This is needed
      because event tracing has no "tracer" of its own; it is just a way to
      hook into any other tracer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add subsystem sched for sched events · 3d7ba938
      Steven Rostedt authored
      
      
      Add the TRACE_SYSTEM sched for the sched events.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add subsystem irq for irq events · 0ec2ef15
      Steven Rostedt authored
      
      
      Add the TRACE_SYSTEM irq for the irq events.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: make the set_event and available_events subsystem aware · b628b3e6
      Steven Rostedt authored
      
      
      This patch makes the event files set_event and available_events
      aware of subsystems.
      
      Now you can enable an entire subsystem with:
      
        echo 'irq:*' > set_event
      
      Note: the '*' is not needed.
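      A sketch of how a set_event string might be matched against a subsystem
      and event name. The accepted forms are assumptions: "system:*" and
      "system:" both select every event in the subsystem (one reading of the
      note above), "system:event" selects one event, and a bare name is taken
      to match either a subsystem or an event. spec_matches() is hypothetical,
      and the kernel's own parser may differ.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the set_event spec selects event "event" in subsystem "system". */
int spec_matches(const char *spec, const char *system, const char *event)
{
	const char *colon = strchr(spec, ':');
	const char *ev;
	size_t sys_len;

	/* no ':' : a bare name may name a whole subsystem or a single event */
	if (!colon)
		return strcmp(spec, system) == 0 || strcmp(spec, event) == 0;

	sys_len = (size_t)(colon - spec);
	if (strlen(system) != sys_len || strncmp(spec, system, sys_len) != 0)
		return 0;

	ev = colon + 1;
	/* "system:" and "system:*" both select the whole subsystem */
	if (*ev == '\0' || strcmp(ev, "*") == 0)
		return 1;
	return strcmp(ev, event) == 0;
}
```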
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: add subsystem level to trace events · 6ecc2d1c
      Steven Rostedt authored
      
      
      If a trace point header defines TRACE_SYSTEM, then it will add the
      following trace points into that event system.
      
      If include/trace/irq_event_types.h has:
      
       #define TRACE_SYSTEM irq
      
      at the top and
      
       #undef TRACE_SYSTEM
      
      at the bottom, then a directory "irq" will be created in the
      /debug/tracing/events directory. That directory will contain the
      two trace points that are defined in include/trace/irq_event_types.h.
      
      With the above added to irq but not to sched, we get:
      
       # ls /debug/tracing/events/
      irq                     sched_process_exit  sched_signal_send  sched_wakeup_new
      sched_kthread_stop      sched_process_fork  sched_switch
      sched_kthread_stop_ret  sched_process_free  sched_wait_task
      sched_migrate_task      sched_process_wait  sched_wakeup
      
       # ls /debug/tracing/events/irq
      irq_handler_entry  irq_handler_exit
      
      If we add #define TRACE_SYSTEM sched to the trace/sched_event_types.h
      then the rest of the trace events will be put in a sched directory
      within the events directory.
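      A sketch of the #define/#undef bracketing, assuming the subsystem name
      is obtained by stringifying the TRACE_SYSTEM token (the kernel has a
      similar __stringify() helper in <linux/stringify.h>); the STRINGIFY
      macros and variable names are hypothetical. A header that never defines
      TRACE_SYSTEM ends up with no subsystem, matching the sched events that
      stay at the top level in the listing above.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringify so the TRACE_SYSTEM macro is expanded first. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/* --- contents of a hypothetical irq event header --- */
#define TRACE_SYSTEM irq
static const char *const irq_events_system = STRINGIFY(TRACE_SYSTEM);
#undef TRACE_SYSTEM
/* --- end of header: TRACE_SYSTEM does not leak into later headers --- */

/* A later header that never defines TRACE_SYSTEM gets no subsystem. */
#ifdef TRACE_SYSTEM
static const char *const sched_events_system = STRINGIFY(TRACE_SYSTEM);
#else
static const char *const sched_events_system = "";
#endif
```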
      
      I've been playing with this idea of the subsystem for a while, but
      recently Tom Zanussi posted some patches to lkml that included this
      method. Tom's approach was clean and finally got me to put some effort
      into cleaning up the event trace points.
      
      Thanks to Tom Zanussi for demonstrating how nice the subsystem
      method is.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: move trace point formats to files in include/trace directory · eb594e45
      Steven Rostedt authored
      
      
      Impact: clean up
      
      To further facilitate the ease of adding trace points for developers, this
      patch creates include/trace/trace_events.h and
      include/trace/trace_event_types.h.
      
      The former file will include the trace/<type>.h files and the latter will
      include the trace/<type>_event_types.h files.

      To create new tracepoints and have them automatically
      appear in the event tracer, a developer creates the trace/<type>.h file,
      which includes <linux/tracepoint.h> and the trace/<type>_event_types.h file.
      
      The trace/<type>_event_types.h file will hold the TRACE_FORMAT
      macros.
      
      Then add the trace/<type>.h file to trace/trace_events.h,
      and add the trace/<type>_event_types.h to the trace_event_types.h file.
      
      No need to modify files elsewhere.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  3. 27 Feb, 2009 6 commits
  4. 26 Feb, 2009 11 commits
  5. 25 Feb, 2009 5 commits
    • tracing/core: make the read callbacks reentrants · d7350c3f
      Frederic Weisbecker authored
      
      
      Now that several per-cpu files can be read or spliced at the
      same time, we want the read/splice callbacks for tracing files to be
      reentrant.
      
      Until now, a single global mutex (trace_types_lock) serialized
      the access to tracing_read_pipe(), tracing_splice_read_pipe(),
      and the seq helpers.
      
      I.e. if a user tries to read trace_pipe0 and trace_pipe1 at the
      same time, access to tracing_read_pipe() is contended and one
      reader must wait for the other to finish its read call.
      
      The trace_types_lock mutex is mostly here to serialize access
      to the global current tracer (current_trace), which can be
      changed concurrently. Although the iter struct keeps a private
      pointer to this tracer, its callbacks can be changed by another
      function.
      
      The method used here is to stop keeping a private reference to
      the tracer inside the iterator and instead make a copy of it inside
      the iterator. Subsequent read calls then check whether the tracer
      has changed. This is not costly: the current tracer is not expected
      to change often, so the check is given a branch-prediction hint.
      
      Moreover, we add a private mutex to the iterator (there is one
      iterator per file descriptor) to serialize accesses in case of
      multiple consumers per file descriptor (which would be a silly idea
      from the user). Note that this is not to protect the ring buffer,
      since the ring buffer already serializes reader accesses. It is to
      prevent weird traces in case of concurrent consumers. These mutexes
      could be dropped anyway without causing any crash. Just tell me what
      you think about it.
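      A single-threaded sketch of the copy-and-check approach, assuming the
      tracer is identified by name and that a plain struct copy is enough;
      struct tracer_sketch, struct iter_sketch, and both sample tracers are
      hypothetical. unlikely() is defined locally with __builtin_expect
      (GCC/Clang), as the kernel's macro of the same name does. A real
      implementation still needs synchronization around the comparison and
      copy, which this sketch omits.

```c
#include <assert.h>
#include <string.h>

/* Branch hint: the tracer rarely changes between reads. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical, heavily reduced tracer description. */
struct tracer_sketch {
	const char *name;
	int (*print_line)(int value);
};

static int print_plain(int v) { return v; }
static int print_double(int v) { return 2 * v; }

static const struct tracer_sketch nop_tracer = { "nop", print_plain };
static const struct tracer_sketch dbl_tracer = { "double", print_double };

/* The global current tracer, which another path may switch at any time. */
static struct tracer_sketch current_trace;

/* Each open file gets an iterator holding a private copy of the tracer. */
struct iter_sketch {
	struct tracer_sketch trace; /* a copy, not a pointer */
};

void iter_open(struct iter_sketch *iter)
{
	iter->trace = current_trace;
}

/* On each read, refresh the private copy only if the global tracer changed. */
int iter_read(struct iter_sketch *iter, int value)
{
	if (unlikely(strcmp(iter->trace.name, current_trace.name) != 0))
		iter->trace = current_trace;
	return iter->trace.print_line(value);
}
```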
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/core: introduce per cpu tracing files · b04cc6b1
      Frederic Weisbecker authored
      
      
      Impact: split up tracing output per cpu
      
      Currently, in the tracing debugfs directory, three files are
      available to the user for extracting the trace output:
      
      - trace is an iterator through the ring-buffer. It's a reader
        but not a consumer. It doesn't block when no more traces are
        available.

      - latency_trace is pretty similar to the former, except that it adds
        more information such as preempt count, irq flags, ...
      
      - trace_pipe is a reader and a consumer, it will also block
        waiting for traces if necessary (heh, yes it's a pipe).
      
      The traces coming from different cpus are currently mixed up
      inside these files. Sometimes this garbles the information,
      sometimes it's useful, depending on what the tracer captures.
      
      The tracing_cpumask file is useful to filter the output and
      select only the traces captured on a custom-defined set of cpus.
      But it is still not powerful enough to extract one trace buffer
      per cpu at the same time.
      
      So this patch creates a new directory: /debug/tracing/per_cpu/.
      
      Inside this directory, you will now find one trace_pipe file and
      one trace file per cpu.
      
      This means that if you have two cpus, you will have:
      
       trace0
       trace1
       trace_pipe0
       trace_pipe1
      
      And of course, reading these files has the same effect as reading
      the usual tracing files, except that you only see the traces from
      the given cpu.

      The original all-cpu trace files are still available in their
      original place.
      
      Until now, only one consumer was allowed on trace_pipe to avoid
      racy consumption of the ring-buffer. Now the approach has changed a
      bit: you can have only one consumer per cpu.

      This means you can read trace_pipe0 and trace_pipe1 concurrently,
      but you can't have two readers on trace_pipe0 or on trace_pipe1.
      
      By the same logic, if there is a reader on the common trace_pipe,
      you cannot have another reader on trace_pipe0 or trace_pipe1 at the
      same time, because trace_pipe is in essence already a consumer of
      all cpu buffers.
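      The consumer rules above can be modeled with a bitmask of open per-cpu
      pipes plus a flag for the global pipe. A single-threaded sketch;
      NR_CPUS_SKETCH, ALL_CPUS, pipe_open(), and pipe_release() are
      hypothetical, and real code would also have to make these checks atomic
      with respect to concurrent opens.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 8 /* assumption for the sketch */
#define ALL_CPUS (-1)    /* hypothetical id for the global trace_pipe */

/* Hypothetical bookkeeping of active trace_pipe consumers. */
static unsigned int cpu_consumers; /* bit n set: per_cpu trace_pipe<n> is open */
static int global_consumer;        /* 1: the global trace_pipe is open */

/*
 * Try to open a consuming reader: cpu is 0..NR_CPUS_SKETCH-1 for a per-cpu
 * pipe, or ALL_CPUS for the global trace_pipe.
 * Returns 0 on success, -1 if that would create a second consumer.
 */
int pipe_open(int cpu)
{
	if (cpu == ALL_CPUS) {
		/* the global pipe consumes every cpu buffer */
		if (global_consumer || cpu_consumers)
			return -1;
		global_consumer = 1;
		return 0;
	}
	if (cpu < 0 || cpu >= NR_CPUS_SKETCH)
		return -1;
	if (global_consumer || (cpu_consumers & (1u << cpu)))
		return -1;
	cpu_consumers |= 1u << cpu;
	return 0;
}

void pipe_release(int cpu)
{
	if (cpu == ALL_CPUS)
		global_consumer = 0;
	else if (cpu >= 0 && cpu < NR_CPUS_SKETCH)
		cpu_consumers &= ~(1u << cpu);
}
```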
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Merge branch 'tip/tracing/ftrace' of... · 2b1b858f
      Ingo Molnar authored
      Merge branch 'tip/tracing/ftrace' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/ftrace
    • tracing: remove /debug/tracing/latency_trace · 886b5b73
      Ingo Molnar authored
      
      
      Impact: remove old debug/tracing API
      
      /debug/tracing/latency_trace is an old legacy format we kept from
      the old latency tracer. Remove the file for now. If any useful bits
      turn out to be missing, we'll propagate them into the
      /debug/tracing/trace output.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/hw-branch-tracing: convert bts-tracer mutex to a spinlock · 2d542cf3
      Ingo Molnar authored
      
      
      Impact: fix CPU hotplug lockup
      
      bts_hotcpu_handler() is called with irqs disabled, so using mutex_lock()
      is a no-no.
      
      All the BTS codepaths here are atomic (they do not schedule), so using
      a spinlock is the right solution.
      
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>