1. 07 Jul, 2011 2 commits
    • Steven Rostedt's avatar
      tracing: Have "enable" file use refcounts like the "filter" file · 40ee4dff
      Steven Rostedt authored
      
      
      The "enable" file for the event system can be removed when a module
      is unloaded and the event system only has events from that module.
      As the event system nr_events count goes to zero, it may be freed
      if its ref_count is also set to zero.
      
      Like the "filter" file, the "enable" file may be opened by a task and
      referenced later, after a module has been unloaded and the events for
      that event system have been removed.
      
      Although the "filter" file referenced the event system structure,
      the "enable" file only references a pointer to the event system
      name. Since the name is freed when the event system is removed,
      it is possible that an access to the "enable" file may reference
      a freed pointer.
      
      Update the "enable" file to use the subsystem_open() routine that
      the "filter" file uses, to keep a reference to the event system
      structure while the "enable" file is opened.
      
      Cc: <stable@kernel.org>
      Reported-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      40ee4dff
    • Steven Rostedt's avatar
      tracing: Fix bug when reading system filters on module removal · e9dbfae5
      Steven Rostedt authored
      
      
      The event system is freed when its nr_events is set to zero. This happens
      when a module created an event system and then later the module is
      removed. Modules may share systems, so the system is allocated when
      it is created and freed when the modules are unloaded and all the
      events under the system are removed (nr_events set to zero).
      
      The problem arises when a task opened the "filter" file for the
      system. If the module is unloaded and it removed the last event for
      that system, the system structure is freed. If the task that opened
      the filter file accesses the "filter" file after the system has
      been freed, the system will access an invalid pointer.
      
      By adding a ref_count, and using it to keep track of what
      is using the event system, we can free it after all users
      are finished with the event system.
      
      Cc: <stable@kernel.org>
      Reported-by: default avatarJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e9dbfae5
  2. 26 May, 2011 1 commit
  3. 06 May, 2011 1 commit
  4. 10 Mar, 2011 2 commits
    • Yuanhan Liu's avatar
      tracing: Export trace_set_clr_event() · 56355b83
      Yuanhan Liu authored
      
      
      Trace events belonging to a module only exists when the module is
      loaded. Well, we can use trace_set_clr_event funtion to enable some
      trace event at the module init routine, so that we will not miss
      something while loading then module.
      
      So, Export the trace_set_clr_event function so that module can use it.
      Signed-off-by: default avatarYuanhan Liu <yuanhan.liu@linux.intel.com>
      LKML-Reference: <1289196312-25323-1-git-send-email-yuanhan.liu@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      56355b83
    • Steven Rostedt's avatar
      tracing: Remove lock_depth from event entry · e6e1e259
      Steven Rostedt authored
      
      
      The lock_depth field in the event headers was added as a temporary
      data point for help in removing the BKL. Now that the BKL is pretty
      much been removed, we can remove this field.
      
      This in turn changes the header from 12 bytes to 8 bytes,
      removing the 4 byte buffer that gcc would insert if the first field
      in the data load was 8 bytes in size.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e6e1e259
  5. 03 Feb, 2011 1 commit
    • Steven Rostedt's avatar
      tracing: Replace trace_event struct array with pointer array · e4a9ea5e
      Steven Rostedt authored
      
      
      Currently the trace_event structures are placed in the _ftrace_events
      section, and at link time, the linker makes one large array of all
      the trace_event structures. On boot up, this array is read (much like
      the initcall sections) and the events are processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are suppose to be in an array.
      
      A hack was used previous to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the _ftrace_event section. As pointers are always the
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The _ftrace_event section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: default avatarDavid Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e4a9ea5e
  6. 19 Nov, 2010 1 commit
    • Steven Rostedt's avatar
      tracing/events: Show real number in array fields · 04295780
      Steven Rostedt authored
      
      
      Currently we have in something like the sched_switch event:
      
        field:char prev_comm[TASK_COMM_LEN];	offset:12;	size:16;	signed:1;
      
      When a userspace tool such as perf tries to parse this, the
      TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro
      simply uses a #len to show the string of the length. When the length is
      an enum, we get a string that means nothing for tools.
      
      By adding a static buffer and a mutex to protect it, we can store the
      string into that buffer with snprintf and show the actual number.
      Now we get:
      
        field:char prev_comm[16];       offset:12;      size:16;        signed:1;
      
      Something much more useful.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      04295780
  7. 15 Oct, 2010 1 commit
    • Arnd Bergmann's avatar
      llseek: automatically add .llseek fop · 6038f373
      Arnd Bergmann authored
      
      
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: default avatarArnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  8. 18 Aug, 2010 1 commit
  9. 12 Aug, 2010 1 commit
    • Steven Rostedt's avatar
      tracing/events: Convert format output to seq_file · 2a37a3df
      Steven Rostedt authored
      
      
      Two new events were added that broke the current format output.
      
      Both from the SCSI system: scsi_dispatch_cmd_done and scsi_dispatch_cmd_timeout
      
      The reason is that their print_fmt exceeded a page size. Since the output
      of the format used simple_read_from_buffer and trace_seq, it was limited
      to a page size in output.
      
      This patch converts the printing of the format of an event into seq_file,
      which allows greater than a page size to be shown.
      
      I diffed all event formats comparing the output with and without this
      patch. All matched except for the above two, which showed just:
      
        FORMAT TOO BIG
      
      without this patch, but now properly displays the output with this patch.
      
      v2: Remove updating *pos in seq start function.
         [ Thanks to Li Zefan for pointing that out ]
      Reviewed-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
      Cc: James Bottomley <James.Bottomley@suse.de>
      Cc: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      2a37a3df
  10. 21 Jul, 2010 1 commit
    • Li Zefan's avatar
      tracing: Allow to disable cmdline recording · e870e9a1
      Li Zefan authored
      
      
      We found that even enabling a single trace event that will rarely be
      triggered can add big overhead to context switch.
      
      (lmbench context switch test)
       -------------------------------------------------
       2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
       ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
      ------ ------ ------ ------ ------ ------- -------
        2.19   2.3   2.21   2.56   2.13     2.54    2.07
        2.39   2.51  2.35   2.75   2.27     2.81    2.24
      
      The overhead is 6% ~ 11%.
      
      It's because when a trace event is enabled 3 tracepoints (sched_switch,
      sched_wakeup, sched_wakeup_new) will be activated to map pid to cmdname.
      
      We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
      to allow to disable cmdline recording.
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e870e9a1
  11. 29 Jun, 2010 1 commit
  12. 28 Jun, 2010 3 commits
  13. 03 Jun, 2010 1 commit
    • Steven Rostedt's avatar
      tracing: Remove ftrace_preempt_disable/enable · 5168ae50
      Steven Rostedt authored
      
      
      The ftrace_preempt_disable/enable functions were to address a
      recursive race caused by the function tracer. The function tracer
      traces all functions which makes it easily susceptible to recursion.
      One area was preempt_enable(). This would call the scheduler and
      the schedulre would call the function tracer and loop.
      (So was it thought).
      
      The ftrace_preempt_disable/enable was made to protect against recursion
      inside the scheduler by storing the NEED_RESCHED flag. If it was
      set before the ftrace_preempt_disable() it would not call schedule
      on ftrace_preempt_enable(), thinking that if it was set before then
      it would have already scheduled unless it was already in the scheduler.
      
      This worked fine except in the case of SMP, where another task would set
      the NEED_RESCHED flag for a task on another CPU, and then kick off an
      IPI to trigger it. This could cause the NEED_RESCHED to be saved at
      ftrace_preempt_disable() but the IPI to arrive in the the preempt
      disabled section. The ftrace_preempt_enable() would not call the scheduler
      because the flag was already set before entring the section.
      
      This bug would cause a missed preemption check and cause lower latencies.
      
      Investigating further, I found that the recusion caused by the function
      tracer was not due to schedule(), but due to preempt_schedule(). Now
      that preempt_schedule is completely annotated with notrace, the recusion
      no longer is an issue.
      Reported-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      5168ae50
  14. 14 May, 2010 7 commits
    • Steven Rostedt's avatar
      tracing: Combine event filter_active and enable into single flags field · 553552ce
      Steven Rostedt authored
      
      
      The filter_active and enable both use an int (4 bytes each) to
      set a single flag. We can save 4 bytes per event by combining the
      two into a single integer.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4894944	1018052	 861512	6774508	 675eec	vmlinux.id
      4894871	1012292	 861512	6768675	 674823	vmlinux.flags
      
      This gives us another 5K in savings.
      
      The modification of both the enable and filter fields are done
      under the event_mutex, so it is still safe to combine the two.
      
      Note: Although Mathieu gave his Acked-by, he would like it documented
       that the reads of flags are not protected by the mutex. The way the
       code works, these reads will not break anything, but will have a
       residual effect. Since this behavior is the same even before this
       patch, describing this situation is left to another patch, as this
       patch does not change the behavior, but just brought it to Mathieu's
       attention.
      
      v2: Updated the event trace self test to for this change.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      553552ce
    • Steven Rostedt's avatar
      tracing: Remove duplicate id information in event structure · 32c0edae
      Steven Rostedt authored
      
      
      Now that the trace_event structure is embedded in the ftrace_event_call
      structure, there is no need for the ftrace_event_call id field.
      The id field is the same as the trace_event type field.
      
      Removing the id and re-arranging the structure brings down the tracepoint
      footprint by another 5K.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4895024	1023812	 861512	6780348	 6775bc	vmlinux.print
      4894944	1018052	 861512	6774508	 675eec	vmlinux.id
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      32c0edae
    • Steven Rostedt's avatar
      tracing: Move print functions into event class · 80decc70
      Steven Rostedt authored
      
      
      Currently, every event has its own trace_event structure. This is
      fine since the structure is needed anyway. But the print function
      structure (trace_event_functions) is now separate. Since the output
      of the trace event is done by the class (with the exception of events
      defined by DEFINE_EVENT_PRINT), it makes sense to have the class
      define the print functions that all events in the class can use.
      
      This makes a bigger deal with the syscall events since all syscall events
      use the same class. The savings here is another 30K.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
      4900446	1049028	 861512	6810986	 67ed6a	vmlinux.preprint
      4895024	1023812	 861512	6780348	 6775bc	vmlinux.print
      
      To accomplish this, and to let the class know what event is being
      printed, the event structure is embedded in the ftrace_event_call
      structure. This should not be an issues since the event structure
      was created for each event anyway.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      80decc70
    • Steven Rostedt's avatar
      tracing: Move raw_init from events to class · 0405ab80
      Steven Rostedt authored
      
      
      The raw_init function pointer in the event is used to initialize
      various kinds of events. The type of initialization needed is usually
      classed to the kind of event it is.
      
      Two events with the same class will always have the same initialization
      function, so it makes sense to move this to the class structure.
      
      Perhaps even making a special system structure would work since
      the initialization is the same for all events within a system.
      But since there's no system structure (yet), this will just move it
      to the class.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields
      4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
      
      The text grew very slightly, but this is a constant growth that happened
      with the changing of the C files that call the init code.
      The bigger savings is the data which will be saved the more events share
      a class.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      0405ab80
    • Steven Rostedt's avatar
      tracing: Move fields from event to class structure · 2e33af02
      Steven Rostedt authored
      
      
      Move the defined fields from the event to the class structure.
      Since the fields of the event are defined by the class they belong
      to, it makes sense to have the class hold the information instead
      of the individual events. The events of the same class would just
      hold duplicate information.
      
      After this change the size of the kernel dropped another 3K:
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
      4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields
      
      Although the text increased, this was mainly due to the C files
      having to adapt to the change. This is a constant increase, where
      new tracepoints will not increase the Text. But the big drop is
      in the data size (as well as needed allocations to hold the fields).
      This will give even more savings as more tracepoints are created.
      
      Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
      with several DEFINE_EVENT()s, then the savings will be lost. But
      we are pushing developers to consolidate events with DEFINE_EVENT()
      so this should not be an issue.
      
      The kprobes define a unique class to every new event, but are dynamic
      so it should not be a issue.
      
      The syscalls however have a single class but the fields for the individual
      events are different. The syscalls use a metadata to define the
      fields. I moved the fields list from the event to the metadata and
      added a "get_fields()" function to the class. This function is used
      to find the fields. For normal events and kprobes, get_fields() just
      returns a pointer to the fields list_head in the class. For syscall
      events, it returns the fields list_head in the metadata for the event.
      
      v2:  Fixed the syscall fields. The syscall metadata needs a list
           of fields for both enter and exit.
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      2e33af02
    • Steven Rostedt's avatar
      tracing: Remove per event trace registering · 2239291a
      Steven Rostedt authored
      
      
      This patch removes the register functions of TRACE_EVENT() to enable
      and disable tracepoints. The registering of a event is now down
      directly in the trace_events.c file. The tracepoint_probe_register()
      is now called directly.
      
      The prototypes are no longer type checked, but this should not be
      an issue since the tracepoints are created automatically by the
      macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
      other macros will catch it.
      
      The trace_event_class structure now holds the probes to be called
      by the callbacks. This removes needing to have each event have
      a separate pointer for the probe.
      
      To handle kprobes and syscalls, since they register probes in a
      different manner, a "reg" field is added to the ftrace_event_class
      structure. If the "reg" field is assigned, then it will be called for
      enabling and disabling of the probe for either ftrace or perf. To let
      the reg function know what is happening, a new enum (trace_reg) is
      created that has the type of control that is needed.
      
      With this new rework, the 82 kernel events and 618 syscall events
      has their footprint dramatically lowered:
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4914025	1088868	 861512	6864405	 68be15	vmlinux.class
      4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint
      4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
      
      The size went from 6863829 to 6819176, that's a total of 44K
      in savings. With tracepoints being continuously added, this is
      critical that the footprint becomes minimal.
      
      v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf
          specific structure in trace_events.c.
      
      v4: Fixed trace self tests to check probe because regfunc no longer
          exists.
      
      v3: Updated to handle void *data in beginning of probe parameters.
          Also added the tracepoint: check_trace_callback_type_##call().
      
      v2: Changed the callback probes to pass void * and typecast the
          value within the function.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      2239291a
    • Steven Rostedt's avatar
      tracing: Create class struct for events · 8f082018
      Steven Rostedt authored
      
      
      This patch creates a ftrace_event_class struct that event structs point to.
      This class struct will be made to hold information to modify the
      events. Currently the class struct only holds the events system name.
      
      This patch slightly increases the size, but this change lays the ground work
      of other changes to make the footprint of tracepoints smaller.
      
      With 82 standard tracepoints, and 618 system call tracepoints
      (two tracepoints per syscall: enter and exit):
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4914025	1088868	 861512	6864405	 68be15	vmlinux.class
      
      This patch also cleans up some stale comments in ftrace.h.
      
      v2: Fixed missing semi-colon in macro.
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      8f082018
  15. 30 Mar, 2010 1 commit
    • Tejun Heo's avatar
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Guess-its-ok-by: default avatarChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  16. 10 Mar, 2010 1 commit
  17. 25 Feb, 2010 1 commit
  18. 06 Jan, 2010 2 commits
    • Lai Jiangshan's avatar
      tracing: Remove show_format and related macros from TRACE_EVENT · 0fa0edaf
      Lai Jiangshan authored
      
      
      The previous patches added the use of print_fmt string and changes
      the trace_define_field() function to also create the fields and
      format output for the event format files.
      
         text	   data	    bss	    dec	    hex	filename
      5857201	1355780	9336808	16549789	 fc879d	vmlinux
      5884589	1351684	9337896	16574169	 fce6d9	vmlinux-orig
      
      The above shows the size of the vmlinux after this patch set
      compared to the vmlinux-orig which is before the patch set.
      
      This saves us 27k on text, 1k on bss and adds just 4k of data.
      
      The total savings of 24k in size.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B273D4D.40604@cn.fujitsu.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      0fa0edaf
    • Lai Jiangshan's avatar
      tracing: Use defined fields and print_fmt to print formats · 5a65e956
      Lai Jiangshan authored
      
      
      The calls ftrace_format_##call() and ftrace_define_fields_##call()
      are almost duplicate in functionality. With the addition of the
      print_fmt in previous patches, these two functions can be merged
      into one.
      
      The trace_define_field() defines the fields and links them into
      the struct ftrace_event_call. The previous patches introduced
      the print_fmt field and this can now be used with the trace_define_field()
      to create the event format file fields and print_fmt field.
      
      The struct ftrace_event_call->fields are used to print the fields
      The struct ftrace_event_call->print_fmt is used to print
      the "print fmt: XXXXXXXXXXX" line.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B273D49.5000006@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      5a65e956
  19. 13 Dec, 2009 3 commits
    • Li Zefan's avatar
      tracing: Move a printk out of ftrace_raw_reg_event_foo() · 3b8e4273
      Li Zefan authored
      
      
      Move the printk from each ftrace_raw_reg_event_foo() to
      its caller ftrace_event_enable_disable(). This avoids each
      regfunc trace event callbacks to handle a same error report
      that can be carried from the caller.
      
      See how much space this saves:
      
         text    data     bss     dec     hex filename
      5345151 1961864 7103260 14410275         dbe223 vmlinux.o.old
      5331487 1961864 7103260 14396611         dbacc3 vmlinux.o
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Jason Baron <jbaron@redhat.com>
      LKML-Reference: <4B1DC4AC.802@cn.fujitsu.com>
      [start cmdline record before calling regfunc to avoid lost
      window of pid to comm resolution]
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      3b8e4273
    • Li Zefan's avatar
      tracing: Pull up calls to trace_define_common_fields() · 614a71a2
      Li Zefan authored
      
      
      Call trace_define_common_fields() in event_create_dir() only.
      This avoids trace events to handle it from their define_fields
      callbacks and shrinks the kernel code size:
      
         text    data     bss     dec     hex filename
      5346802 1961864 7103260 14411926         dbe896 vmlinux.o.old
      5345151 1961864 7103260 14410275         dbe223 vmlinux.o
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      LKML-Reference: <4B1DC49C.8000107@cn.fujitsu.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      614a71a2
    • Li Zefan's avatar
      tracing: Extract duplicate ftrace_raw_init_event_foo() · 87d9b4e1
      Li Zefan authored
      
      
      Use a generic trace_event_raw_init() function for all event's raw_init
      callbacks (but kprobes) instead of defining the same version for each
      of these.
      This shrinks the kernel code:
      
         text    data     bss     dec     hex filename
      5355293 1961928 7103260 14420481         dc0a01 vmlinux.o.old
      5346802 1961864 7103260 14411926         dbe896 vmlinux.o
      
      raw_init can't be removed, because ftrace events and kprobe events
      use different raw_init callbacks. Though it's possible to totally
      remove raw_init, I choose to leave it as it is for now.
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <4B1DC48C.7080603@cn.fujitsu.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      87d9b4e1
  20. 06 Oct, 2009 1 commit
  21. 03 Oct, 2009 1 commit
  22. 24 Sep, 2009 1 commit
  23. 22 Sep, 2009 1 commit
  24. 19 Sep, 2009 1 commit
  25. 17 Sep, 2009 2 commits
    • Masami Hiramatsu's avatar
      ftrace: Fix trace_remove_event_call() to lock trace_event_mutex · 4fead8e4
      Masami Hiramatsu authored
      
      
      Lock not only event_mutex but also trace_event_mutex in
      trace_remove_event_call() to protect __unregister_ftrace_event().
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <20090914204912.18779.68734.stgit@dhcp-100-2-132.bos.redhat.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      4fead8e4
    • Masami Hiramatsu's avatar
      ftrace: Fix trace_add_event_call() to initialize list · 588bebb7
      Masami Hiramatsu authored
      
      
      Handle failure path in trace_add_event_call() to fix the below bug
      which occurred when I tried to add invalid event twice.
      
      Could not create debugfs 'kmalloc' directory
      Failed to register kprobe event: kmalloc
      Faild to register probe event(-1)
      ------------[ cut here ]------------
      WARNING: at /home/mhiramat/ksrc/random-tracing/lib/list_debug.c:26
      __list_add+0x27/0x5c()
      Hardware name:
      list_add corruption. next->prev should be prev (c07d78cc), but was
      00001000. (next=d854236c).
      Modules linked in: sunrpc uinput virtio_net virtio_balloon i2c_piix4 pcspkr
      i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded:
      scsi_wait_scan]
      Pid: 1394, comm: tee Not tainted 2.6.31-rc9 #51
      Call Trace:
       [<c0438424>] warn_slowpath_common+0x65/0x7c
       [<c05371b3>] ? __list_add+0x27/0x5c
       [<c043846f>] warn_slowpath_fmt+0x24/0x27
       [<c05371b3>] __list_add+0x27/0x5c
       [<c047f050>] list_add+0xa/0xc
       [<c047f8f5>] trace_add_event_call+0x60/0x97
       [<c0483133>] command_trace_probe+0x42c/0x51b
       [<c044a1b3>] ? remove_wait_queue+0x22/0x27
       [<c042a9c0>] ? __wake_up+0x32/0x3b
       [<c04832f6>] probes_write+0xd4/0x10a
       [<c0483222>] ? probes_write+0x0/0x10a
       [<c04b27a9>] vfs_write+0x80/0xdf
       [<c04b289c>] sys_write+0x3b/0x5d
       [<c0670d41>] syscall_call+0x7/0xb
      ---[ end trace 2b962b5dc1fdc07d ]---
      Signed-off-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <4AB1077F.6020107@redhat.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      588bebb7
  26. 14 Sep, 2009 1 commit
    • Steven Rostedt's avatar
      tracing: make testing syscall events a separate configuration · 1f5a6b45
      Steven Rostedt authored
      
      
      Parag noticed that the number of event tests has increased tremendously:
      
      grep "Testing event" dmesg.31rc9 |wc -l
      100
      
      grep "Testing event" dmesg.31git |wc -l
      1172
      
      This is due to the testing of every syscall event when ftrace self
      test is enabled. This adds a bit more time to kernel boot up and can
      affect development by slowing down the time it takes between reboots.
      
      This option makes the testing of the syscall events into a separate
      config, to still be able to test most of ftrace internals at boot up
      but not have to wait for all the syscall events to be tested.
      
      The syscall event testing only tests the enabling and disabling of
      the trace point, since the syscalls are not executed. What really needs
      to be done is to somehow have a userspace tool test the syscall tracepoints
      as well.
      Reported-by: default avatarParag Warudkar <parag.lkml@gmail.com>
      LKML-Reference: <f7848160909130815l3e768a30n3b28808bbe5c254b@mail.gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      1f5a6b45