1. 15 Mar, 2013 5 commits
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Only clear trace buffer on module unload if event was traced · 575380da
      Steven Rostedt (Red Hat) authored
      
      
      Currently, when a module with events is unloaded, the trace buffer is
      cleared. This is just a safety net in case the module might have some
      strange callback when its event is outputted. But there's no reason
      to reset the buffer if the module didn't have any of its events traced.
      
      Add a flag to the event "call" structure called WAS_ENABLED and gets set
      when the event is ever enabled, and this flag never gets cleared. When a
      module gets unloaded, if any of its events have this flag set, then the
      trace buffer will get cleared.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      575380da
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Add comment for trace event flag IGNORE_ENABLE · 2a30c11f
      Steven Rostedt (Red Hat) authored
      
      
      All the trace event flags have comments but the IGNORE_ENABLE flag
      which is set for ftrace internal events that should not be enabled
      via the debugfs "enable" file. That is, if the top level enable file
      is set, it will enable all events. It use to just check the ftrace
      event call descriptor "reg" field and skip those whithout it, but now
      some ftrace internal events have a reg field but still need to be
      skipped. The flag was created to ignore those events.
      
      Now document it.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      2a30c11f
    • Li Zefan's avatar
      tracing: Add a helper function for event print functions · f71130de
      Li Zefan authored
      Move duplicate code in event print functions to a helper function.
      
      This shrinks the size of the kernel by ~13K.
      
         text    data     bss     dec     hex filename
      6596137 1743966 10138672        18478775        119f6b7 vmlinux.o.old
      6583002 1743849 10138672        18465523        119c2f3 vmlinux.o.new
      
      Link: http://lkml.kernel.org/r/51258746.2060304@huawei.com
      
      Signed-off-by: default avatarLi Zefan <lizefan@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      f71130de
    • Steven Rostedt's avatar
      tracing: Pass the ftrace_file to the buffer lock reserve code · ccb469a1
      Steven Rostedt authored
      
      
      Pass the struct ftrace_event_file *ftrace_file to the
      trace_event_buffer_lock_reserve() (new function that replaces the
      trace_current_buffer_lock_reserver()).
      
      The ftrace_file holds a pointer to the trace_array that is in use.
      In the case of multiple buffers with different trace_arrays, this
      allows different events to be recorded into different buffers.
      
      Also fixed some of the stale comments in include/trace/ftrace.h
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      ccb469a1
    • Steven Rostedt's avatar
      tracing: Separate out trace events from global variables · ae63b31e
      Steven Rostedt authored
      
      
      The trace events for ftrace are all defined via global variables.
      The arrays of events and event systems are linked to a global list.
      This prevents multiple users of the event system (what to enable and
      what not to).
      
      By adding descriptors to represent the event/file relation, as well
      as to which trace_array descriptor they are associated with, allows
      for more than one set of events to be defined. Once the trace events
      files have a link between the trace event and the trace_array they
      are associated with, we can create multiple trace_arrays that can
      record separate events in separate buffers.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      ae63b31e
  2. 30 Jan, 2013 1 commit
    • Hiraku Toyooka's avatar
      tracing: Make a snapshot feature available from userspace · debdd57f
      Hiraku Toyooka authored
      Ftrace has a snapshot feature available from kernel space and
      latency tracers (e.g. irqsoff) are using it. This patch enables
      user applictions to take a snapshot via debugfs.
      
      Add "snapshot" debugfs file in "tracing" directory.
      
        snapshot:
          This is used to take a snapshot and to read the output of the
          snapshot.
      
           # echo 1 > snapshot
      
          This will allocate the spare buffer for snapshot (if it is
          not allocated), and take a snapshot.
      
           # cat snapshot
      
          This will show contents of the snapshot.
      
           # echo 0 > snapshot
      
          This will free the snapshot if it is allocated.
      
          Any other positive values will clear the snapshot contents if
          the snapshot is allocated, or return EINVAL if it is not allocated.
      
      Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia
      
      
      
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: default avatarHiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      [
         Fixed irqsoff selftest and also a conflict with a change
         that fixes the update_max_tr.
      ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      debdd57f
  3. 22 Jan, 2013 1 commit
    • Steven Rostedt's avatar
      tracing: Remove the extra 4 bytes of padding in events · b000c806
      Steven Rostedt authored
      Due to a userspace issue with PowerTop v2beta, which hardcoded
      the offset of event fields that it was using, it broke when
      we removed the Big Kernel Lock counter from the event header.
      
       (commit e6e1e259 "tracing: Remove lock_depth from event entry")
      
      Because this broke userspace, it was determined that we must
      keep those 4 bytes around.
      
       (commit a3a4a5ac
      
       "Regression: partial revert "tracing: Remove lock_depth from event entry"")
      
      This unfortunately wastes space in the ring buffer. 4 bytes per
      event, where a lot of events are just 24 bytes. That's 16% of the
      buffer wasted. A million events will add 4 megs of white space
      into the buffer.
      
      It was later noticed that PowerTop v2beta could not work on systems
      where the kernel was 64 bit but the userspace was 32 bits.
      The reason was because the offsets are different between the
      two and the hard coded offset of one would not work with the other.
      
      With PowerTop v2 final, it implemented the same interface that both
      perf and trace-cmd use. That is, it reads the format file of
      the event to find the offsets of the fields it needs. This fixes
      the problem with running powertop on a 32 bit userspace running
      on a 64 bit kernel. It also no longer requires the 4 byte padding.
      
      As PowerTop v2 has been out for a while, and is included in all
      major distributions, it is time that we can safely remove the
      4 bytes of padding. Users of PowerTop v2beta should upgrade to
      PowerTop v2 final.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      b000c806
  4. 21 Jan, 2013 1 commit
    • Steven Rostedt's avatar
      tracing: Fix sparse warning with is_signed_type() macro · 418c59e4
      Steven Rostedt authored
      
      
      Sparse complains when is_signed_type() is used on a pointer.
      This macro is needed for the format output used for ftrace
      and perf, to know if a binary field is a signed type or not.
      The is_signed_type() macro is used against all fields that are
      recorded by events to automate the operation.
      
      The problem sparse has is with the current way is_signed_type()
      works:
      
        ((type)-1 < 0)
      
      If "type" is a poiner, than sparse does not like it being compared
      to an integer (zero). The simple fix is to just give zero the
      same type. The runtime result stays the same.
      Reported-by: default avatarRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      418c59e4
  5. 13 Nov, 2012 1 commit
    • David Sharp's avatar
      tracing: Format non-nanosec times from tsc clock without a decimal point. · 8be0709f
      David Sharp authored
      With the addition of the "tsc" clock, formatting timestamps to look like
      fractional seconds is misleading. Mark clocks as either in nanoseconds or
      not, and format non-nanosecond timestamps as decimal integers.
      
      Tested:
      $ cd /sys/kernel/debug/tracing/
      $ cat trace_clock
      [local] global tsc
      $ echo sched_switch > set_event
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ cat trace
                <idle>-0     [000]  6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
                 sleep-29964 [000]  6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
         sleep-29964   0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      $ echo tsc > trace_clock
      $ cat trace
      $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
      $ echo 0 > options/latency-format
      $ cat trace
                <idle>-0     [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
                 sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      echo 1 > options/latency-format
      $ cat trace
        <idle>-0       0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
         sleep-31128   0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
        ...
      
      v2:
      Move arch-specific bits out of generic code.
      v4:
      Fix x86_32 build due to 64-bit division.
      
      Google-Bug-Id: 6980623
      Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com
      
      
      
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: default avatarDavid Sharp <dhsharp@google.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      8be0709f
  6. 02 Nov, 2012 1 commit
    • Steven Rostedt's avatar
      tracing: Use irq_work for wake ups and remove *_nowake_*() functions · 0d5c6e1c
      Steven Rostedt authored
      
      
      Have the ring buffer commit function use the irq_work infrastructure to
      wake up any waiters waiting on the ring buffer for new data. The irq_work
      was created for such a purpose, where doing the actual wake up at the
      time of adding data is too dangerous, as an event or function trace may
      be in the midst of the work queue locks and cause deadlocks. The irq_work
      will either delay the action to the next timer interrupt, or trigger an IPI
      to itself forcing an interrupt to do the work (in a safe location).
      
      With irq_work, all ring buffer commits can safely do wakeups, removing
      the need for the ring buffer commit "nowake" variants, which were used
      by events and function tracing. All commits can now safely use the
      normal commit, and the "nowake" variants can be removed.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      0d5c6e1c
  7. 31 Jul, 2012 1 commit
  8. 28 Jun, 2012 1 commit
    • Steven Rostedt's avatar
      tracing: Remove NR_CPUS array from trace_iterator · 6d158a81
      Steven Rostedt authored
      
      
      Replace the NR_CPUS array of buffer_iter from the trace_iterator
      with an allocated array. This will just create an array of
      possible CPUS instead of the max number specified.
      
      The use of NR_CPUS in that array caused allocation failures for
      machines that were tight on memory. This did not cause any failures
      to the system itself (no crashes), but caused unnecessary failures
      for reading the trace files.
      
      Added a helper function called 'trace_buffer_iter()' that returns
      the buffer_iter item or NULL if it is not defined or the array was
      not allocated. Some routines do not require the array
      (tracing_open_pipe() for one).
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      6d158a81
  9. 14 Jun, 2012 1 commit
    • Steven Rostedt's avatar
      tracing: Add comments for the other bits of ftrace_event_call.flags · 5da43bed
      Steven Rostedt authored
      
      
      	TRACE_EVENT_FL_ENABLED_BIT,
      	TRACE_EVENT_FL_FILTERED_BIT,
      	TRACE_EVENT_FL_RECORDED_CMD_BIT,
      
      Have comments about what they are, but:
      
      	TRACE_EVENT_FL_CAP_ANY_BIT,
      	TRACE_EVENT_FL_NO_SET_FILTER_BIT,
      	TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
      
      do not, making them second class citizens. To prevent another
      class warfare, these bits have protested for their right to be
      commented. And By Golly! I'll give them what they want!
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      5da43bed
  10. 10 May, 2012 1 commit
    • Steven Rostedt's avatar
      tracing: Do not enable function event with enable · 9b63776f
      Steven Rostedt authored
      
      
      With the adding of function tracing event to perf, it caused a
      side effect that produces the following warning when enabling all
      events in ftrace:
      
       # echo 1 > /sys/kernel/debug/tracing/events/enable
      
      [console]
      event trace: Could not enable event function
      
      This is because when enabling all events via the debugfs system
      it ignores events that do not have a ->reg() function assigned.
      This was to skip over the ftrace internal events (as they are
      not TRACE_EVENTs). But as the ftrace function event now has
      a ->reg() function attached to it for use with perf, it is no
      longer ignored.
      
      Worse yet, this ->reg() function is being called when it should
      not be. It returns an error and causes the above warning to
      be printed.
      
      By adding a new event_call flag (TRACE_EVENT_FL_IGNORE_ENABLE)
      and have all ftrace internel event structures have it set,
      setting the events/enable will no longe try to incorrectly enable
      the function event and does not warn.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      9b63776f
  11. 14 Mar, 2012 1 commit
  12. 21 Feb, 2012 3 commits
  13. 05 Dec, 2011 1 commit
  14. 02 Nov, 2011 1 commit
  15. 14 Jul, 2011 1 commit
    • Steven Rostedt's avatar
      tracing: Have dynamic size event stack traces · 4a9bd3f1
      Steven Rostedt authored
      
      
      Currently the stack trace per event in ftace is only 8 frames.
      This can be quite limiting and sometimes useless. Especially when
      the "ignore frames" is wrong and we also use up stack frames for
      the event processing itself.
      
      Change this to be dynamic by adding a percpu buffer that we can
      write a large stack frame into and then copy into the ring buffer.
      
      For interrupts and NMIs that come in while another event is being
      process, will only get to use the 8 frame stack. That should be enough
      as the task that it interrupted will have the full stack frame anyway.
      Requested-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      4a9bd3f1
  16. 15 Jun, 2011 1 commit
    • Masami Hiramatsu's avatar
      tracing/kprobes: Fix kprobe-tracer to support stack trace · 1fd8df2c
      Masami Hiramatsu authored
      
      
      Fix to support kernel stack trace correctly on kprobe-tracer.
      Since the execution path of kprobe-based dynamic events is different
      from other tracepoint-based events, normal ftrace_trace_stack() doesn't
      work correctly. To fix that, this introduces ftrace_trace_stack_regs()
      which traces stack via pt_regs instead of current stack register.
      
      e.g.
      
       # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
       # echo 1 > /sys/kernel/debug/tracing/options/stacktrace
       # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
       # head -n 20 /sys/kernel/debug/tracing/trace
                  bash-2968  [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
                  bash-2968  [000] 10297.050247: <stack trace>
       => schedule_timeout
       => n_tty_read
       => tty_read
       => vfs_read
       => sys_read
       => system_call_fastpath
           kworker/0:1-2940  [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
           kworker/0:1-2940  [000] 10297.050266: <stack trace>
       => worker_thread
       => kthread
       => kernel_thread_helper
                  sshd-1132  [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
                  sshd-1132  [000] 10297.050365: <stack trace>
       => sysret_careful
      
      Note: Even with this fix, the first entry will be skipped
      if the probe is put on the function entry area before
      the frame pointer is set up (usually, that is 4 bytes
       (push %bp; mov %sp %bp) on x86), because stack unwinder
      depends on the frame pointer.
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15
      
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      1fd8df2c
  17. 26 May, 2011 1 commit
  18. 06 May, 2011 1 commit
  19. 10 Mar, 2011 1 commit
    • Steven Rostedt's avatar
      tracing: Remove lock_depth from event entry · e6e1e259
      Steven Rostedt authored
      
      
      The lock_depth field in the event headers was added as a temporary
      data point for help in removing the BKL. Now that the BKL is pretty
      much been removed, we can remove this field.
      
      This in turn changes the header from 12 bytes to 8 bytes,
      removing the 4 byte buffer that gcc would insert if the first field
      in the data load was 8 bytes in size.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e6e1e259
  20. 08 Feb, 2011 1 commit
  21. 19 Nov, 2010 1 commit
    • Steven Rostedt's avatar
      tracing/events: Show real number in array fields · 04295780
      Steven Rostedt authored
      
      
      Currently we have in something like the sched_switch event:
      
        field:char prev_comm[TASK_COMM_LEN];	offset:12;	size:16;	signed:1;
      
      When a userspace tool such as perf tries to parse this, the
      TASK_COMM_LEN is meaningless. This is done because the TRACE_EVENT() macro
      simply uses a #len to show the string of the length. When the length is
      an enum, we get a string that means nothing for tools.
      
      By adding a static buffer and a mutex to protect it, we can store the
      string into that buffer with snprintf and show the actual number.
      Now we get:
      
        field:char prev_comm[16];       offset:12;      size:16;        signed:1;
      
      Something much more useful.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      04295780
  22. 18 Nov, 2010 2 commits
    • Frederic Weisbecker's avatar
      tracing: Allow syscall trace events for non privileged users · 53cf810b
      Frederic Weisbecker authored
      
      
      As for the raw syscalls events, individual syscall events won't
      leak system wide information on task bound tracing. Allow non
      privileged users to use them in such workflow.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      53cf810b
    • Frederic Weisbecker's avatar
      tracing: New flag to allow non privileged users to use a trace event · 61c32659
      Frederic Weisbecker authored
      
      
      This adds a new trace event internal flag that allows them to be
      used in perf by non privileged users in case of task bound tracing.
      
      This is desired for syscalls tracepoint because they don't leak
      global system informations, like some other tracepoints.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jason Baron <jbaron@redhat.com>
      61c32659
  23. 09 Sep, 2010 1 commit
    • Peter Zijlstra's avatar
      perf: Rework the PMU methods · a4eaf7f1
      Peter Zijlstra authored
      
      
      Replace pmu::{enable,disable,start,stop,unthrottle} with
      pmu::{add,del,start,stop}, all of which take a flags argument.
      
      The new interface extends the capability to stop a counter while
      keeping it scheduled on the PMU. We replace the throttled state with
      the generic stopped state.
      
      This also allows us to efficiently stop/start counters over certain
      code paths (like IRQ handlers).
      
      It also allows scheduling a counter without it starting, allowing for
      a generic frozen state (useful for rotating stopped counters).
      
      The stopped state is implemented in two different ways, depending on
      how the architecture implemented the throttled state:
      
       1) We disable the counter:
          a) the pmu has per-counter enable bits, we flip that
          b) we program a NOP event, preserving the counter state
      
       2) We store the counter state and ignore all read/overflow events
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus <paulus@samba.org>
      Cc: stephane eranian <eranian@googlemail.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      Cc: Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Michael Cree <mcree@orcon.net.nz>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      a4eaf7f1
  24. 18 Aug, 2010 1 commit
  25. 21 Jul, 2010 2 commits
    • Lai Jiangshan's avatar
      tracing: Reduce latency and remove percpu trace_seq · bc289ae9
      Lai Jiangshan authored
      
      
      __print_flags() and __print_symbolic() use percpu trace_seq:
      
      1) Its memory is allocated at compile time, it wastes memory if we don't use tracing.
      2) It is percpu data and it wastes more memory for multi-cpus system.
      3) It disables preemption when it executes its core routine
         "trace_seq_printf(s, "%s: ", #call);" and introduces latency.
      
      So we move this trace_seq to struct trace_iterator.
      Signed-off-by: default avatarLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4C078350.7090106@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      bc289ae9
    • Li Zefan's avatar
      tracing: Allow to disable cmdline recording · e870e9a1
      Li Zefan authored
      
      
      We found that even enabling a single trace event that will rarely be
      triggered can add big overhead to context switch.
      
      (lmbench context switch test)
       -------------------------------------------------
       2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
       ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
      ------ ------ ------ ------ ------ ------- -------
        2.19   2.3   2.21   2.56   2.13     2.54    2.07
        2.39   2.51  2.35   2.75   2.27     2.81    2.24
      
      The overhead is 6% ~ 11%.
      
      It's because when a trace event is enabled 3 tracepoints (sched_switch,
      sched_wakeup, sched_wakeup_new) will be activated to map pid to cmdname.
      
      We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
      to allow to disable cmdline recording.
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      e870e9a1
  26. 29 Jun, 2010 1 commit
  27. 09 Jun, 2010 1 commit
  28. 21 May, 2010 2 commits
  29. 18 May, 2010 1 commit
  30. 15 May, 2010 1 commit
    • Steven Rostedt's avatar
      tracing: Comment the use of event_mutex with trace event flags · 1eaa4787
      Steven Rostedt authored
      
      
      The flags variable is protected by the event_mutex when modifying,
      but the event_mutex is not held when reading the variable.
      
      This is due to the fact that the reads occur in critical sections where
      taking a mutex (or even a spinlock) is not wanted.
      
      But the two flags that exist (enable and filter_active) have the code
      written as such to handle the reads to not need a lock.
      
      The enable flag is used just to know if the event is enabled or not
      and its use is always under the event_mutex. Whether or not the event
      is actually enabled is really determined by the tracepoint being
      registered. The flag is just a way to let the code know if the tracepoint
      is registered.
      
      The filter_active is different. It is read without the lock. If it
      is set, then the event probes jump to the filter code. There can be a
      slight mismatch between filters available and filter_active. If the flag is
      set but no filters are available, the code safely jumps to a filter nop.
      If the flag is not set and the filters are available, then the filters
      are skipped. This is acceptable since filters are usually set before
      tracing or they are set by humans, which would not notice the slight
      delay that this causes.
      
      v2: Fixed typo: "cacheing" -> "caching"
      Reported-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      1eaa4787
  31. 14 May, 2010 1 commit
    • Steven Rostedt's avatar
      tracing: Combine event filter_active and enable into single flags field · 553552ce
      Steven Rostedt authored
      
      
      The filter_active and enable both use an int (4 bytes each) to
      set a single flag. We can save 4 bytes per event by combining the
      two into a single integer.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4894944	1018052	 861512	6774508	 675eec	vmlinux.id
      4894871	1012292	 861512	6768675	 674823	vmlinux.flags
      
      This gives us another 5K in savings.
      
      The modification of both the enable and filter fields are done
      under the event_mutex, so it is still safe to combine the two.
      
      Note: Although Mathieu gave his Acked-by, he would like it documented
       that the reads of flags are not protected by the mutex. The way the
       code works, these reads will not break anything, but will have a
       residual effect. Since this behavior is the same even before this
       patch, describing this situation is left to another patch, as this
       patch does not change the behavior, but just brought it to Mathieu's
       attention.
      
      v2: Updated the event trace self test to for this change.
      Acked-by: default avatarMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      553552ce