1. 05 Nov, 2013 2 commits
    • Tom Zanussi's avatar
      tracing: Make register/unregister_ftrace_command __init · 38de93ab
      Tom Zanussi authored
      register/unregister_ftrace_command() are only ever called from __init
      functions, so can themselves be made __init.
      Also make register_snapshot_cmd() __init for the same reason.
      Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com
      Signed-off-by: default avatarTom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Have control op function callback only trace when RCU is watching · b5aa3a47
      Steven Rostedt (Red Hat) authored
      Dave Jones reported that trinity would be able to trigger the following
      back trace:
       [ INFO: suspicious RCU usage. ]
       3.10.0-rc2+ #38 Not tainted
       include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
       RCU used illegally from idle CPU!  rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       1 lock held by trinity-child1/18786:
        #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
       stack backtrace:
       CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
        0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
        ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
        0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
       Call Trace:
        [<ffffffff816e2f6b>] dump_stack+0x19/0x1b
        [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
        [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
        [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
        [<ffffffff81309289>] ? __const_udelay+0x29/0x30
        [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
        [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
        [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
        [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
        [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
        [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
        [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
        [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
        [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
        [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
        [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
        [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
        [<ffffffff816f4000>] ftrace_call+0x5/0x2f
        [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
        [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
        [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
        [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
        [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
        [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
        [<ffffffff81103673>] rcu_user_enter+0x13/0x20
        [<ffffffff8114541a>] user_enter+0x6a/0xd0
        [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
        [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d
       ------------[ cut here ]------------
      Perf uses rcu_read_lock() but as the function tracer can trace functions
      even when RCU is not currently active, this makes the rcu_read_lock()
      used by perf ineffective.
      As perf is currently the only user of the ftrace_ops_control_func() and
      perf is also the only function callback that actively uses rcu_read_lock(),
      the quick fix is to prevent the ftrace_ops_control_func() from calling
      its callbacks if RCU is not active.
      With Paul's new "rcu_is_watching()" we can tell if RCU is active or not.
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  2. 19 Oct, 2013 4 commits
  3. 03 Sep, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix a slight race in modifying what function callback gets traced · 59338f75
      Steven Rostedt (Red Hat) authored
      There's a slight race when going from a list function to a non list
      function. That is, when only one callback is registered to the function
      tracer, it gets called directly by the mcount trampoline. But if this
      function has filters, it may be called by the wrong functions.
      As the list ops callback that handles multiple callbacks that are
      registered to ftrace, it also handles what functions they call. While
      the transaction is taking place, use the list function always, and
      after all the updates are finished (only the functions that should be
      traced are being traced), then we can update the trampoline to call
      the function directly.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  4. 31 Jul, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Check module functions being traced on reload · 8c4f3c3f
      Steven Rostedt (Red Hat) authored
      There's been a nasty bug that would show up and not give much info.
      The bug displayed the following warning:
       WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
       Pid: 20903, comm: bash Tainted: G           O 3.6.11+ #38405.trunk
       Call Trace:
        [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
        [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
        [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
        [<ffffffff811401cc>] ? kfree+0x2c/0x110
        [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
        [<ffffffff81149f1e>] __fput+0xae/0x220
        [<ffffffff8114a09e>] ____fput+0xe/0x10
        [<ffffffff8105fa22>] task_work_run+0x72/0x90
        [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
        [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
        [<ffffffff815c0f88>] int_signal+0x12/0x17
       ---[ end trace 793179526ee09b2c ]---
      It was finally narrowed down to unloading a module that was being traced.
      It was actually more than that. When functions are being traced, there's
      a table of all functions that have a ref count of the number of active
      tracers attached to that function. When a function trace callback is
      registered to a function, the function's record ref count is incremented.
      When it is unregistered, the function's record ref count is decremented.
      If an inconsistency is detected (ref count goes below zero) the above
      warning is shown and the function tracing is permanently disabled until
      The ftrace callback ops holds a hash of functions that it filters on
      (and/or filters off). If the hash is empty, the default means to filter
      all functions (for the filter_hash) or to disable no functions (for the
      When a module is unloaded, it frees the function records that represent
      the module functions. These records exist on their own pages, that is
      function records for one module will not exist on the same page as
      function records for other modules or even the core kernel.
      Now when a module unloads, the records that represents its functions are
      freed. When the module is loaded again, the records are recreated with
      a default ref count of zero (unless there's a callback that traces all
      functions, then they will also be traced, and the ref count will be
      The problem is that if an ftrace callback hash includes functions of the
      module being unloaded, those hash entries will not be removed. If the
      module is reloaded in the same location, the hash entries still point
      to the functions of the module but the module's ref counts do not reflect
      With the help of Steve and Joern, we found a reproducer:
       Using uinput module and uinput_release function.
       cd /sys/kernel/debug/tracing
       modprobe uinput
       echo uinput_release > set_ftrace_filter
       echo function > current_tracer
       rmmod uinput
       modprobe uinput
       # check /proc/modules to see if loaded in same addr, otherwise try again
       echo nop > current_tracer
      The above loads the uinput module, which creates a table of functions that
      can be traced within the module.
      We add uinput_release to the filter_hash to trace just that function.
      Enable function tracincg, which increments the ref count of the record
      associated to uinput_release.
      Remove uinput, which frees the records including the one that represents
      Load the uinput module again (and make sure it's at the same address).
      This recreates the function records all with a ref count of zero,
      including uinput_release.
      Disable function tracing, which will decrement the ref count for uinput_release
      which is now zero because of the module removal and reload, and we have
      a mismatch (below zero ref count).
      The solution is to check all currently tracing ftrace callbacks to see if any
      are tracing any of the module's functions when a module is loaded (it already does
      that with callbacks that trace all functions). If a callback happens to have
      a module function being traced, it increments that records ref count and starts
      tracing that function.
      There may be a strange side effect with this, where tracing module functions
      on unload and then reloading a new module may have that new module's functions
      being traced. This may be something that confuses the user, but it's not
      a big deal. Another approach is to disable all callback hashes on module unload,
      but this leaves some ftrace callbacks that may not be registered, but can
      still have hashes tracing the module's function where ftrace doesn't know about
      it. That situation can cause the same bug. This solution solves that case too.
      Another benefit of this solution, is it is possible to trace a module's
      function on unload and load.
      Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com
      Reported-by: default avatarJörn Engel <joern@logfs.org>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Reported-by: default avatarSteve Hodgson <steve@purestorage.com>
      Tested-by: default avatarSteve Hodgson <steve@purestorage.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  5. 30 Jul, 2013 1 commit
  6. 24 Jul, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Add check for NULL regs if ops has SAVE_REGS set · 195a8afc
      Steven Rostedt (Red Hat) authored
      If a ftrace ops is registered with the SAVE_REGS flag set, and there's
      already a ops registered to one of its functions but without the
      SAVE_REGS flag, there's a small race window where the SAVE_REGS ops gets
      added to the list of callbacks to call for that function before the
      callback trampoline gets set to save the regs.
      The problem is, the function is not currently saving regs, which opens
      a small race window where the ops that is expecting regs to be passed
      to it, wont. This can cause a crash if the callback were to reference
      the regs, as the SAVE_REGS guarantees that regs will be set.
      To fix this, we add a check in the loop case where it checks if the ops
      has the SAVE_REGS flag set, and if so, it will ignore it if regs is
      not set.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  7. 02 Jul, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Do not run selftest if command line parameter is set · f1ed7c74
      Steven Rostedt (Red Hat) authored
      If the kernel command line ftrace filter parameters are set
      (ftrace_filter or ftrace_notrace), force the function self test to
      pass, with a warning why it was forced.
      If the user adds a filter to the kernel command line, it is assumed
      that they know what they are doing, and the self test should just not
      run instead of failing (which disables function tracing) or clearing
      the filter, as that will probably annoy the user.
      If the user wants the selftest to run, the message will tell them why
      it did not.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  8. 20 Jun, 2013 1 commit
  9. 11 Jun, 2013 1 commit
    • Steven Rostedt's avatar
      ftrace: Use schedule_on_each_cpu() as a heavy synchronize_sched() · 7614c3dc
      Steven Rostedt authored
      The function tracer uses preempt_disable/enable_notrace() for
      synchronization between reading registered ftrace_ops and unregistering
      Most of the ftrace_ops are global permanent structures that do not
      require this synchronization. That is, ops may be added and removed from
      the hlist but are never freed, and wont hurt if a synchronization is
      But this is not true for dynamically created ftrace_ops or control_ops,
      which are used by the perf function tracing.
      The problem here is that the function tracer can be used to trace
      kernel/user context switches as well as going to and from idle.
      Basically, it can be used to trace blind spots of the RCU subsystem.
      This means that even though preempt_disable() is done, a
      synchronize_sched() will ignore CPUs that haven't made it out of user
      space or idle. These can include functions that are being traced just
      before entering or exiting the kernel sections.
      To implement the RCU synchronization, instead of using
      synchronize_sched() the use of schedule_on_each_cpu() is performed. This
      means that when a dynamically allocated ftrace_ops, or a control ops is
      being unregistered, all CPUs must be touched and execute a ftrace_sync()
      stub function via the work queues. This will rip CPUs out from idle or
      in dynamic tick mode. This only happens when a user disables perf
      function tracing or other dynamically allocated function tracers, but it
      allows us to continue to debug RCU and context tracking with function
      Link: http://lkml.kernel.org/r/1369785676.15552.55.camel@gandalf.local.home
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  10. 29 May, 2013 1 commit
  11. 10 May, 2013 5 commits
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix function probe when more than one probe is added · 19dd603e
      Steven Rostedt (Red Hat) authored
      When the first function probe is added and the function tracer
      is updated the functions are modified to call the probe.
      But when a second function is added, it updates the function
      records to have the second function also update, but it fails
      to update the actual function itself.
      This prevents the second (or third or forth and so on) probes
      from having their functions called.
        # echo vfs_symlink:enable_event:sched:sched_switch > set_ftrace_filter
        # echo vfs_unlink:enable_event:sched:sched_switch > set_ftrace_filter
        # cat trace
       # tracer: nop
       # entries-in-buffer/entries-written: 0/0   #P:4
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
        # touch /tmp/a
        # rm /tmp/a
        # cat trace
       # tracer: nop
       # entries-in-buffer/entries-written: 0/0   #P:4
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
        # ln -s /tmp/a
        # cat trace
       # tracer: nop
       # entries-in-buffer/entries-written: 414/414   #P:4
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
                 <idle>-0     [000] d..3  2847.923031: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=2786 next_prio=120
                  <...>-3114  [001] d..4  2847.923035: sched_switch: prev_comm=ln prev_pid=3114 prev_prio=120 prev_state=x ==> next_comm=swapper/1 next_pid=0 next_prio=120
                   bash-2786  [000] d..3  2847.923535: sched_switch: prev_comm=bash prev_pid=2786 prev_prio=120 prev_state=S ==> next_comm=kworker/0:1 next_pid=34 next_prio=120
            kworker/0:1-34    [000] d..3  2847.923552: sched_switch: prev_comm=kworker/0:1 prev_pid=34 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
                 <idle>-0     [002] d..3  2847.923554: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=2783 next_prio=120
                   sshd-2783  [002] d..3  2847.923660: sched_switch: prev_comm=sshd prev_pid=2783 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
      Still need to update the functions even though the probe itself
      does not need to be registered again when added a new probe.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix the output of enabled_functions debug file · 23ea9c4d
      Steven Rostedt (Red Hat) authored
      The enabled_functions debugfs file was created to be able to see
      what functions have been modified from nops to calling a tracer.
      The current method uses the counter in the function record.
      As when a ftrace_ops is registered to a function, its count
      increases. But that doesn't mean that the function is actively
      being traced. /proc/sys/kernel/ftrace_enabled can be set to zero
      which would disable it, as well as something can go wrong and
      we can think its enabled when only the counter is set.
      The record's FTRACE_FL_ENABLED flag is set or cleared when its
      function is modified. That is a much more accurate way of knowing
      what function is enabled or not.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix locking in register_ftrace_function_probe() · 5ae0bf59
      Steven Rostedt (Red Hat) authored
      The iteration of the ftrace function list and the call to
      ftrace_match_record() need to be protected by the ftrace_lock.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Masami Hiramatsu's avatar
      ftrace: Cleanup regex_lock and ftrace_lock around hash updating · 3f2367ba
      Masami Hiramatsu authored
      Cleanup regex_lock and ftrace_lock locking points around
      ftrace_ops hash update code.
      The new rule is that regex_lock protects ops->*_hash
      read-update-write code for each ftrace_ops. Usually,
      hash update is done by following sequence.
      1. allocate a new local hash and copy the original hash.
      2. update the local hash.
      3. move(actually, copy) back the local hash to ftrace_ops.
      4. update ftrace entries if needed.
      5. release the local hash.
      This makes regex_lock protect #1-#4, and ftrace_lock
      to protect #3, #4 and adding and removing ftrace_ops from the
      ftrace_ops_list. The ftrace_lock protects #3 as well because
      the move functions update the entries too.
      Link: http://lkml.kernel.org/r/20130509054421.30398.83411.stgit@mhiramat-M0-7522
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Tom Zanussi <tom.zanussi@intel.com>
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Masami Hiramatsu's avatar
      ftrace, kprobes: Fix a deadlock on ftrace_regex_lock · f04f24fb
      Masami Hiramatsu authored
      Fix a deadlock on ftrace_regex_lock which happens when setting
      an enable_event trigger on dynamic kprobe event as below.
      sh-2.05b# echo p vfs_symlink > kprobe_events
      sh-2.05b# echo vfs_symlink:enable_event:kprobes:p_vfs_symlink_0 > set_ftrace_filter
      [ INFO: possible recursive locking detected ]
      3.9.0+ #35 Not tainted
      sh/72 is trying to acquire lock:
       (ftrace_regex_lock){+.+.+.}, at: [<ffffffff810ba6c1>] ftrace_set_hash+0x81/0x1f0
      but task is already holding lock:
       (ftrace_regex_lock){+.+.+.}, at: [<ffffffff810b7cbd>] ftrace_regex_write.isra.29.part.30+0x3d/0x220
      other info that might help us debug this:
       Possible unsafe locking scenario:
       *** DEADLOCK ***
      To fix that, this introduces a finer regex_lock for each ftrace_ops.
      ftrace_regex_lock is too big of a lock which protects all
      filter/notrace_hash operations, but it doesn't need to be a global
      lock after supporting multiple ftrace_ops because each ftrace_ops
      has its own filter/notrace_hash.
      Link: http://lkml.kernel.org/r/20130509054417.30398.84254.stgit@mhiramat-M0-7522
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Tom Zanussi <tom.zanussi@intel.com>
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      [ Added initialization flag and automate mutex initialization for
        non ftrace.c ftrace_probes. ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  12. 09 May, 2013 1 commit
  13. 13 Apr, 2013 3 commits
  14. 12 Apr, 2013 2 commits
  15. 09 Apr, 2013 3 commits
  16. 08 Apr, 2013 3 commits
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Do not call stub functions in control loop · 395b97a3
      Steven Rostedt (Red Hat) authored
      The function tracing control loop used by perf spits out a warning
      if the called function is not a control function. This is because
      the control function references a per cpu allocated data structure
      on struct ftrace_ops that is not allocated for other types of
      commit 0a016409 "ftrace: Optimize the function tracer list loop"
      Had an optimization done to all function tracing loops to optimize
      for a single registered ops. Unfortunately, this allows for a slight
      race when tracing starts or ends, where the stub function might be
      called after the current registered ops is removed. In this case we
      get the following dump:
      root# perf stat -e ftrace:function sleep 1
      [   74.339105] WARNING: at include/linux/ftrace.h:209 ftrace_ops_control_func+0xde/0xf0()
      [   74.349522] Hardware name: PRIMERGY RX200 S6
      [   74.357149] Modules linked in: sg igb iTCO_wdt ptp pps_core iTCO_vendor_support i7core_edac dca lpc_ich i2c_i801 coretemp edac_core crc32c_intel mfd_core ghash_clmulni_intel dm_multipath acpi_power_meter pcspk
      r microcode vhost_net tun macvtap macvlan nfsd kvm_intel kvm auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom mgag200 i2c_algo_bit drm_kms_helper ttm qla2xxx mptsas ahci drm li
      bahci scsi_transport_sas mptscsih libata scsi_transport_fc i2c_core mptbase scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
      [   74.446233] Pid: 1377, comm: perf Tainted: G        W    3.9.0-rc1 #1
      [   74.453458] Call Trace:
      [   74.456233]  [<ffffffff81062e3f>] warn_slowpath_common+0x7f/0xc0
      [   74.462997]  [<ffffffff810fbc60>] ? rcu_note_context_switch+0xa0/0xa0
      [   74.470272]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
      [   74.478117]  [<ffffffff81062e9a>] warn_slowpath_null+0x1a/0x20
      [   74.484681]  [<ffffffff81102ede>] ftrace_ops_control_func+0xde/0xf0
      [   74.491760]  [<ffffffff8162f400>] ftrace_call+0x5/0x2f
      [   74.497511]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
      [   74.503486]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
      [   74.509500]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
      [   74.516088]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.522268]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
      [   74.528837]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
      [   74.536696]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.542878]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
      [   74.548869]  [<ffffffff81105c67>] unregister_ftrace_function+0x27/0x50
      [   74.556243]  [<ffffffff8111eadf>] perf_ftrace_event_register+0x9f/0x140
      [   74.563709]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.569887]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
      [   74.575898]  [<ffffffff8111e94e>] perf_trace_destroy+0x2e/0x50
      [   74.582505]  [<ffffffff81127ba9>] tp_perf_event_destroy+0x9/0x10
      [   74.589298]  [<ffffffff811295d0>] free_event+0x70/0x1a0
      [   74.595208]  [<ffffffff8112a579>] perf_event_release_kernel+0x69/0xa0
      [   74.602460]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.608667]  [<ffffffff8112a640>] put_event+0x90/0xc0
      [   74.614373]  [<ffffffff8112a740>] perf_release+0x10/0x20
      [   74.620367]  [<ffffffff811a3044>] __fput+0xf4/0x280
      [   74.625894]  [<ffffffff811a31de>] ____fput+0xe/0x10
      [   74.631387]  [<ffffffff81083697>] task_work_run+0xa7/0xe0
      [   74.637452]  [<ffffffff81014981>] do_notify_resume+0x71/0xb0
      [   74.643843]  [<ffffffff8162fa92>] int_signal+0x12/0x17
      To fix this a new ftrace_ops flag is added that denotes the ftrace_list_end
      ftrace_ops stub as just that, a stub. This flag is now checked in the
      control loop and the function is not called if the flag is set.
      Thanks to Jovi for not just reporting the bug, but also pointing out
      where the bug was in the code.
      Link: http://lkml.kernel.org/r/514A8855.7090402@redhat.com
      Link: http://lkml.kernel.org/r/1364377499-1900-15-git-send-email-jovi.zhangwei@huawei.com
      Tested-by: default avatarWANG Chao <chaowang@redhat.com>
      Reported-by: default avatarWANG Chao <chaowang@redhat.com>
      Reported-by: default avatarzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Jan Kiszka's avatar
      ftrace: Consistently restore trace function on sysctl enabling · 5000c418
      Jan Kiszka authored
      If we reenable ftrace via syctl, we currently set ftrace_trace_function
      based on the previous simplistic algorithm. This is inconsistent with
      what update_ftrace_function does. So better call that helper instead.
      Link: http://lkml.kernel.org/r/5151D26F.1070702@siemens.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kiszka's avatarJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Chen Gang's avatar
      ftrace: Fix strncpy() use, use strlcpy() instead of strncpy() · 75761cc1
      Chen Gang authored
      For NUL terminated string we always need to set '\0' at the end.
      Signed-off-by: default avatarChen Gang <gang.chen@asianux.com>
      Cc: rostedt@goodmis.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/516243B7.9020405@asianux.com
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
  17. 15 Mar, 2013 3 commits
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Use manual free after synchronize_sched() not call_rcu_sched() · 7818b388
      Steven Rostedt (Red Hat) authored
      The entries to the probe hash must be freed after a synchronize_sched()
      after the entry has been removed from the hash.
      As the entries are registered with ops that may have their own callbacks,
      and these callbacks may sleep, we can not use call_rcu_sched() because
      the rcu callbacks registered with that are called from a softirq context.
      Instead of using call_rcu_sched(), manually save the entries on a free_list
      and at the end of the loop that removes the entries, do a synchronize_sched()
      and then go through the free_list, freeing the entries.
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Clean up function probe methods · e67efb93
      Steven Rostedt (Red Hat) authored
      When a function probe is created, each function that the probe is
      attached to, a "callback" method is called. On release of the probe,
      each function entry calls the "free" method.
      First, "callback" is a confusing name and does not really match what
      it does. Callback sounds like it will be called when the probe
      triggers. But that's not the case. This is really an "init" function,
      so lets rename it as such.
      Secondly, both "init" and "free" do not pass enough information back
      to the handlers. Pass back the ops, ip and data for each time the
      method is called. We have the information, might as well use it.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix function probe to only enable needed functions · e1df4cb6
      Steven Rostedt (Red Hat) authored
      Currently the function probe enables all functions and runs a "hash"
      against every function call to see if it should call a probe. This
      is extremely wasteful.
      Note, a probe is something like:
        echo schedule:traceoff > /debug/tracing/set_ftrace_filter
      When schedule is called, the probe will disable tracing. But currently,
      it has a call back for *all* functions, and checks to see if the
      called function is the probe that is needed.
      The probe function has been created before ftrace was rewritten to
      allow for more than one "op" to be registered by the function tracer.
      When probes were created, it couldn't limit the functions without also
      limiting normal function calls. But now we can, it's about time
      to update the probe code.
      Todo, have separate ops for different entries. That is, assign
      a ftrace_ops per probe, instead of one op for all probes. But
      as there's not many probes assigned, this may not be that urgent.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  18. 13 Mar, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix free of probe entry by calling call_rcu_sched() · 740466bc
      Steven Rostedt (Red Hat) authored
      Because function tracing is very invasive, and can even trace
      calls to rcu_read_lock(), RCU access in function tracing is done
      with preempt_disable_notrace(). This requires a synchronize_sched()
      for updates and not a synchronize_rcu().
      Function probes (traceon, traceoff, etc) must be freed after
      a synchronize_sched() after its entry has been removed from the
      hash. But call_rcu() is used. Fix this by using call_rcu_sched().
      Also fix the usage to use hlist_del_rcu() instead of hlist_del().
      Cc: stable@vger.kernel.org
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  19. 28 Feb, 2013 1 commit
    • Sasha Levin's avatar
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin authored
      I'm not sure why, but the hlist for each entry iterators were conceived
              list_for_each_entry(pos, head, member)
      The hlist ones were greedy and wanted an extra parameter:
              hlist_for_each_entry(tpos, pos, head, member)
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      Besides the semantic patch, there was some manual work required:
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      -T b;
          <+... when != b
      - b,
      c, d) S
      - b,
      c) S
      - b,
      c) S
      - b,
      c, d) S
      - b,
      c, d) S
      - b,
      c) S
      for_each_busy_worker(a, c,
      - b,
      d) S
      - b,
      c) S
      - b,
      c) S
      - b,
      c) S
      - b,
      c) S
      - b,
      c) S
      - b,
      c) S
      -(a, b)
      + sk_for_each_from(a) S
      - b,
      c, d) S
      - b,
      c) S
      - b,
      c, d, e) S
      - b,
      c) S
      - b,
      c) S
      - b,
      c, d) S
      - b,
      c) S
      - b,
      c, d) S
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      - b,
      c) S
      - b,
      c, d) S
      - b,
      c, d) S
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: default avatarPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  20. 19 Feb, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Call ftrace cleanup module notifier after all other notifiers · 8c189ea6
      Steven Rostedt (Red Hat) authored
      Commit: c1bf08ac
       "ftrace: Be first to run code modification on modules"
      changed ftrace module notifier's priority to INT_MAX in order to
      process the ftrace nops before anything else could touch them
      (namely kprobes). This was the correct thing to do.
      Unfortunately, the ftrace module notifier also contains the ftrace
      clean up code. As opposed to the set up code, this code should be
      run *after* all the module notifiers have run in case a module is doing
      correct clean-up and unregisters its ftrace hooks. Basically, ftrace
      needs to do clean up on module removal, as it needs to know about code
      being removed so that it doesn't try to modify that code. But after it
      removes the module from its records, if a ftrace user tries to remove
      a probe, that removal will fail due as the record of that code segment
      no longer exists.
      Nothing really bad happens if the probe removal is called after ftrace
      did the clean up, but the ftrace removal function will return an error.
      Correct code (such as kprobes) will produce a WARN_ON() if it fails
      to remove the probe. As people get annoyed by frivolous warnings, it's
      best to do the ftrace clean up after everything else.
      By splitting the ftrace_module_notifier into two notifiers, one that
      does the module load setup that is run at high priority, and the other
      that is called for module clean up that is run at low priority, the
      problem is solved.
      Cc: stable@vger.kernel.org
      Reported-by: default avatarFrank Ch. Eigler <fche@redhat.com>
      Acked-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  21. 23 Jan, 2013 3 commits
    • Steven Rostedt's avatar
      tracing: Avoid unnecessary multiple recursion checks · edc15caf
      Steven Rostedt authored
      When function tracing occurs, the following steps are made:
        If arch does not support a ftrace feature:
         call internal function (uses INTERNAL bits) which calls...
        If callback is registered to the "global" list, the list
         function is called and recursion checks the GLOBAL bits.
         then this function calls...
        The function callback, which can use the FTRACE bits to
         check for recursion.
      Now if the arch does not suppport a feature, and it calls
      the global list function which calls the ftrace callback
      all three of these steps will do a recursion protection.
      There's no reason to do one if the previous caller already
      did. The recursion that we are protecting against will
      go through the same steps again.
      To prevent the multiple recursion checks, if a recursion
      bit is set that is higher than the MAX bit of the current
      check, then we know that the check was made by the previous
      caller, and we can skip the current check.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt's avatar
      ftrace: Add context level recursion bit checking · c29f122c
      Steven Rostedt authored
      Currently for recursion checking in the function tracer, ftrace
      tests a task_struct bit to determine if the function tracer had
      recursed or not. If it has, then it will will return without going
      But this leads to races. If an interrupt came in after the bit
      was set, the functions being traced would see that bit set and
      think that the function tracer recursed on itself, and would return.
      Instead add a bit for each context (normal, softirq, irq and nmi).
      A check of which context the task is in is made before testing the
      associated bit. Now if an interrupt preempts the function tracer
      after the previous context has been set, the interrupt functions
      can still be traced.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt's avatar
      ftrace: Optimize the function tracer list loop · 0a016409
      Steven Rostedt authored
      There is lots of places that perform:
             op = rcu_dereference_raw(ftrace_control_list);
             while (op != &ftrace_list_end) {
      Add a helper macro to do this, and also optimize for a single
      entity. That is, gcc will optimize a loop for either no iterations
      or more than one iteration. But usually only a single callback
      is registered to the function tracer, thus the optimized case
      should be a single pass. to do this we now do:
      	op = rcu_dereference_raw(list);
      	do {
      	} while (likely(op = rcu_dereference_raw((op)->next)) &&
      	       unlikely((op) != &ftrace_list_end));
      An op is always registered (ftrace_list_end when no callbacks is
      registered), thus when a single callback is registered, the link
      list looks like:
       top => callback => ftrace_list_end => NULL.
      The likely(op = op->next) still must be performed due to the race
      of removing the callback, where the first op assignment could
      equal ftrace_list_end. In that case, the op->next would be NULL.
      But this is unlikely (only happens in a race condition when
      removing the callback).
      But it is very likely that the next op would be ftrace_list_end,
      unless more than one callback has been registered. This tells
      gcc what the most common case is and makes the fast path with
      the least amount of branches.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>