1. 20 Feb, 2014 2 commits
  2. 13 Jan, 2014 2 commits
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix synchronization location disabling and freeing ftrace_ops · a4c35ed2
      Steven Rostedt (Red Hat) authored
      The synchronization needed after ftrace_ops are unregistered must happen
      after the callback is disabled from becing called by functions.
      
      The current location happens after the function is being removed from the
      internal lists, but not after the function callbacks were disabled, leaving
      the functions susceptible of being called after their callbacks are freed.
      
      This affects perf and any externel users of function tracing (LTTng and
      SystemTap).
      
      Cc: stable@vger.kernel.org # 3.0+
      Fixes: cdbe61bf
      
       "ftrace: Allow dynamically allocated function tracers"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      a4c35ed2
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Have function graph only trace based on global_ops filters · 23a8e844
      Steven Rostedt (Red Hat) authored
      Doing some different tests, I discovered that function graph tracing, when
      filtered via the set_ftrace_filter and set_ftrace_notrace files, does
      not always keep with them if another function ftrace_ops is registered
      to trace functions.
      
      The reason is that function graph just happens to trace all functions
      that the function tracer enables. When there was only one user of
      function tracing, the function graph tracer did not need to worry about
      being called by functions that it did not want to trace. But now that there
      are other users, this becomes a problem.
      
      For example, one just needs to do the following:
      
       # cd /sys/kernel/debug/tracing
       # echo schedule > set_ftrace_filter
       # echo function_graph > current_tracer
       # cat trace
      [..]
       0)               |  schedule() {
       ------------------------------------------
       0)    <idle>-0    =>   rcu_pre-7
       ------------------------------------------
      
       0) ! 2980.314 us |  }
       0)               |  schedule() {
       ------------------------------------------
       0)   rcu_pre-7    =>    <idle>-0
       ------------------------------------------
      
       0) + 20.701 us   |  }
      
       # echo 1 > /proc/sys/kernel/stack_tracer_enabled
       # cat trace
      [..]
       1) + 20.825 us   |      }
       1) + 21.651 us   |    }
       1) + 30.924 us   |  } /* SyS_ioctl */
       1)               |  do_page_fault() {
       1)               |    __do_page_fault() {
       1)   0.274 us    |      down_read_trylock();
       1)   0.098 us    |      find_vma();
       1)               |      handle_mm_fault() {
       1)               |        _raw_spin_lock() {
       1)   0.102 us    |          preempt_count_add();
       1)   0.097 us    |          do_raw_spin_lock();
       1)   2.173 us    |        }
       1)               |        do_wp_page() {
       1)   0.079 us    |          vm_normal_page();
       1)   0.086 us    |          reuse_swap_page();
       1)   0.076 us    |          page_move_anon_rmap();
       1)               |          unlock_page() {
       1)   0.082 us    |            page_waitqueue();
       1)   0.086 us    |            __wake_up_bit();
       1)   1.801 us    |          }
       1)   0.075 us    |          ptep_set_access_flags();
       1)               |          _raw_spin_unlock() {
       1)   0.098 us    |            do_raw_spin_unlock();
       1)   0.105 us    |            preempt_count_sub();
       1)   1.884 us    |          }
       1)   9.149 us    |        }
       1) + 13.083 us   |      }
       1)   0.146 us    |      up_read();
      
      When the stack tracer was enabled, it enabled all functions to be traced, which
      now the function graph tracer also traces. This is a side effect that should
      not occur.
      
      To fix this a test is added when the function tracing is changed, as well as when
      the graph tracer is enabled, to see if anything other than the ftrace global_ops
      function tracer is enabled. If so, then the graph tracer calls a test trampoline
      that will look at the function that is being traced and compare it with the
      filters defined by the global_ops.
      
      As an optimization, if there's no other function tracers registered, or if
      the only registered function tracers also use the global ops, the function
      graph infrastructure will call the registered function graph callback directly
      and not go through the test trampoline.
      
      Cc: stable@vger.kernel.org # 3.3+
      Fixes: d2d45c7a
      
       "tracing: Have stack_tracer use a separate list of functions"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      23a8e844
  3. 10 Jan, 2014 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Synchronize setting function_trace_op with ftrace_trace_function · 405e1d83
      Steven Rostedt (Red Hat) authored
      
      
      ftrace_trace_function is a variable that holds what function will be called
      directly by the assembly code (mcount). If just a single function is
      registered and it handles recursion itself, then the assembly will call that
      function directly without any helper function. It also passes in the
      ftrace_op that was registered with the callback. The ftrace_op to send is
      stored in the function_trace_op variable.
      
      The ftrace_trace_function and function_trace_op needs to be coordinated such
      that the called callback wont be called with the wrong ftrace_op, otherwise
      bad things can happen if it expected a different op. Luckily, there's no
      callback that doesn't use the helper functions that requires this. But
      there soon will be and this needs to be fixed.
      
      Use a set_function_trace_op to store the ftrace_op to set the
      function_trace_op to when it is safe to do so (during the update function
      within the breakpoint or stop machine calls). Or if dynamic ftrace is not
      being used (static tracing) then we have to do a bit more synchronization
      when the ftrace_trace_function is set as that takes affect immediately
      (as oppose to dynamic ftrace doing it with the modification of the trampoline).
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      405e1d83
  4. 02 Jan, 2014 1 commit
  5. 16 Dec, 2013 1 commit
    • Miao Xie's avatar
      ftrace: Initialize the ftrace profiler for each possible cpu · c4602c1c
      Miao Xie authored
      Ftrace currently initializes only the online CPUs. This implementation has
      two problems:
      - If we online a CPU after we enable the function profile, and then run the
        test, we will lose the trace information on that CPU.
        Steps to reproduce:
        # echo 0 > /sys/devices/system/cpu/cpu1/online
        # cd <debugfs>/tracing/
        # echo <some function name> >> set_ftrace_filter
        # echo 1 > function_profile_enabled
        # echo 1 > /sys/devices/system/cpu/cpu1/online
        # run test
      - If we offline a CPU before we enable the function profile, we will not clear
        the trace information when we enable the function profile. It will trouble
        the users.
        Steps to reproduce:
        # cd <debugfs>/tracing/
        # echo <some function name> >> set_ftrace_filter
        # echo 1 > function_profile_enabled
        # run test
        # cat trace_stat/function*
        # echo 0 > /sys/devices/system/cpu/cpu1/online
        # echo 0 > function_profile_enabled
        # echo 1 > function_profile_enabled
        # cat trace_stat/function*
        # run test
        # cat trace_stat/function*
      
      So it is better that we initialize the ftrace profiler for each possible cpu
      every time we enable the function profile instead of just the online ones.
      
      Link: http://lkml.kernel.org/r/1387178401-10619-1-git-send-email-miaox@cn.fujitsu.com
      
      
      
      Cc: stable@vger.kernel.org # 2.6.31+
      Signed-off-by: default avatarMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      c4602c1c
  6. 26 Nov, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix function graph with loading of modules · 8a56d776
      Steven Rostedt (Red Hat) authored
      Commit 8c4f3c3f
      
       "ftrace: Check module functions being traced on reload"
      fixed module loading and unloading with respect to function tracing, but
      it missed the function graph tracer. If you perform the following
      
       # cd /sys/kernel/debug/tracing
       # echo function_graph > current_tracer
       # modprobe nfsd
       # echo nop > current_tracer
      
      You'll get the following oops message:
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
       Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
       CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
        0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
        ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
       Call Trace:
        [<ffffffff814fe193>] dump_stack+0x4f/0x7c
        [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
        [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
        [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
        [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
        [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
        [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
        [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
        [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
        [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
        [<ffffffff81122aed>] vfs_write+0xab/0xf6
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff81122dbd>] SyS_write+0x59/0x82
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
       ---[ end trace 940358030751eafb ]---
      
      The above mentioned commit didn't go far enough. Well, it covered the
      function tracer by adding checks in __register_ftrace_function(). The
      problem is that the function graph tracer circumvents that (for a slight
      efficiency gain when function graph trace is running with a function
      tracer. The gain was not worth this).
      
      The problem came with ftrace_startup() which should always be called after
      __register_ftrace_function(), if you want this bug to be completely fixed.
      
      Anyway, this solution moves __register_ftrace_function() inside of
      ftrace_startup() and removes the need to call them both.
      Reported-by: default avatarDave Wysochanski <dwysocha@redhat.com>
      Fixes: ed926f9b
      
       ("ftrace: Use counters to enable functions to trace")
      Cc: stable@vger.kernel.org # 3.0+
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      8a56d776
  7. 05 Nov, 2013 2 commits
    • Tom Zanussi's avatar
      tracing: Make register/unregister_ftrace_command __init · 38de93ab
      Tom Zanussi authored
      register/unregister_ftrace_command() are only ever called from __init
      functions, so can themselves be made __init.
      
      Also make register_snapshot_cmd() __init for the same reason.
      
      Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com
      
      Signed-off-by: default avatarTom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      38de93ab
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Have control op function callback only trace when RCU is watching · b5aa3a47
      Steven Rostedt (Red Hat) authored
      
      
      Dave Jones reported that trinity would be able to trigger the following
      back trace:
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.10.0-rc2+ #38 Not tainted
       -------------------------------
       include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
      
       RCU used illegally from idle CPU!  rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       1 lock held by trinity-child1/18786:
        #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8113dd48>] __perf_event_overflow+0x108/0x310
       stack backtrace:
       CPU: 3 PID: 18786 Comm: trinity-child1 Not tainted 3.10.0-rc2+ #38
        0000000000000000 ffff88020767bac8 ffffffff816e2f6b ffff88020767baf8
        ffffffff810b5897 ffff88021de92520 0000000000000000 ffff88020767bbf8
        0000000000000000 ffff88020767bb78 ffffffff8113ded4 ffffffff8113dd48
       Call Trace:
        [<ffffffff816e2f6b>] dump_stack+0x19/0x1b
        [<ffffffff810b5897>] lockdep_rcu_suspicious+0xe7/0x120
        [<ffffffff8113ded4>] __perf_event_overflow+0x294/0x310
        [<ffffffff8113dd48>] ? __perf_event_overflow+0x108/0x310
        [<ffffffff81309289>] ? __const_udelay+0x29/0x30
        [<ffffffff81076054>] ? __rcu_read_unlock+0x54/0xa0
        [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
        [<ffffffff8113dfa1>] perf_swevent_overflow+0x51/0xe0
        [<ffffffff8113e08f>] perf_swevent_event+0x5f/0x90
        [<ffffffff8113e1c9>] perf_tp_event+0x109/0x4f0
        [<ffffffff8113e36f>] ? perf_tp_event+0x2af/0x4f0
        [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
        [<ffffffff8112d79f>] perf_ftrace_function_call+0xbf/0xd0
        [<ffffffff8110e1e1>] ? ftrace_ops_control_func+0x181/0x210
        [<ffffffff81074630>] ? __rcu_read_lock+0x20/0x20
        [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
        [<ffffffff8110e1e1>] ftrace_ops_control_func+0x181/0x210
        [<ffffffff816f4000>] ftrace_call+0x5/0x2f
        [<ffffffff8110e229>] ? ftrace_ops_control_func+0x1c9/0x210
        [<ffffffff816f4000>] ? ftrace_call+0x5/0x2f
        [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
        [<ffffffff81074635>] ? debug_lockdep_rcu_enabled+0x5/0x40
        [<ffffffff81100cae>] ? rcu_eqs_enter_common+0x5e/0x470
        [<ffffffff8110112a>] rcu_eqs_enter+0x6a/0xb0
        [<ffffffff81103673>] rcu_user_enter+0x13/0x20
        [<ffffffff8114541a>] user_enter+0x6a/0xd0
        [<ffffffff8100f6d8>] syscall_trace_leave+0x78/0x140
        [<ffffffff816f46af>] int_check_syscall_exit_work+0x34/0x3d
       ------------[ cut here ]------------
      
      Perf uses rcu_read_lock() but as the function tracer can trace functions
      even when RCU is not currently active, this makes the rcu_read_lock()
      used by perf ineffective.
      
      As perf is currently the only user of the ftrace_ops_control_func() and
      perf is also the only function callback that actively uses rcu_read_lock(),
      the quick fix is to prevent the ftrace_ops_control_func() from calling
      its callbacks if RCU is not active.
      
      With Paul's new "rcu_is_watching()" we can tell if RCU is active or not.
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      b5aa3a47
  8. 19 Oct, 2013 4 commits
  9. 03 Sep, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix a slight race in modifying what function callback gets traced · 59338f75
      Steven Rostedt (Red Hat) authored
      
      
      There's a slight race when going from a list function to a non list
      function. That is, when only one callback is registered to the function
      tracer, it gets called directly by the mcount trampoline. But if this
      function has filters, it may be called by the wrong functions.
      
      As the list ops callback that handles multiple callbacks that are
      registered to ftrace, it also handles what functions they call. While
      the transaction is taking place, use the list function always, and
      after all the updates are finished (only the functions that should be
      traced are being traced), then we can update the trampoline to call
      the function directly.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      59338f75
  10. 31 Jul, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Check module functions being traced on reload · 8c4f3c3f
      Steven Rostedt (Red Hat) authored
      There's been a nasty bug that would show up and not give much info.
      The bug displayed the following warning:
      
       WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
       Pid: 20903, comm: bash Tainted: G           O 3.6.11+ #38405.trunk
       Call Trace:
        [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
        [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
        [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
        [<ffffffff811401cc>] ? kfree+0x2c/0x110
        [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
        [<ffffffff81149f1e>] __fput+0xae/0x220
        [<ffffffff8114a09e>] ____fput+0xe/0x10
        [<ffffffff8105fa22>] task_work_run+0x72/0x90
        [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
        [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
        [<ffffffff815c0f88>] int_signal+0x12/0x17
       ---[ end trace 793179526ee09b2c ]---
      
      It was finally narrowed down to unloading a module that was being traced.
      
      It was actually more than that. When functions are being traced, there's
      a table of all functions that have a ref count of the number of active
      tracers attached to that function. When a function trace callback is
      registered to a function, the function's record ref count is incremented.
      When it is unregistered, the function's record ref count is decremented.
      If an inconsistency is detected (ref count goes below zero) the above
      warning is shown and the function tracing is permanently disabled until
      reboot.
      
      The ftrace callback ops holds a hash of functions that it filters on
      (and/or filters off). If the hash is empty, the default means to filter
      all functions (for the filter_hash) or to disable no functions (for the
      notrace_hash).
      
      When a module is unloaded, it frees the function records that represent
      the module functions. These records exist on their own pages, that is
      function records for one module will not exist on the same page as
      function records for other modules or even the core kernel.
      
      Now when a module unloads, the records that represents its functions are
      freed. When the module is loaded again, the records are recreated with
      a default ref count of zero (unless there's a callback that traces all
      functions, then they will also be traced, and the ref count will be
      incremented).
      
      The problem is that if an ftrace callback hash includes functions of the
      module being unloaded, those hash entries will not be removed. If the
      module is reloaded in the same location, the hash entries still point
      to the functions of the module but the module's ref counts do not reflect
      that.
      
      With the help of Steve and Joern, we found a reproducer:
      
       Using uinput module and uinput_release function.
      
       cd /sys/kernel/debug/tracing
       modprobe uinput
       echo uinput_release > set_ftrace_filter
       echo function > current_tracer
       rmmod uinput
       modprobe uinput
       # check /proc/modules to see if loaded in same addr, otherwise try again
       echo nop > current_tracer
      
       [BOOM]
      
      The above loads the uinput module, which creates a table of functions that
      can be traced within the module.
      
      We add uinput_release to the filter_hash to trace just that function.
      
      Enable function tracincg, which increments the ref count of the record
      associated to uinput_release.
      
      Remove uinput, which frees the records including the one that represents
      uinput_release.
      
      Load the uinput module again (and make sure it's at the same address).
      This recreates the function records all with a ref count of zero,
      including uinput_release.
      
      Disable function tracing, which will decrement the ref count for uinput_release
      which is now zero because of the module removal and reload, and we have
      a mismatch (below zero ref count).
      
      The solution is to check all currently tracing ftrace callbacks to see if any
      are tracing any of the module's functions when a module is loaded (it already does
      that with callbacks that trace all functions). If a callback happens to have
      a module function being traced, it increments that records ref count and starts
      tracing that function.
      
      There may be a strange side effect with this, where tracing module functions
      on unload and then reloading a new module may have that new module's functions
      being traced. This may be something that confuses the user, but it's not
      a big deal. Another approach is to disable all callback hashes on module unload,
      but this leaves some ftrace callbacks that may not be registered, but can
      still have hashes tracing the module's function where ftrace doesn't know about
      it. That situation can cause the same bug. This solution solves that case too.
      Another benefit of this solution, is it is possible to trace a module's
      function on unload and load.
      
      Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.com
      
      Reported-by: default avatarJörn Engel <joern@logfs.org>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Reported-by: default avatarSteve Hodgson <steve@purestorage.com>
      Tested-by: default avatarSteve Hodgson <steve@purestorage.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      8c4f3c3f
  11. 30 Jul, 2013 1 commit
  12. 24 Jul, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Add check for NULL regs if ops has SAVE_REGS set · 195a8afc
      Steven Rostedt (Red Hat) authored
      
      
      If a ftrace ops is registered with the SAVE_REGS flag set, and there's
      already a ops registered to one of its functions but without the
      SAVE_REGS flag, there's a small race window where the SAVE_REGS ops gets
      added to the list of callbacks to call for that function before the
      callback trampoline gets set to save the regs.
      
      The problem is, the function is not currently saving regs, which opens
      a small race window where the ops that is expecting regs to be passed
      to it, wont. This can cause a crash if the callback were to reference
      the regs, as the SAVE_REGS guarantees that regs will be set.
      
      To fix this, we add a check in the loop case where it checks if the ops
      has the SAVE_REGS flag set, and if so, it will ignore it if regs is
      not set.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      195a8afc
  13. 02 Jul, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Do not run selftest if command line parameter is set · f1ed7c74
      Steven Rostedt (Red Hat) authored
      
      
      If the kernel command line ftrace filter parameters are set
      (ftrace_filter or ftrace_notrace), force the function self test to
      pass, with a warning why it was forced.
      
      If the user adds a filter to the kernel command line, it is assumed
      that they know what they are doing, and the self test should just not
      run instead of failing (which disables function tracing) or clearing
      the filter, as that will probably annoy the user.
      
      If the user wants the selftest to run, the message will tell them why
      it did not.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      f1ed7c74
  14. 20 Jun, 2013 1 commit
  15. 11 Jun, 2013 1 commit
    • Steven Rostedt's avatar
      ftrace: Use schedule_on_each_cpu() as a heavy synchronize_sched() · 7614c3dc
      Steven Rostedt authored
      The function tracer uses preempt_disable/enable_notrace() for
      synchronization between reading registered ftrace_ops and unregistering
      them.
      
      Most of the ftrace_ops are global permanent structures that do not
      require this synchronization. That is, ops may be added and removed from
      the hlist but are never freed, and wont hurt if a synchronization is
      missed.
      
      But this is not true for dynamically created ftrace_ops or control_ops,
      which are used by the perf function tracing.
      
      The problem here is that the function tracer can be used to trace
      kernel/user context switches as well as going to and from idle.
      Basically, it can be used to trace blind spots of the RCU subsystem.
      This means that even though preempt_disable() is done, a
      synchronize_sched() will ignore CPUs that haven't made it out of user
      space or idle. These can include functions that are being traced just
      before entering or exiting the kernel sections.
      
      To implement the RCU synchronization, instead of using
      synchronize_sched() the use of schedule_on_each_cpu() is performed. This
      means that when a dynamically allocated ftrace_ops, or a control ops is
      being unregistered, all CPUs must be touched and execute a ftrace_sync()
      stub function via the work queues. This will rip CPUs out from idle or
      in dynamic tick mode. This only happens when a user disables perf
      function tracing or other dynamically allocated function tracers, but it
      allows us to continue to debug RCU and context tracking with function
      tracing.
      
      Link: http://lkml.kernel.org/r/1369785676.15552.55.camel@gandalf.local.home
      
      
      
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      7614c3dc
  16. 29 May, 2013 1 commit
  17. 10 May, 2013 5 commits
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix function probe when more than one probe is added · 19dd603e
      Steven Rostedt (Red Hat) authored
      
      
      When the first function probe is added and the function tracer
      is updated the functions are modified to call the probe.
      But when a second function is added, it updates the function
      records to have the second function also update, but it fails
      to update the actual function itself.
      
      This prevents the second (or third or forth and so on) probes
      from having their functions called.
      
        # echo vfs_symlink:enable_event:sched:sched_switch > set_ftrace_filter
        # echo vfs_unlink:enable_event:sched:sched_switch > set_ftrace_filter
        # cat trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 0/0   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
        # touch /tmp/a
        # rm /tmp/a
        # cat trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 0/0   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
        # ln -s /tmp/a
        # cat trace
       # tracer: nop
       #
       # entries-in-buffer/entries-written: 414/414   #P:4
       #
       #                              _-----=> irqs-off
       #                             / _----=> need-resched
       #                            | / _---=> hardirq/softirq
       #                            || / _--=> preempt-depth
       #                            ||| /     delay
       #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
       #              | |       |   ||||       |         |
                 <idle>-0     [000] d..3  2847.923031: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=2786 next_prio=120
                  <...>-3114  [001] d..4  2847.923035: sched_switch: prev_comm=ln prev_pid=3114 prev_prio=120 prev_state=x ==> next_comm=swapper/1 next_pid=0 next_prio=120
                   bash-2786  [000] d..3  2847.923535: sched_switch: prev_comm=bash prev_pid=2786 prev_prio=120 prev_state=S ==> next_comm=kworker/0:1 next_pid=34 next_prio=120
            kworker/0:1-34    [000] d..3  2847.923552: sched_switch: prev_comm=kworker/0:1 prev_pid=34 prev_prio=120 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120
                 <idle>-0     [002] d..3  2847.923554: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=sshd next_pid=2783 next_prio=120
                   sshd-2783  [002] d..3  2847.923660: sched_switch: prev_comm=sshd prev_pid=2783 prev_prio=120 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
      
      Still need to update the functions even though the probe itself
      does not need to be registered again when added a new probe.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      19dd603e
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix the output of enabled_functions debug file · 23ea9c4d
      Steven Rostedt (Red Hat) authored
      
      
      The enabled_functions debugfs file was created to be able to see
      what functions have been modified from nops to calling a tracer.
      
      The current method uses the counter in the function record.
      As when a ftrace_ops is registered to a function, its count
      increases. But that doesn't mean that the function is actively
      being traced. /proc/sys/kernel/ftrace_enabled can be set to zero
      which would disable it, as well as something can go wrong and
      we can think its enabled when only the counter is set.
      
      The record's FTRACE_FL_ENABLED flag is set or cleared when its
      function is modified. That is a much more accurate way of knowing
      what function is enabled or not.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      23ea9c4d
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix locking in register_ftrace_function_probe() · 5ae0bf59
      Steven Rostedt (Red Hat) authored
      
      
      The iteration of the ftrace function list and the call to
      ftrace_match_record() need to be protected by the ftrace_lock.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      5ae0bf59
    • Masami Hiramatsu's avatar
      ftrace: Cleanup regex_lock and ftrace_lock around hash updating · 3f2367ba
      Masami Hiramatsu authored
      Cleanup regex_lock and ftrace_lock locking points around
      ftrace_ops hash update code.
      
      The new rule is that regex_lock protects ops->*_hash
      read-update-write code for each ftrace_ops. Usually,
      hash update is done by following sequence.
      
      1. allocate a new local hash and copy the original hash.
      2. update the local hash.
      3. move(actually, copy) back the local hash to ftrace_ops.
      4. update ftrace entries if needed.
      5. release the local hash.
      
      This makes regex_lock protect #1-#4, and ftrace_lock
      to protect #3, #4 and adding and removing ftrace_ops from the
      ftrace_ops_list. The ftrace_lock protects #3 as well because
      the move functions update the entries too.
      
      Link: http://lkml.kernel.org/r/20130509054421.30398.83411.stgit@mhiramat-M0-7522
      
      
      
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Tom Zanussi <tom.zanussi@intel.com>
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      3f2367ba
    • Masami Hiramatsu's avatar
      ftrace, kprobes: Fix a deadlock on ftrace_regex_lock · f04f24fb
      Masami Hiramatsu authored
      Fix a deadlock on ftrace_regex_lock which happens when setting
      an enable_event trigger on dynamic kprobe event as below.
      
      ----
      sh-2.05b# echo p vfs_symlink > kprobe_events
      sh-2.05b# echo vfs_symlink:enable_event:kprobes:p_vfs_symlink_0 > set_ftrace_filter
      
      =============================================
      [ INFO: possible recursive locking detected ]
      3.9.0+ #35 Not tainted
      ---------------------------------------------
      sh/72 is trying to acquire lock:
       (ftrace_regex_lock){+.+.+.}, at: [<ffffffff810ba6c1>] ftrace_set_hash+0x81/0x1f0
      
      but task is already holding lock:
       (ftrace_regex_lock){+.+.+.}, at: [<ffffffff810b7cbd>] ftrace_regex_write.isra.29.part.30+0x3d/0x220
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(ftrace_regex_lock);
        lock(ftrace_regex_lock);
      
       *** DEADLOCK ***
      ----
      
      To fix that, this introduces a finer regex_lock for each ftrace_ops.
      ftrace_regex_lock is too big of a lock which protects all
      filter/notrace_hash operations, but it doesn't need to be a global
      lock after supporting multiple ftrace_ops because each ftrace_ops
      has its own filter/notrace_hash.
      
      Link: http://lkml.kernel.org/r/20130509054417.30398.84254.stgit@mhiramat-M0-7522
      
      
      
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Tom Zanussi <tom.zanussi@intel.com>
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      [ Added initialization flag and automate mutex initialization for
        non ftrace.c ftrace_probes. ]
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      f04f24fb
  18. 09 May, 2013 1 commit
  19. 13 Apr, 2013 3 commits
  20. 12 Apr, 2013 2 commits
  21. 09 Apr, 2013 3 commits
  22. 08 Apr, 2013 3 commits
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Do not call stub functions in control loop · 395b97a3
      Steven Rostedt (Red Hat) authored
      The function tracing control loop used by perf spits out a warning
      if the called function is not a control function. This is because
      the control function references a per cpu allocated data structure
      on struct ftrace_ops that is not allocated for other types of
      functions.
      
      commit 0a016409 "ftrace: Optimize the function tracer list loop"
      
      Had an optimization done to all function tracing loops to optimize
      for a single registered ops. Unfortunately, this allows for a slight
      race when tracing starts or ends, where the stub function might be
      called after the current registered ops is removed. In this case we
      get the following dump:
      
      root# perf stat -e ftrace:function sleep 1
      [   74.339105] WARNING: at include/linux/ftrace.h:209 ftrace_ops_control_func+0xde/0xf0()
      [   74.349522] Hardware name: PRIMERGY RX200 S6
      [   74.357149] Modules linked in: sg igb iTCO_wdt ptp pps_core iTCO_vendor_support i7core_edac dca lpc_ich i2c_i801 coretemp edac_core crc32c_intel mfd_core ghash_clmulni_intel dm_multipath acpi_power_meter pcspk
      r microcode vhost_net tun macvtap macvlan nfsd kvm_intel kvm auth_rpcgss nfs_acl lockd sunrpc uinput xfs libcrc32c sd_mod crc_t10dif sr_mod cdrom mgag200 i2c_algo_bit drm_kms_helper ttm qla2xxx mptsas ahci drm li
      bahci scsi_transport_sas mptscsih libata scsi_transport_fc i2c_core mptbase scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
      [   74.446233] Pid: 1377, comm: perf Tainted: G        W    3.9.0-rc1 #1
      [   74.453458] Call Trace:
      [   74.456233]  [<ffffffff81062e3f>] warn_slowpath_common+0x7f/0xc0
      [   74.462997]  [<ffffffff810fbc60>] ? rcu_note_context_switch+0xa0/0xa0
      [   74.470272]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
      [   74.478117]  [<ffffffff81062e9a>] warn_slowpath_null+0x1a/0x20
      [   74.484681]  [<ffffffff81102ede>] ftrace_ops_control_func+0xde/0xf0
      [   74.491760]  [<ffffffff8162f400>] ftrace_call+0x5/0x2f
      [   74.497511]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
      [   74.503486]  [<ffffffff8162f400>] ? ftrace_call+0x5/0x2f
      [   74.509500]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
      [   74.516088]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.522268]  [<ffffffff810fbc65>] ? synchronize_sched+0x5/0x50
      [   74.528837]  [<ffffffff811041a2>] ? __unregister_ftrace_function+0xa2/0x1a0
      [   74.536696]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.542878]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
      [   74.548869]  [<ffffffff81105c67>] unregister_ftrace_function+0x27/0x50
      [   74.556243]  [<ffffffff8111eadf>] perf_ftrace_event_register+0x9f/0x140
      [   74.563709]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.569887]  [<ffffffff8162402d>] ? mutex_lock+0x1d/0x50
      [   74.575898]  [<ffffffff8111e94e>] perf_trace_destroy+0x2e/0x50
      [   74.582505]  [<ffffffff81127ba9>] tp_perf_event_destroy+0x9/0x10
      [   74.589298]  [<ffffffff811295d0>] free_event+0x70/0x1a0
      [   74.595208]  [<ffffffff8112a579>] perf_event_release_kernel+0x69/0xa0
      [   74.602460]  [<ffffffff816254d5>] ? _cond_resched+0x5/0x40
      [   74.608667]  [<ffffffff8112a640>] put_event+0x90/0xc0
      [   74.614373]  [<ffffffff8112a740>] perf_release+0x10/0x20
      [   74.620367]  [<ffffffff811a3044>] __fput+0xf4/0x280
      [   74.625894]  [<ffffffff811a31de>] ____fput+0xe/0x10
      [   74.631387]  [<ffffffff81083697>] task_work_run+0xa7/0xe0
      [   74.637452]  [<ffffffff81014981>] do_notify_resume+0x71/0xb0
      [   74.643843]  [<ffffffff8162fa92>] int_signal+0x12/0x17
      
      To fix this a new ftrace_ops flag is added that denotes the ftrace_list_end
      ftrace_ops stub as just that, a stub. This flag is now checked in the
      control loop and the function is not called if the flag is set.
      
      Thanks to Jovi for not just reporting the bug, but also pointing out
      where the bug was in the code.
      
      Link: http://lkml.kernel.org/r/514A8855.7090402@redhat.com
      Link: http://lkml.kernel.org/r/1364377499-1900-15-git-send-email-jovi.zhangwei@huawei.com
      
      Tested-by: default avatarWANG Chao <chaowang@redhat.com>
      Reported-by: default avatarWANG Chao <chaowang@redhat.com>
      Reported-by: default avatarzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      395b97a3
    • Jan Kiszka's avatar
      ftrace: Consistently restore trace function on sysctl enabling · 5000c418
      Jan Kiszka authored
      If we reenable ftrace via syctl, we currently set ftrace_trace_function
      based on the previous simplistic algorithm. This is inconsistent with
      what update_ftrace_function does. So better call that helper instead.
      
      Link: http://lkml.kernel.org/r/5151D26F.1070702@siemens.com
      
      
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kiszka's avatarJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      5000c418
    • Chen Gang's avatar
      ftrace: Fix strncpy() use, use strlcpy() instead of strncpy() · 75761cc1
      Chen Gang authored
      
      
      For NUL terminated string we always need to set '\0' at the end.
      Signed-off-by: default avatarChen Gang <gang.chen@asianux.com>
      Cc: rostedt@goodmis.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/516243B7.9020405@asianux.com
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      75761cc1
  23. 15 Mar, 2013 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Use manual free after synchronize_sched() not call_rcu_sched() · 7818b388
      Steven Rostedt (Red Hat) authored
      
      
      The entries to the probe hash must be freed after a synchronize_sched()
      after the entry has been removed from the hash.
      
      As the entries are registered with ops that may have their own callbacks,
      and these callbacks may sleep, we can not use call_rcu_sched() because
      the rcu callbacks registered with that are called from a softirq context.
      
      Instead of using call_rcu_sched(), manually save the entries on a free_list
      and at the end of the loop that removes the entries, do a synchronize_sched()
      and then go through the free_list, freeing the entries.
      
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      7818b388