1. 22 Aug, 2014 3 commits
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix up trampoline accounting with looping on hash ops · bce0b6c5
      Steven Rostedt (Red Hat) authored
      Now that a ftrace_hash can be shared by multiple ftrace_ops, they can dec
      the rec->flags by more than once (one per those that share the ftrace_hash).
      This means that the tramp_hash may not have a hash item when it was added.
      For example, if two ftrace_ops share a hash for a ftrace record, and the
      first ops has a trampoline, when it adds itself it will set the rec->flags
      TRAMP flag and increments its nr_trampolines counter. When the second ops
      is added, it must clear that tramp flag but also decrement the other ops
      that shares its hash. As the update to the function callbacks has not yet
      been performed, the other ops will not have the tramp hash set yet and it
      can not be used to know to decrement its nr_trampolines.
      Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is
      held, a ops with a trampoline to a record during an update of another ops
      that shares the record will have its func_hash pointing to it. Since a
      trampoline can only be set for a record if only one ops is attached to it,
      we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag
      is set) and then find the ops that has this record in its hashes.
      Also added some output to help debug when things go wrong.
      Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out)
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Update all ftrace_ops for a ftrace_hash_ops update · 84261912
      Steven Rostedt (Red Hat) authored
      When updating what an ftrace_ops traces, if it is registered (that is,
      actively tracing), and that ftrace_ops uses the shared global_ops
      local_hash, then we need to update all tracers that are active and
      also share the global_ops' ftrace_hash_ops.
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Allow ftrace_ops to use the hashes from other ops · 33b7f99c
      Steven Rostedt (Red Hat) authored
      Currently the top level debug file system function tracer shares its
      ftrace_ops with the function graph tracer. This was thought to be fine
      because the tracers are not used together, as one can only enable
      function or function_graph tracer in the current_tracer file.
      But that assumption proved to be incorrect. The function profiler
      can use the function graph tracer when function tracing is enabled.
      Since all function graph users uses the function tracing ftrace_ops
      this causes a conflict and when a user enables both function profiling
      as well as the function tracer it will crash ftrace and disable it.
      The quick solution so far is to move them as separate ftrace_ops like
      it was earlier. The problem though is to synchronize the functions that
      are traced because both function and function_graph tracer are limited
      by the selections made in the set_ftrace_filter and set_ftrace_notrace
      To handle this, a new structure is made called ftrace_ops_hash. This
      structure will now hold the filter_hash and notrace_hash, and the
      ftrace_ops will point to this structure. That will allow two ftrace_ops
      to share the same hashes.
      Since most ftrace_ops do not share the hashes, and to keep allocation
      simple, the ftrace_ops structure will include both a pointer to the
      ftrace_ops_hash called func_hash, as well as the structure itself,
      called local_hash. When the ops are registered, the func_hash pointer
      will be initialized to point to the local_hash within the ftrace_ops
      structure. Some of the ftrace internal ftrace_ops will be initialized
      statically. This will allow for the function and function_graph tracer
      to have separate ops but still share the same hash tables that determine
      what functions they trace.
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  2. 06 Aug, 2014 2 commits
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Always reset iterator to reader page · 651e22f2
      Steven Rostedt (Red Hat) authored
      When performing a consuming read, the ring buffer swaps out a
      page from the ring buffer with a empty page and this page that
      was swapped out becomes the new reader page. The reader page
      is owned by the reader and since it was swapped out of the ring
      buffer, writers do not have access to it (there's an exception
      to that rule, but it's out of scope for this commit).
      When reading the "trace" file, it is a non consuming read, which
      means that the data in the ring buffer will not be modified.
      When the trace file is opened, a ring buffer iterator is allocated
      and writes to the ring buffer are disabled, such that the iterator
      will not have issues iterating over the data.
      Although the ring buffer disabled writes, it does not disable other
      reads, or even consuming reads. If a consuming read happens, then
      the iterator is reset and starts reading from the beginning again.
      My tests would sometimes trigger this bug on my i386 box:
      WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
      Modules linked in:
      CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
      Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
       00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
       f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
       ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
      Call Trace:
       [<c18796b3>] dump_stack+0x4b/0x75
       [<c103a0e3>] warn_slowpath_common+0x7e/0x95
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
       [<c103a185>] warn_slowpath_fmt+0x33/0x35
       [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M
       [<c10bed04>] trace_find_cmdline+0x40/0x64
       [<c10c3c16>] trace_print_context+0x27/0xec
       [<c10c4360>] ? trace_seq_printf+0x37/0x5b
       [<c10c0b15>] print_trace_line+0x319/0x39b
       [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
       [<c10c13b1>] s_show+0x192/0x1ab
       [<c10bfd9a>] ? s_next+0x5a/0x7c
       [<c112e76e>] seq_read+0x267/0x34c
       [<c1115a25>] vfs_read+0x8c/0xef
       [<c112e507>] ? seq_lseek+0x154/0x154
       [<c1115ba2>] SyS_read+0x54/0x7f
       [<c188488e>] syscall_call+0x7/0xb
      ---[ end trace 3f507febd6b4cc83 ]---
      >>>> ##### CPU 1 buffer started ####
      Which was the __trace_find_cmdline() function complaining about the pid
      in the event record being negative.
      After adding more test cases, this would trigger more often. Strangely
      enough, it would never trigger on a single test, but instead would trigger
      only when running all the tests. I believe that was the case because it
      required one of the tests to be shutting down via delayed instances while
      a new test started up.
      After spending several days debugging this, I found that it was caused by
      the iterator becoming corrupted. Debugging further, I found out why
      the iterator became corrupted. It happened with the rb_iter_reset().
      As consuming reads may not read the full reader page, and only part
      of it, there's a "read" field to know where the last read took place.
      The iterator, must also start at the read position. In the rb_iter_reset()
      code, if the reader page was disconnected from the ring buffer, the iterator
      would start at the head page within the ring buffer (where writes still
      happen). But the mistake there was that it still used the "read" field
      to start the iterator on the head page, where it should always start
      at zero because readers never read from within the ring buffer where
      writes occur.
      I originally wrote a patch to have it set the iter->head to 0 instead
      of iter->head_page->read, but then I questioned why it wasn't always
      setting the iter to point to the reader page, as the reader page is
      still valid.  The list_empty(reader_page->list) just means that it was
      successful in swapping out. But the reader_page may still have data.
      There was a bug report a long time ago that was not reproducible that
      had something about trace_pipe (consuming read) not matching trace
      (iterator read). This may explain why that happened.
      Anyway, the correct answer to this bug is to always use the reader page
      an not reset the iterator to inside the writable ring buffer.
      Cc: stable@vger.kernel.org # 2.6.28+
      Fixes: d769041f
       "ring_buffer: implement new locking"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ring-buffer: Up rb_iter_peek() loop count to 3 · 021de3d9
      Steven Rostedt (Red Hat) authored
      After writting a test to try to trigger the bug that caused the
      ring buffer iterator to become corrupted, I hit another bug:
       WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
       Modules linked in: ipt_MASQUERADE sunrpc [...]
       CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
        ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
        ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
       Call Trace:
        [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75
        [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
        [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c
        [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96
        [<ffffffff810c74a3>] ? s_start+0xd7/0x17b
        [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea
        [<ffffffff8114cf94>] ? seq_read+0x148/0x361
        [<ffffffff81132d98>] ? vfs_read+0x93/0xf1
        [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e
        [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2
      Debugging this bug, which triggers when the rb_iter_peek() loops too
      many times (more than 2 times), I discovered there's a case that can
      cause that function to legitimately loop 3 times!
      rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek()
      only deals with the reader page (it's for consuming reads). The
      rb_iter_peek() is for traversing the buffer without consuming it, and as
      such, it can loop for one more reason. That is, if we hit the end of
      the reader page or any page, it will go to the next page and try again.
      That is, we have this:
       1. iter->head > iter->head_page->page->commit
          (rb_inc_iter() which moves the iter to the next page)
          try again
       2. event = rb_iter_head_event()
          event->type_len == RINGBUF_TYPE_TIME_EXTEND
          try again
       3. read the event.
      But we never get to 3, because the count is greater than 2 and we
      cause the WARNING and return NULL.
      Up the counter to 3.
      Cc: stable@vger.kernel.org # 2.6.37+
      Fixes: 69d1b839
       "ring-buffer: Bind time extend and data events together"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  3. 28 Jul, 2014 1 commit
  4. 24 Jul, 2014 2 commits
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Add warning if tramp hash does not match nr_trampolines · dc6f03f2
      Steven Rostedt (Red Hat) authored
      After adding all the records to the tramp_hash, add a check that makes
      sure that the number of records added matches the number of records
      expected to match and do a WARN_ON and disable ftrace if they do
      not match.
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      ftrace: Fix trampoline hash update check on rec->flags · 2a0343ba
      Steven Rostedt (Red Hat) authored
      In the loop of ftrace_save_ops_tramp_hash(), it adds all the recs
      to the ops hash if the rec has only one callback attached and the
      ops is connected to the rec. It gives a nasty warning and shuts down
      ftrace if the rec doesn't have a trampoline set for it. But this
      can happen with the following scenario:
        # cd /sys/kernel/debug/tracing
        # echo schedule do_IRQ > set_ftrace_filter
        # mkdir instances/foo
        # echo schedule > instances/foo/set_ftrace_filter
        # echo function_graph > current_function
        # echo function > instances/foo/current_function
        # echo nop > instances/foo/current_function
      The above would then trigger the following warning and disable
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 3145 at kernel/trace/ftrace.c:2212 ftrace_run_update_code+0xe4/0x15b()
       Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ip [...]
       CPU: 1 PID: 3145 Comm: bash Not tainted 3.16.0-rc3-test+ #136
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000000 ffffffff81808a88 ffffffff81502130 0000000000000000
        ffffffff81040ca1 ffff880077c08000 ffffffff810bd286 0000000000000001
        ffffffff81a56830 ffff88007a041be0 ffff88007a872d60 00000000000001be
       Call Trace:
        [<ffffffff81502130>] ? dump_stack+0x4a/0x75
        [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
        [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
        [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b
        [<ffffffff810bda1a>] ? ftrace_shutdown+0x11c/0x16b
        [<ffffffff810bda87>] ? unregister_ftrace_function+0x1e/0x38
        [<ffffffff810cc7e1>] ? function_trace_reset+0x1a/0x28
        [<ffffffff810c924f>] ? tracing_set_tracer+0xc1/0x276
        [<ffffffff810c9477>] ? tracing_set_trace_write+0x73/0x91
        [<ffffffff81132383>] ? __sb_start_write+0x9a/0xcc
        [<ffffffff8120478f>] ? security_file_permission+0x1b/0x31
        [<ffffffff81130e49>] ? vfs_write+0xac/0x11c
        [<ffffffff8113115d>] ? SyS_write+0x60/0x8e
        [<ffffffff81508112>] ? system_call_fastpath+0x16/0x1b
       ---[ end trace 938c4415cbc7dc96 ]---
       ------------[ cut here ]------------
      Link: http://lkml.kernel.org/r/20140723120805.GB21376@redhat.com
      Reported-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  5. 23 Jul, 2014 3 commits
  6. 21 Jul, 2014 1 commit
  7. 19 Jul, 2014 1 commit
  8. 18 Jul, 2014 7 commits
  9. 17 Jul, 2014 1 commit
    • Steven Rostedt (Red Hat)'s avatar
      ftrace-graph: Remove dependency of ftrace_stop() from ftrace_graph_stop() · 1b2f121c
      Steven Rostedt (Red Hat) authored
      ftrace_stop() is going away as it disables parts of function tracing
      that affects users that should not be affected. But ftrace_graph_stop()
      is built on ftrace_stop(). Here's another example of killing all of
      function tracing because something went wrong with function graph
      Instead of disabling all users of function tracing on function graph
      error, disable only function graph tracing.
      A new function is created called ftrace_graph_is_dead(). This is called
      in strategic paths to prevent function graph from doing more harm and
      allowing at least a warning to be printed before the system crashes.
      NOTE: ftrace_stop() is still used until all the archs are converted over
      to use ftrace_graph_is_dead(). After that, ftrace_stop() will be removed.
      Reviewed-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  10. 16 Jul, 2014 7 commits
  11. 15 Jul, 2014 4 commits
    • Martin Lau's avatar
      ring-buffer: Fix polling on trace_pipe · 97b8ee84
      Martin Lau authored
      ring_buffer_poll_wait() should always put the poll_table to its wait_queue
      even there is immediate data available.  Otherwise, the following epoll and
      read sequence will eventually hang forever:
      1. Put some data to make the trace_pipe ring_buffer read ready first
      2. epoll_ctl(efd, EPOLL_CTL_ADD, trace_pipe_fd, ee)
      3. epoll_wait()
      4. read(trace_pipe_fd) till EAGAIN
      5. Add some more data to the trace_pipe ring_buffer
      6. epoll_wait() -> this epoll_wait() will block forever
      ~ During the epoll_ctl(efd, EPOLL_CTL_ADD,...) call in step 2,
        ring_buffer_poll_wait() returns immediately without adding poll_table,
        which has poll_table->_qproc pointing to ep_poll_callback(), to its
      ~ During the epoll_wait() call in step 3 and step 6,
        ring_buffer_poll_wait() cannot add ep_poll_callback() to its wait_queue
        because the poll_table->_qproc is NULL and it is how epoll works.
      ~ When there is new data available in step 6, ring_buffer does not know
        it has to call ep_poll_callback() because it is not in its wait queue.
        Hence, block forever.
      Other poll implementation seems to call poll_wait() unconditionally as the very
      first thing to do.  For example, tcp_poll() in tcp.c.
      Link: http://lkml.kernel.org/p/20140610060637.GA14045@devbig242.prn2.facebook.com
      Cc: stable@vger.kernel.org # 2.6.27
      Fixes: 2a2cc8f7
       "ftrace: allow the event pipe to be polled"
      Reviewed-by: default avatarChris Mason <clm@fb.com>
      Signed-off-by: default avatarMartin Lau <kafai@fb.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • zhangwei(Jovi)'s avatar
      tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs · f0160a5a
      zhangwei(Jovi) authored
      The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
      so add it, to be consistent with __trace_printk/__trace_bprintk.
      Those functions are all called by the same function: trace_printk().
      Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com
      Cc: stable@vger.kernel.org # 3.11+
      Signed-off-by: default avatarzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • Steven Rostedt (Red Hat)'s avatar
      tracing: Fix graph tracer with stack tracer on other archs · 5f8bf2d2
      Steven Rostedt (Red Hat) authored
      Running my ftrace tests on PowerPC, it failed the test that checks
      if function_graph tracer is affected by the stack tracer. It was.
      Looking into this, I found that the update_function_graph_func()
      must be called even if the trampoline function is not changed.
      This is because archs like PowerPC do not support ftrace_ops being
      passed by assembly and instead uses a helper function (what the
      trampoline function points to). Since this function is not changed
      even when multiple ftrace_ops are added to the code, the test that
      falls out before calling update_function_graph_func() will miss that
      the update must still be done.
      Call update_function_graph_function() for all calls to
      Cc: stable@vger.kernel.org # 3.3+
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
    • zhangwei(Jovi)'s avatar
      tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs · 8abfb872
      zhangwei(Jovi) authored
      Currently trace option stacktrace is not applicable for
      trace_printk with constant string argument, the reason is
      in __trace_puts/__trace_bputs ftrace_trace_stack is missing.
      In contrast, when using trace_printk with non constant string
      argument(will call into __trace_printk/__trace_bprintk), then
      trace option stacktrace is workable, this inconstant result
      will confuses users a lot.
      Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: default avatarzhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
  12. 14 Jul, 2014 1 commit
  13. 01 Jul, 2014 7 commits