    perf: Fetch hot regs from the template caller · 6cc8a7c1
    Frederic Weisbecker authored
    
    
    Trace events can be defined from a template using
    DECLARE_EVENT_CLASS/DEFINE_EVENT or directly with TRACE_EVENT.
    
    In both cases we have a template tracepoint handler, used to
    record the trace, to which we pass our ftrace event instance.
    
    At the function level, if the class is named "foo" and the event
    is named "blah", we have the following chain of calls:
    
    perf_trace_blah() -> perf_trace_templ_foo()
    
    If several events share the class "foo", perf_trace_templ_foo()
    has multiple callers and the compiler is unlikely to inline it.
    This is what usually happens with DECLARE_EVENT_CLASS/DEFINE_EVENT
    based definitions.
    
    But if perf_trace_blah() is the only caller of perf_trace_templ_foo(),
    as with a TRACE_EVENT definition, there is a fair chance that it
    will be inlined.
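    The two shapes described above can be modelled in plain userspace C.
    This is a hypothetical sketch, not the code the trace macros actually
    generate: struct event_instance, the second event perf_trace_bar() and
    the string formatting are stand-ins invented for illustration. Class
    "foo" here has two events, so perf_trace_templ_foo() has two call
    sites; if only perf_trace_blah() remained, the template would have a
    single caller and the compiler could choose to inline it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the ftrace event instance handed to the template. */
struct event_instance {
	const char *name;
};

/* Template handler for class "foo": records the trace for any event of
 * the class. Here "recording" just formats into a caller-supplied buffer. */
static void perf_trace_templ_foo(const struct event_instance *event,
				 int value, char *out, size_t len)
{
	snprintf(out, len, "%s: value=%d", event->name, value);
}

static const struct event_instance event_blah = { .name = "blah" };
/* Hypothetical second event sharing class "foo". */
static const struct event_instance event_bar = { .name = "bar" };

/* Per-event handlers: each only passes its own instance to the shared
 * template. With two callers the template is unlikely to be inlined;
 * without perf_trace_bar() it might be. */
static void perf_trace_blah(int value, char *out, size_t len)
{
	perf_trace_templ_foo(&event_blah, value, out, len);
}

static void perf_trace_bar(int value, char *out, size_t len)
{
	perf_trace_templ_foo(&event_bar, value, out, len);
}
```

    For example, perf_trace_blah(1, buf, sizeof buf) leaves "blah: value=1"
    in buf. Whether inlining actually happens is up to the compiler; the
    model only shows the call structure.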
    
    The problem is that we fetch the regs from perf_trace_templ_foo()
    by rewinding the frame pointer to the second caller, because we
    want to reach the caller of perf_trace_blah() to get the right
    source of the event. This assumes that perf_trace_templ_foo() is
    never inlined, which, as shown above, is not always true. When it
    is inlined, the rewind goes one level too far and we miss the first
    caller, losing the most important level of precision.
    
    When perf_trace_templ_foo() is inlined, we get:
    	    61.31%       ls  [kernel.kallsyms]  [k] do_softirq
                             |
                             --- do_softirq
                                 irq_exit
                                 do_IRQ
                                 common_interrupt
                                |
                                |--25.00%-- tty_buffer_request_room
    
    Instead of:
    	    61.31%       ls  [kernel.kallsyms]  [k] __do_softirq
                             |
                             --- __do_softirq
                                 do_softirq
                                 irq_exit
                                 do_IRQ
                                 common_interrupt
                                |
                                |--25.00%-- tty_buffer_request_room
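    The effect can be reproduced with a toy model of the stack. This is an
    illustration only, not kernel code: the stack is a hypothetical array
    of function names (outermost first), the frame names are borrowed from
    the perf output above, and an inlined function simply pushes no frame.
    The old scheme fetches the source from inside the template by always
    skipping two frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FRAMES 8

/* Toy call stack: frames[0] is the outermost function. */
struct call_stack {
	const char *frames[MAX_FRAMES];
	int depth;
};

/* Name of the function `skip` frames above the innermost frame. */
static const char *frame_at(const struct call_stack *cs, int skip)
{
	int idx = cs->depth - 1 - skip;

	return idx >= 0 ? cs->frames[idx] : "?";
}

/* Old scheme: the regs are fetched inside perf_trace_templ_foo(), which
 * rewinds two frames on the assumption that both the template and
 * perf_trace_blah() own a frame on the stack. */
static const char *old_event_source(bool templ_inlined)
{
	struct call_stack cs = {
		.frames = { "do_softirq", "__do_softirq", "perf_trace_blah" },
		.depth = 3,
	};

	/* An inlined template contributes no frame of its own. */
	if (!templ_inlined)
		cs.frames[cs.depth++] = "perf_trace_templ_foo";

	return frame_at(&cs, 2);
}
```

    old_event_source(false) yields "__do_softirq", while
    old_event_source(true) yields "do_softirq": with no template frame, the
    fixed two-frame rewind lands one caller too far, just like the
    truncated callchain.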
    
    To fix this, we fetch the regs from perf_trace_blah() rather than
    perf_trace_templ_foo() so that we don't have to deal with inlining
    surprises.
    
    This also gives us the true source of the event even when we
    don't have frame pointers.
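    The new ordering can be sketched with the same kind of toy model. It is
    not kernel code: the stack is a hypothetical array of function names,
    struct fake_regs is a made-up stand-in for the register snapshot that
    only carries the source function, and templ_foo_record() is a
    hypothetical name for the template's recording step. perf_trace_blah()
    captures the regs, skipping only its own frame, before the template
    runs, so whether the template gets a frame no longer matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FRAMES 8

/* Toy call stack: frames[0] is the outermost function. */
struct call_stack {
	const char *frames[MAX_FRAMES];
	int depth;
};

/* Hypothetical register snapshot: only the event source is kept. */
struct fake_regs {
	const char *source;
};

/* Name of the function `skip` frames above the innermost frame. */
static const char *frame_at(const struct call_stack *cs, int skip)
{
	int idx = cs->depth - 1 - skip;

	return idx >= 0 ? cs->frames[idx] : "?";
}

/* Template side: uses the regs it is given instead of walking the stack. */
static const char *templ_foo_record(const struct fake_regs *regs)
{
	return regs->source;
}

/* New scheme: perf_trace_blah() fetches the regs, skipping only its own
 * frame, and hands them to the template. */
static const char *new_event_source(bool templ_inlined)
{
	struct call_stack cs = {
		.frames = { "do_softirq", "__do_softirq", "perf_trace_blah" },
		.depth = 3,
	};
	struct fake_regs regs = { .source = frame_at(&cs, 1) };

	/* Entering the template pushes a frame only if it is not inlined;
	 * the regs were captured earlier, so they are unaffected. */
	if (!templ_inlined)
		cs.frames[cs.depth++] = "perf_trace_templ_foo";

	return templ_foo_record(&regs);
}
```

    Both new_event_source(false) and new_event_source(true) yield
    "__do_softirq", because the skip count is now relative to the
    per-event handler, which always owns a frame in this model.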
    
    Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Ingo Molnar <mingo@elte.hu>