Commit 6cc8a7c1 authored by Frederic Weisbecker

perf: Fetch hot regs from the template caller

Trace events can be defined from a template using
DECLARE_EVENT_CLASS/DEFINE_EVENT, or directly with TRACE_EVENT.

In both cases we have a template tracepoint handler, used to
record the trace, to which we pass our ftrace event instance.

At the function level, if the class is named "foo" and the event
is named "blah", we have the following chain of calls:

perf_trace_blah() -> perf_trace_templ_foo()

If several events share the class "foo", we'll have multiple
users of perf_trace_templ_foo(), and it won't be inlined by the
compiler. This is usually what happens with the
DECLARE_EVENT_CLASS/DEFINE_EVENT based definition.

But if perf_trace_blah() is the only caller of perf_trace_templ_foo(),
there is a fair chance that it will be inlined.
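
The shape of that call chain can be sketched in plain userspace C. This
is only an illustration under simplifying assumptions: SKETCH_EVENT_CLASS,
SKETCH_EVENT and every sketch_* name are hypothetical stand-ins, not the
kernel macros, and the handler just counts calls instead of recording a
trace.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bookkeeping so the effect of the shared handler is visible. */
static int templ_foo_calls;      /* how many times the class handler ran */
static const char *last_event;   /* name of the event that ran it last */

/* One shared "template" handler is generated per class. */
#define SKETCH_EVENT_CLASS(class)                                        \
static void sketch_trace_templ_##class(const char *event, int value)    \
{                                                                        \
	assert(event != NULL);                                           \
	templ_##class##_calls++;                                         \
	last_event = event;                                              \
	printf("%s: value=%d\n", event, value);                          \
}

/* Each event only gets a thin wrapper that forwards to its class handler. */
#define SKETCH_EVENT(class, name)                                        \
static void sketch_trace_##name(int value)                               \
{                                                                        \
	sketch_trace_templ_##class(#name, value);                        \
}

SKETCH_EVENT_CLASS(foo)
SKETCH_EVENT(foo, blah)   /* sketch_trace_blah() -> sketch_trace_templ_foo() */
SKETCH_EVENT(foo, bar)    /* sketch_trace_bar()  -> sketch_trace_templ_foo() */
```

With two wrappers sharing sketch_trace_templ_foo(), a compiler is more
likely to keep the handler out of line; with a single wrapper it may
well fold the handler into it, which is the situation described above.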

The problem is that we fetch the regs from perf_trace_templ_foo()
after rewinding the frame pointer to the second caller: we want
to reach the caller of perf_trace_blah() to get the right source
of the event. We do this by always assuming that
perf_trace_templ_foo() is not inlined. But as shown above, this
is not always true. If it is inlined, we rewind one frame too
many and miss the first caller, losing the most important level
of precision.

We get:
	    61.31%       ls  [kernel.kallsyms]  [k] do_softirq
                         --- do_softirq
                            |--25.00%-- tty_buffer_request_room

Instead of:
	    61.31%       ls  [kernel.kallsyms]  [k] __do_softirq
                         --- __do_softirq
                            |--25.00%-- tty_buffer_request_room

To fix this, we fetch the regs from perf_trace_blah() rather than
perf_trace_templ_foo(), so that we don't have to deal with inlining.

This also gives us the true source of the event even if we don't
have frame pointers.
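
As a rough userspace analogy of the fix (not the kernel code), the
wrapper below captures its own caller and hands the result down, so the
shared handler never has to guess how many frames sit above it. The
names demo_regs, demo_templ_foo and demo_trace_blah are hypothetical,
and __builtin_return_address() is the GCC/Clang builtin used here as a
stand-in for perf_fetch_caller_regs().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pt_regs: only the caller address. */
struct demo_regs {
	void *ip;
};

/* What the shared handler last "submitted". */
static struct demo_regs last_submitted;

/*
 * Shared handler: it only consumes the regs it is given, so the result
 * is the same whether or not the compiler inlines it into the wrapper.
 */
static void demo_templ_foo(const struct demo_regs *regs)
{
	assert(regs != NULL);
	last_submitted = *regs;
}

/*
 * Event wrapper: kept out of line in this sketch so that "my caller"
 * is well defined. It records the return address into its caller.
 */
__attribute__((noinline)) void demo_trace_blah(void)
{
	struct demo_regs regs;

	regs.ip = __builtin_return_address(0);
	demo_templ_foo(&regs);
}
```

Calling demo_trace_blah() from two different call sites leaves two
different addresses in last_submitted.ip, one per call site, without
walking any frame pointers.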

Signed-off-by: Frederic Weisbecker <>
Cc: Peter Zijlstra <>
Cc: Arnaldo Carvalho de Melo <>
Cc: Paul Mackerras <>
Cc: Ingo Molnar <>
parent 6f4dee06
@@ -758,13 +758,12 @@ __attribute__((section("_ftrace_events"))) event_##call = { \
 #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
 static notrace void \
 perf_trace_templ_##call(struct ftrace_event_call *event_call, \
-			proto) \
+			struct pt_regs *__regs, proto) \
 { \
 	struct ftrace_data_offsets_##call __maybe_unused __data_offsets;\
 	struct ftrace_raw_##call *entry; \
 	u64 __addr = 0, __count = 1; \
 	unsigned long irq_flags; \
-	struct pt_regs *__regs; \
 	int __entry_size; \
 	int __data_size; \
 	int rctx; \
@@ -785,9 +784,6 @@ perf_trace_templ_##call(struct ftrace_event_call *event_call, \
 	{ assign; } \
-	__regs = &__get_cpu_var(perf_trace_regs); \
-	perf_fetch_caller_regs(__regs, 2); \
 	perf_trace_buf_submit(entry, __entry_size, rctx, __addr, \
 			       __count, irq_flags, __regs); \
@@ -797,8 +793,13 @@ perf_trace_templ_##call(struct ftrace_event_call *event_call, \
 static notrace void perf_trace_##call(proto) \
 { \
 	struct ftrace_event_call *event_call = &event_##call; \
-	perf_trace_templ_##template(event_call, args); \
+	struct pt_regs *__regs = &get_cpu_var(perf_trace_regs); \
+	perf_fetch_caller_regs(__regs, 1); \
+	perf_trace_templ_##template(event_call, __regs, args); \
+	put_cpu_var(perf_trace_regs); \