    ftrace: Optimize function graph to be called directly · 79922b80
    Steven Rostedt (Red Hat) authored
    
    
    Function graph tracing is a bit different from the function tracers:
    it is processed after either ftrace_caller or ftrace_regs_caller.
    Because there is only one place to modify the jump to
    ftrace_graph_caller, that jump needs to happen after the registers
    are restored.
    
    The function graph tracer depends on the function tracer. Even when
    function graph tracing is running by itself, registers are still
    saved and restored for function tracing, whether or not function
    tracing is happening, before the function graph code is called.
    
    If there's no function tracing happening, it is possible to just call
    the function graph tracer directly, and avoid the wasted effort to save
    and restore regs for function tracing.
    
    This requires adding new flags to the dyn_ftrace records:
    
      FTRACE_FL_TRAMP
      FTRACE_FL_TRAMP_EN
    
    The first is set if the count for the record is one, and the ftrace_ops
    associated with that record has its own trampoline. That way the mcount code
    can call that trampoline directly.
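
    As a rough illustration of the rule above, here is a minimal user-space
    sketch in C. It is not the kernel code: the sketch_* names, the struct
    layouts and the bit values are all hypothetical and exist only to model
    "count is one and the ops has its own trampoline".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values for this sketch; not the kernel's definitions. */
#define SKETCH_FL_TRAMP     (1UL << 0)
#define SKETCH_FL_TRAMP_EN  (1UL << 1)

/* Stand-in for an ftrace_ops: only the field the rule needs. */
struct sketch_ops {
    void (*trampoline)(void);   /* NULL if the ops has no private trampoline */
};

/* Stand-in for a dyn_ftrace record describing one call site. */
struct sketch_rec {
    unsigned long ref_count;    /* how many ops are attached to this site */
    unsigned long flags;
};

/* Placeholder trampoline so an ops can claim to own one. */
static void sketch_graph_tramp(void)
{
}

/*
 * Set the TRAMP bit only when exactly one ops uses the record and that
 * ops brings its own trampoline; otherwise clear it so the generic
 * caller is used.
 */
static void sketch_update_tramp_flag(struct sketch_rec *rec,
                                     const struct sketch_ops *ops)
{
    assert(rec != NULL);

    if (rec->ref_count == 1 && ops != NULL && ops->trampoline != NULL)
        rec->flags |= SKETCH_FL_TRAMP;
    else
        rec->flags &= ~SKETCH_FL_TRAMP;
}
```

    In this model, a second ops attaching to the same record raises the
    count above one, the bit is cleared, and the call site falls back to
    the shared caller.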
    
    In the future, trampolines can be added to arbitrary ftrace_ops. When
    two or more ftrace_ops are registered with ftrace (like kprobes and perf)
    and they are not tracing the same functions, then instead of looping
    over all registered ftrace_ops to check each against its hash, the call
    site can just call the ftrace_ops trampoline directly, which in turn
    calls the registered ftrace_ops function.
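
    The difference between the two dispatch styles can be sketched in
    user-space C as follows. This is only a model under stated assumptions:
    the sketch_* names are hypothetical, and a small array of addresses
    stands in for the per-ops filter hash.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_FUNCS 4

/* Stand-in for an ftrace_ops with a filter and a callback counter. */
struct sketch_filter_ops {
    unsigned long funcs[SKETCH_MAX_FUNCS]; /* models the ops filter hash */
    size_t nr_funcs;
    int hits;                              /* times the callback "ran" */
    struct sketch_filter_ops *next;
};

/* Return 1 if this ops traces the function at address ip. */
static int sketch_ops_test(const struct sketch_filter_ops *ops,
                           unsigned long ip)
{
    for (size_t i = 0; i < ops->nr_funcs; i++)
        if (ops->funcs[i] == ip)
            return 1;
    return 0;
}

/*
 * Generic path: walk every registered ops and test its filter.
 * Returns how many filter checks were needed for this one call.
 */
static int sketch_list_dispatch(struct sketch_filter_ops *head,
                                unsigned long ip)
{
    int checks = 0;

    for (struct sketch_filter_ops *ops = head; ops != NULL; ops = ops->next) {
        checks++;
        if (sketch_ops_test(ops, ip))
            ops->hits++;
    }
    return checks;
}

/*
 * Direct path: the call site is already bound to a single ops, so the
 * callback runs with no list walk and no filter check.
 */
static void sketch_direct_dispatch(struct sketch_filter_ops *ops)
{
    assert(ops != NULL);
    ops->hits++;
}
```

    With two ops registered, the generic path pays two filter checks per
    traced call even though only one ops matches, while the direct path
    pays none.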
    
    Without this patch perf showed:
    
      0.05%  hackbench  [kernel.kallsyms]  [k] ftrace_caller
      0.05%  hackbench  [kernel.kallsyms]  [k] arch_local_irq_save
      0.05%  hackbench  [kernel.kallsyms]  [k] native_sched_clock
      0.04%  hackbench  [kernel.kallsyms]  [k] __buffer_unlock_commit
      0.04%  hackbench  [kernel.kallsyms]  [k] preempt_trace
      0.04%  hackbench  [kernel.kallsyms]  [k] prepare_ftrace_return
      0.04%  hackbench  [kernel.kallsyms]  [k] __this_cpu_preempt_check
      0.04%  hackbench  [kernel.kallsyms]  [k] ftrace_graph_caller
    
    Note that ftrace_caller took up more time than ftrace_graph_caller
    did.
    
    With this patch:
    
      0.05%  hackbench  [kernel.kallsyms]  [k] __buffer_unlock_commit
      0.04%  hackbench  [kernel.kallsyms]  [k] call_filter_check_discard
      0.04%  hackbench  [kernel.kallsyms]  [k] ftrace_graph_caller
      0.04%  hackbench  [kernel.kallsyms]  [k] sched_clock
    
    The ftrace_caller is nowhere to be found and ftrace_graph_caller still
    takes up the same percentage.
    
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>