1. 05 Mar, 2009 1 commit
  2. 20 Feb, 2009 2 commits
    • ftrace: immediately stop code modification if failure is detected · 90c7ac49
      Steven Rostedt authored
      
      
      Impact: fix to prevent NMI lockup
      
      If the page fault handler produces a WARN_ON while text is being
      modified, and the system is set up with a high frequency of NMIs,
      we can lock up the system on a failure to modify code.
      
      The modifying of code with NMIs allows all NMIs to modify the code
      if it is about to run. This prevents a modifier on one CPU from
      modifying code running in NMI context on another CPU. The modifying
      is done through stop_machine, so only NMIs must be considered.
      
      But if the write causes the page fault handler to produce a warning,
      the print can slow it down enough that as soon as it is done
      it will take another NMI before going back to the process context.
      The new NMI will perform the write again causing another print and
      this will hang the box.
      
      This patch turns off the writing as soon as a failure is detected
      and does not wait for it to be turned off by the process context.
      This will keep NMIs from getting stuck in this back and forth
      of print outs.
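A minimal sketch of the behaviour described above, written as a single-threaded user-space model rather than kernel code. All names here (`mod_code_write`, `mod_code_status`, `sketch_write_text` and so on) are assumptions chosen for illustration, and the write is hard-wired to fail so that the failure path is visible.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int mod_code_write;      /* non-zero while a modification is pending */
static int mod_code_status;            /* result of the last write attempt */
static int sketch_write_attempts;      /* how often the write was tried */

/* Hypothetical stand-in for the low-level text write; it always fails
 * here to model a write that trips the page fault handler. */
static int sketch_write_text(void)
{
    sketch_write_attempts++;
    return -1;
}

/* Perform the pending write. On failure, clear the flag at once instead
 * of waiting for process context to do it. */
static void sketch_write_pending(void)
{
    mod_code_status = sketch_write_text();
    if (mod_code_status != 0)
        atomic_store(&mod_code_write, 0);
}

/* Model of NMI entry: only write while a modification is still pending. */
static void sketch_nmi_enter(void)
{
    if (atomic_load(&mod_code_write))
        sketch_write_pending();
}
```

With the flag cleared on the first failure, later NMIs in this model skip the write entirely, so the warning cannot be retriggered in a loop.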
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ftrace, x86: make kernel text writable only for conversions · 16239630
      Steven Rostedt authored
      
      
      Impact: keep kernel text read only
      
      Because dynamic ftrace converts the calls to mcount into and out of
      nops at run time, we needed to always keep the kernel text writable.
      
      But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
      the kernel code to writable before ftrace modifies the text, and converts
      it back to read only afterward.
      
      The kernel text is converted to read/write, stop_machine is called to
      modify the code, then the kernel text is converted back to read only.
      
      The original version used SYSTEM_STATE to determine when it was OK
      or not to change the code to rw or ro. Andrew Morton pointed out that
      using SYSTEM_STATE is a bad idea since there is no guarantee what
      its state will actually be.
      
      Instead, I moved the check into the set_kernel_text_* functions
      themselves, and use a local variable to determine when it is
      OK to change the kernel text RW permissions.
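The read/write window can be illustrated with a user-space analogy: an anonymous page stands in for kernel text and `mprotect` for the permission change. This is only a sketch under those assumptions. The names `text_init`, `text_patch` and `text_ready` are hypothetical, and the kernel performs the actual write under stop_machine, which is not modelled here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *text_page;   /* stand-in for the kernel text */
static size_t text_len;
static int text_ready;             /* local flag: is it OK to flip permissions? */

/* Map one page, fill it with 0x90 bytes and make it read-only. */
static int text_init(void)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0)
        return -1;
    text_len = (size_t)page;
    text_page = mmap(NULL, text_len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (text_page == MAP_FAILED)
        return -1;
    memset(text_page, 0x90, text_len);
    if (mprotect(text_page, text_len, PROT_READ) != 0)
        return -1;
    text_ready = 1;
    return 0;
}

/* Open a write window, patch n bytes at offset off, close the window. */
static int text_patch(size_t off, const unsigned char *code, size_t n)
{
    if (!text_ready || off > text_len || n > text_len - off)
        return -1;
    if (mprotect(text_page, text_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(text_page + off, code, n);
    return mprotect(text_page, text_len, PROT_READ);
}
```

The page is writable only for the duration of `text_patch`; outside that window a stray write to it would fault, which is the property CONFIG_DEBUG_RODATA is meant to keep.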
      
      [ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  3. 18 Feb, 2009 1 commit
  4. 11 Feb, 2009 1 commit
  5. 10 Feb, 2009 2 commits
    • tracing, x86: fix fixup section to return to original code · e3944bfa
      Steven Rostedt authored
      
      
      Impact: fix to prevent a kernel crash on fault
      
      If for some reason the pointer to the parent function on the
      stack takes a fault, the fix up code will not return back to
      the original faulting code. This can lead to unpredictable
      results and perhaps even a kernel panic.
      
      A fault should not happen, but if it does, we should simply
      disable the tracer, warn, and continue running the kernel.
      It should not lead to a kernel crash.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing, x86: fix constraint for parent variable · 96665788
      Steven Rostedt authored
      
      
      The constraint used for retrieving and restoring the parent function
      pointer is incorrect. The parent variable is a pointer, and the
      address of the pointer is modified by the asm statement and not
      the pointer itself. It is incorrect to pass it in as an output
      constraint since the asm will never update the pointer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  6. 09 Feb, 2009 1 commit
  7. 08 Feb, 2009 4 commits
    • ring-buffer: use generic version of in_nmi · a81bd80a
      Steven Rostedt authored
      
      
      Impact: clean up
      
      Now that a generic in_nmi is available, this patch removes the
      special code in the ring_buffer and uses the generic in_nmi
      version instead.
      
      With this change, I was also able to rename the "arch_ftrace_nmi_enter"
      back to "ftrace_nmi_enter" and remove the code from the ring buffer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ftrace: change function graph tracer to use new in_nmi · 9a5fd902
      Steven Rostedt authored
      
      
      The function graph tracer piggy backed onto the dynamic ftracer
      to use the in_nmi custom code for dynamic tracing. The problem
      was (as Andrew Morton pointed out) it really only wanted to bail
      out if the context of the current CPU was in NMI context. But the
      dynamic ftrace in_nmi custom code was true if _any_ CPU happened
      to be in NMI context.
      
      Now that we have a generic in_nmi interface, this patch changes
      the function graph code to use it instead of the dynamic ftrace
      custom code.
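The difference between the two checks can be sketched in a small model. The global counter is true while any CPU is in an NMI, whereas the per-CPU check is true only for the CPU actually handling one. The per-CPU flag array and every function name below are assumptions for illustration; they do not show how the generic in_nmi is implemented in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4

static int nmi_running;                  /* global: NMIs in flight on all CPUs */
static bool cpu_in_nmi[SKETCH_NR_CPUS];  /* per-CPU: is this CPU in an NMI? */

static void sketch_nmi_enter(int cpu)
{
    nmi_running++;
    cpu_in_nmi[cpu] = true;
}

static void sketch_nmi_exit(int cpu)
{
    cpu_in_nmi[cpu] = false;
    nmi_running--;
}

/* What the dynamic ftrace custom code answered: is any CPU in an NMI? */
static bool any_cpu_in_nmi(void)
{
    return nmi_running > 0;
}

/* What the function graph tracer wanted: is the current CPU in an NMI? */
static bool this_cpu_in_nmi(int cpu)
{
    return cpu_in_nmi[cpu];
}
```

With an NMI running on CPU 1 only, `any_cpu_in_nmi()` is true while `this_cpu_in_nmi(0)` stays false, so a tracer on CPU 0 that used the global check would bail out needlessly.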
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ftrace, x86: rename in_nmi variable · 4e6ea144
      Steven Rostedt authored
      
      
      Impact: clean up
      
      The in_nmi variable in x86 arch ftrace.c is a misnomer.
      Andrew Morton pointed out that the in_nmi variable is incremented
      by all CPUS. It can be set when another CPU is running an NMI.
      
      Since this is actually intentional, the fix is to rename it to
      what it really is: "nmi_running"
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ring-buffer: add NMI protection for spinlocks · 78d904b4
      Steven Rostedt authored
      
      
      Impact: prevent deadlock in NMI
      
      The ring buffers are not yet totally lockless with writing to
      the buffer. When a writer crosses a page, it grabs a per cpu spinlock
      to protect against a reader. The spinlocks taken by a writer are not
      to protect against other writers, since a writer can only write to
      its own per cpu buffer. The spinlocks protect against readers that
      can touch any cpu buffer. The writers are made to be reentrant
      with the spinlocks disabling interrupts.
      
      The problem arises when an NMI writes to the buffer, and that write
      crosses a page boundary. If it grabs a spinlock, it can be racing
      with another writer (since disabling interrupts does not protect
      against NMIs) or with a reader on the same CPU. Luckily, most
      users are not reentrant, which protects against this issue. But if a
      user of the ring buffer becomes reentrant (which the ring
      buffers do allow) and the NMI also writes to the ring buffer, then
      we risk a deadlock.
      
      This patch moves the ftrace_nmi_enter called by nmi_enter() to the
      ring buffer code. It replaces the current ftrace_nmi_enter that is
      used by arch specific code to arch_ftrace_nmi_enter and updates
      the Kconfig to handle it.
      
      When an NMI is called, it will set a per cpu variable in the ring buffer
      code and will clear it when the NMI exits. If a write to the ring buffer
      crosses page boundaries inside an NMI, a trylock is used on the spin
      lock instead. If the spinlock fails to be acquired, then the entry
      is discarded.
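The trylock rule can be sketched as follows, using a C11 `atomic_flag` as a stand-in for the per-cpu spinlock. This is a user-space model under stated assumptions: the structure, its fields and the `sketch_*` functions are hypothetical names, and only the page-crossing path is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_cpu_buffer {
    atomic_flag lock;             /* stands in for the per-cpu spinlock */
    bool in_nmi;                  /* set on NMI entry, cleared on NMI exit */
    unsigned long pages_crossed;  /* successful page crossings */
    unsigned long dropped;        /* entries discarded inside an NMI */
};

static void sketch_init(struct sketch_cpu_buffer *b)
{
    atomic_flag_clear(&b->lock);
    b->in_nmi = false;
    b->pages_crossed = 0;
    b->dropped = 0;
}

static void sketch_nmi_enter(struct sketch_cpu_buffer *b) { b->in_nmi = true; }
static void sketch_nmi_exit(struct sketch_cpu_buffer *b)  { b->in_nmi = false; }

/* Called when a write crosses a page boundary.
 * Returns true if the page was advanced, false if the entry was dropped. */
static bool sketch_cross_page(struct sketch_cpu_buffer *b)
{
    if (b->in_nmi) {
        /* NMI context: try once and never spin. */
        if (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire)) {
            b->dropped++;
            return false;
        }
    } else {
        /* Normal context: spin until the lock is free. */
        while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
            ;
    }
    b->pages_crossed++;
    atomic_flag_clear_explicit(&b->lock, memory_order_release);
    return true;
}
```

Dropping the entry trades a lost trace record for forward progress: an NMI that interrupted the lock holder on the same CPU could otherwise spin forever.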
      
      This bug appeared in the ftrace work in the RT tree, where event tracing
      is reentrant. This workaround solved the deadlocks that appeared there.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  8. 27 Jan, 2009 1 commit
  9. 08 Dec, 2008 1 commit
  10. 03 Dec, 2008 4 commits
  11. 02 Dec, 2008 1 commit
    • tracing/function-graph-tracer: support for x86-64 · 48d68b20
      Frederic Weisbecker authored
      
      
      Impact: extend and enable the function graph tracer to 64-bit x86
      
      This patch implements the support for function graph tracer under x86-64.
      Both static and dynamic tracing are supported.
      
      This causes some small CPP conditional asm in arch/x86/kernel/ftrace.c.
      I wanted to use probe_kernel_read/write to make the return address
      saving/patching code more generic, but it causes tracing recursion.
      
      It would perhaps be useful to implement a notrace version of these
      functions for ports to other archs.
      
      Note that arch/x86/process_64.c is not traced, as on X86-32. I first
      thought __switch_to() was responsible for crashes during tracing because I
      believed the current task was changed inside it, but that's actually not
      the case (actually yes, but not the "current" pointer).
      
      So I will have to investigate to find the functions that do harm here, to
      enable tracing of the other functions inside (but there is no issue at
      this time, as long as process_64.c stays out of -pg flags).
      
      A small possible race condition is fixed in this patch too. When the
      tracer allocates a return stack dynamically, the current depth was
      initialized after the allocation rather than before. An interrupt could
      occur at this time and, after seeing that the return stack is allocated,
      the tracer could try to trace it with a random uninitialized depth. This
      is a precaution, even though I have not hit the problem.
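The ordering fix can be sketched in user space with C11 atomics: the depth is set first, and only then is the stack pointer published. The structure and function names are assumptions for illustration, and the release/acquire pair stands in for whatever ordering the kernel code relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct sketch_task {
    int curr_ret_stack;                  /* depth index, -1 means empty */
    _Atomic(unsigned long *) ret_stack;  /* NULL until the stack is allocated */
};

/* Allocate the return stack. The depth is initialised BEFORE the pointer
 * becomes visible, so anyone who sees the pointer also sees a valid depth. */
static int sketch_alloc_ret_stack(struct sketch_task *t, size_t depth)
{
    unsigned long *stack = calloc(depth, sizeof(*stack));

    if (!stack)
        return -1;
    t->curr_ret_stack = -1;
    atomic_store_explicit(&t->ret_stack, stack, memory_order_release);
    return 0;
}

/* The tracer's check: trace only once the stack has been published. */
static int sketch_can_trace(struct sketch_task *t)
{
    return atomic_load_explicit(&t->ret_stack, memory_order_acquire) != NULL;
}
```

An interrupt that arrives between the two steps sees a NULL pointer and skips tracing, instead of seeing a stack with a garbage depth.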
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Bird <tim.bird@am.sony.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 26 Nov, 2008 3 commits
    • ftrace: use code patching for ftrace graph tracer · 5a45cfe1
      Steven Rostedt authored
      
      
      Impact: more efficient code for ftrace graph tracer
      
      This patch uses the dynamic patching, when available, to patch
      the function graph code into the kernel.
      
      This patch will ease the way for letting both function tracing
      and function graph tracing run together.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/function-return-tracer: set a more human readable output · 287b6e68
      Frederic Weisbecker authored
      
      
      Impact: feature
      
      This patch sets a C-like output for the function graph tracing.
      To that end, we now call two handlers for each function: one on entry
      and another on return. This way we can draw a well-ordered call stack.
      
      The pid of the previous trace is loosely stored to be compared against
      the one of the current trace to see if there was a context switch.
      
      Without this little feature, the call tree would seem broken at
      some locations.
      We could use the sched_tracer to capture these sched_events but this
      way of processing is much simpler.
      
      Two spaces have been chosen for indentation so that deep call chains
      still fit the screen. The execution time in nanoseconds is printed just
      after the closing brace, which makes it easier to find the corresponding
      function. If the time were printed as a first column, it would not be so
      easy to find the corresponding function when it is called at a deep depth.
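The output format described above can be sketched with two small formatting handlers, one for entry and one for return. This is a user-space illustration only: `sketch_entry`, `sketch_return` and the global depth counter are hypothetical names, not the tracer's actual handlers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int sketch_depth;   /* current call depth */

/* Entry handler: print "name() {" indented by two spaces per level. */
static int sketch_entry(char *out, size_t len, const char *func)
{
    int n = snprintf(out, len, "%*s%s() {\n", sketch_depth * 2, "", func);

    sketch_depth++;
    return n;
}

/* Return handler: print the closing brace at the caller's indentation,
 * followed by the execution time in nanoseconds. */
static int sketch_return(char *out, size_t len, unsigned long long ns)
{
    sketch_depth--;
    return snprintf(out, len, "%*s} %llu\n", sketch_depth * 2, "", ns);
}
```

Because the return handler decrements the depth before printing, each closing brace lines up with the function name that opened it.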
      
      I plan to output the return value, but on 32-bit CPUs the return value
      can be 32 or 64 bits wide, and it's difficult to guess which case we are in.
      I don't know what the better solution would be on X86-32: only print
      eax (low part) or also edx (high part).
      
      Actually it's the same problem when a function returns an 8-bit value: the
      high part of eax could contain junk values...
      
      Here is an example of trace:
      
      sys_read() {
        fget_light() {
        } 526
        vfs_read() {
          rw_verify_area() {
            security_file_permission() {
              cap_file_permission() {
              } 519
            } 1564
          } 2640
          do_sync_read() {
            pipe_read() {
              __might_sleep() {
              } 511
              pipe_wait() {
                prepare_to_wait() {
                } 760
                deactivate_task() {
                  dequeue_task() {
                    dequeue_task_fair() {
                      dequeue_entity() {
                        update_curr() {
                          update_min_vruntime() {
                          } 504
                        } 1587
                        clear_buddies() {
                        } 512
                        add_cfs_task_weight() {
                        } 519
                        update_min_vruntime() {
                        } 511
                      } 5602
                      dequeue_entity() {
                        update_curr() {
                          update_min_vruntime() {
                          } 496
                        } 1631
                        clear_buddies() {
                        } 496
                        update_min_vruntime() {
                        } 527
                      } 4580
                      hrtick_update() {
                        hrtick_start_fair() {
                        } 488
                      } 1489
                    } 13700
                  } 14949
                } 16016
                msecs_to_jiffies() {
                } 496
                put_prev_task_fair() {
                } 504
                pick_next_task_fair() {
                } 489
                pick_next_task_rt() {
                } 496
                pick_next_task_fair() {
                } 489
                pick_next_task_idle() {
                } 489
      
      ------------8<---------- thread 4 ------------8<----------
      
      finish_task_switch() {
      } 1203
      do_softirq() {
        __do_softirq() {
          __local_bh_disable() {
          } 669
          rcu_process_callbacks() {
            __rcu_process_callbacks() {
              cpu_quiet() {
                rcu_start_batch() {
                } 503
              } 1647
            } 3128
            __rcu_process_callbacks() {
            } 542
          } 5362
          _local_bh_enable() {
          } 587
        } 8880
      } 9986
      kthread_should_stop() {
      } 669
      deactivate_task() {
        dequeue_task() {
          dequeue_task_fair() {
            dequeue_entity() {
              update_curr() {
                calc_delta_mine() {
                } 511
                update_min_vruntime() {
                } 511
              } 2813
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/function-return-tracer: change the name into function-graph-tracer · fb52607a
      Frederic Weisbecker authored
      
      
      Impact: cleanup
      
      This patch changes the name of the "return function tracer" into
      function-graph-tracer which is a more suitable name for a tracing
      which makes one able to retrieve the ordered call stack during
      the code flow.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 23 Nov, 2008 1 commit
  14. 18 Nov, 2008 1 commit
    • tracing/function-return-tracer: add the overrun field · 0231022c
      Frederic Weisbecker authored
      
      
      Impact: help to find the better depth of trace
      
      We decided to arbitrarily define the depth of the function return trace
      as "20". Perhaps this is not enough. To help find an optimal depth, we
      now measure the overrun: the number of functions that have been missed
      for the current thread. By default this is not displayed; a particular
      flag has to be set on the return tracer:
      
        echo overrun > /debug/tracing/trace_options
      
      The overrun will then be printed on the right.
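A fixed-depth return stack with an overrun counter can be sketched like this. The depth of 20 comes from the text above; the structure and the `sketch_push` / `sketch_pop` names are assumptions for illustration.

```c
#include <assert.h>

#define SKETCH_RETSTACK_DEPTH 20   /* the depth quoted in the commit message */

struct sketch_ret_stack {
    unsigned long ret[SKETCH_RETSTACK_DEPTH];
    int index;               /* number of saved return addresses */
    unsigned long overrun;   /* calls missed because the stack was full */
};

/* Save a return address, or count an overrun when the stack is full. */
static int sketch_push(struct sketch_ret_stack *s, unsigned long ret_addr)
{
    if (s->index == SKETCH_RETSTACK_DEPTH) {
        s->overrun++;
        return -1;
    }
    s->ret[s->index++] = ret_addr;
    return 0;
}

/* Restore the most recently saved return address. */
static unsigned long sketch_pop(struct sketch_ret_stack *s)
{
    assert(s->index > 0);
    return s->ret[--s->index];
}
```

A steadily growing overrun count, as in the trace below, suggests that call chains regularly exceed the chosen depth.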
      
      As the trace shows below, the current 20 depth is not enough.
      
      update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
      update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
      do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
      tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
      tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
      vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
      vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
      set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
      con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
      release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
      release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
      con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
      con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
      n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
      smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
      irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
      smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
      ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
      ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
      ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
      hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 16 Nov, 2008 3 commits
    • tracing/function-return-tracer: support for dynamic ftrace on function return tracer · e7d3737e
      Frederic Weisbecker authored
      
      
      This patch adds support for dynamic tracing to the function return tracer.
      The only difference from normal dynamic function tracing is that we don't need
      to hook on a particular callback. All we need is to dynamically nop out or set
      the calls to ftrace_caller (which is ftrace_return_caller here).
      
      Some safety checks ensure that we are not trying to launch dynamic tracing for
      return tracing while normal function tracing is already running.
      
      An example of trace with getnstimeofday set as a filter:
      
      ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/function-return-tracer: add a barrier to ensure return stack index is incremented in memory · b01c7466
      Frederic Weisbecker authored
      
      
      Impact: fix possible race condition in ftrace function return tracer
      
      This fixes a possible race condition if index incrementation
      is not immediately flushed in memory.
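A minimal sketch of the idea, assuming the concern is a compiler reordering seen by an interrupt on the same CPU. The index is incremented first, a compiler barrier follows, and only then is the slot filled. C11's `atomic_signal_fence` is used here as a portable compiler barrier; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_DEPTH 20

struct sketch_thread_info {
    volatile int curr_ret_stack;             /* -1 when the stack is empty */
    unsigned long ret_stack[SKETCH_DEPTH];   /* saved return addresses */
};

/* Push a return address. The index must be updated in memory before the
 * slot is written, so nested code on the same CPU never reuses the slot. */
static int sketch_push_return(struct sketch_thread_info *ti, unsigned long ret)
{
    int index;

    if (ti->curr_ret_stack == SKETCH_DEPTH - 1)
        return -1;
    index = ++ti->curr_ret_stack;
    atomic_signal_fence(memory_order_seq_cst);   /* compiler barrier */
    ti->ret_stack[index] = ret;
    return 0;
}
```

Without the barrier the compiler would be free to store the return address first and bump the index later, leaving a window in which an interrupt could claim the same slot.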
      
      Thanks to Andi Kleen and Steven Rostedt for pointing out this issue
      and giving me this solution.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: pass module struct to arch dynamic ftrace functions · 31e88909
      Steven Rostedt authored
      
      
      Impact: allow archs more flexibility on dynamic ftrace implementations
      
      Dynamic ftrace has largely been developed on x86. Since x86 does not
      have the same limitations as other architectures, the ftrace interaction
      between the generic code and the architecture specific code was not
      flexible enough to handle some of the issues that other architectures
      have.
      
      Most notably, module trampolines. Due to the limited branch distance
      that archs make in calling kernel core code from modules, the module
      load code must create a trampoline to jump to what will make the
      larger jump into core kernel code.
      
      The problem arises when this happens to a call to mcount. Ftrace checks
      all code before modifying it and makes sure the current code is what
      it expects. Right now, there is not enough information to handle modifying
      module trampolines.
      
      This patch changes the API between generic dynamic ftrace code and
      the arch dependent code. There are now two functions for modifying code:
      
        ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
             a nop, where the original text is calling addr. (mod is the
             module struct if called by module init)
      
        ftrace_make_caller(rec, addr) - convert the code at rec->ip that should
             be a nop into a caller to addr.
      
      The record "rec" now has a new field called "arch" where the architecture
      can add any special attributes to each call site record.
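The two operations and the check-before-modify rule can be sketched over a fake text buffer. This is a user-space model, not the kernel API: the `sketch_*` names, the record layout, the return codes and the use of buffer offsets as addresses are all assumptions for illustration. The instruction encodings (a 5-byte `call rel32` and a 5-byte nop) are standard x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_INSN_SIZE 5   /* x86: one opcode byte plus a 32-bit offset */

struct sketch_arch { int unused; };   /* per-site arch data, empty here */
struct sketch_rec {
    unsigned long ip;                 /* call site, as an offset into sketch_text */
    struct sketch_arch arch;          /* models the new "arch" field */
};

static unsigned char sketch_text[64]; /* fake text section */
static const unsigned char sketch_nop[SKETCH_INSN_SIZE] =
    { 0x0f, 0x1f, 0x44, 0x00, 0x00 };

/* Encode "call addr" as it would appear at ip (0xe8 + little-endian rel32). */
static void sketch_call_bytes(unsigned char *out, unsigned long ip,
                              unsigned long addr)
{
    uint32_t rel = (uint32_t)(addr - (ip + SKETCH_INSN_SIZE));

    out[0] = 0xe8;
    out[1] = rel & 0xff;
    out[2] = (rel >> 8) & 0xff;
    out[3] = (rel >> 16) & 0xff;
    out[4] = (rel >> 24) & 0xff;
}

/* Replace the code at ip only if it currently matches what we expect. */
static int sketch_modify(unsigned long ip, const unsigned char *expect,
                         const unsigned char *repl)
{
    if (ip > sizeof(sketch_text) - SKETCH_INSN_SIZE)
        return -1;                    /* out of range */
    if (memcmp(sketch_text + ip, expect, SKETCH_INSN_SIZE) != 0)
        return -2;                    /* unexpected code at the call site */
    memcpy(sketch_text + ip, repl, SKETCH_INSN_SIZE);
    return 0;
}

/* Turn "call addr" at rec->ip into a nop; mod is unused in this model. */
static int sketch_make_nop(const void *mod, struct sketch_rec *rec,
                           unsigned long addr)
{
    unsigned char call[SKETCH_INSN_SIZE];

    (void)mod;
    sketch_call_bytes(call, rec->ip, addr);
    return sketch_modify(rec->ip, call, sketch_nop);
}

/* Turn the nop at rec->ip into "call addr". */
static int sketch_make_caller(struct sketch_rec *rec, unsigned long addr)
{
    unsigned char call[SKETCH_INSN_SIZE];

    sketch_call_bytes(call, rec->ip, addr);
    return sketch_modify(rec->ip, sketch_nop, call);
}
```

On an architecture with module trampolines, the expected bytes at a module call site would depend on the trampoline, which is why the module struct is passed in; this x86-style model simply ignores it.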
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 12 Nov, 2008 2 commits
  17. 11 Nov, 2008 3 commits
    • tracing: function return tracer, build fix · 19b3e967
      Ingo Molnar authored
      
      
      fix:
      
       arch/x86/kernel/ftrace.c: In function 'ftrace_return_to_handler':
       arch/x86/kernel/ftrace.c:112: error: implicit declaration of function 'cpu_clock'
      
      cpu_clock() is implicitly included via a number of ways, but its real
      location is sched.h. (Build failure is triggerable if enough other
      kernel components are turned off.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing, x86: function return tracer, fix assembly constraints · 867f7fb3
      Ingo Molnar authored
      
      
      fix:
      
       arch/x86/kernel/ftrace.c: Assembler messages:
       arch/x86/kernel/ftrace.c:140: Error: missing ')'
       arch/x86/kernel/ftrace.c:140: Error: junk `(%ebp))' after expression
       arch/x86/kernel/ftrace.c:141: Error: missing ')'
       arch/x86/kernel/ftrace.c:141: Error: junk `(%ebp))' after expression
      
      the [parent_replaced] is used in an =rm fashion, so that constraint
      is correct in isolation - but [parent_old] aliases register %0 and uses
      it in an addressing mode that is only valid with registers - so change
      the constraint from =rm to =r.
      
      This fixes the build failure.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing, x86: add low level support for ftrace return tracing · caf4b323
      Frederic Weisbecker authored
      
      
      Impact: add infrastructure for function-return tracing
      
      Add low level support for ftrace return tracing.
      
      This plug-in stores return addresses on the thread_info structure of
      the current task.
      
      The index of the current return address is initialized when the task
      is the first one (init) and when a process forks (the child). It is
      not needed when a task does a sys_execve because after this syscall,
      it still needs to return on the kernel functions it called.
      
      Note that the code of return_to_handler was suggested by Steven
      Rostedt, as were almost all of the ideas for improvement in this V3.
      
      For safety, arch/x86/kernel/process_32.c is not traced
      because __switch_to() changes the current task during its execution.
      That could cause inconsistency in the stored return address of this
      function, even though I did not see any crash when testing with tracing
      enabled on this function.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 31 Oct, 2008 1 commit
  19. 30 Oct, 2008 2 commits
    • ftrace: nmi update statistics · b807c3d0
      Steven Rostedt authored
      
      
      Impact: add more debug info to /debugfs/tracing/dyn_ftrace_total_info
      
      This patch adds dynamic ftrace NMI update statistics to the
      /debugfs/tracing/dyn_ftrace_total_info stat file.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: nmi safe code modification · 17666f02
      Steven Rostedt authored
      
      
      Impact: fix crashes that can occur in NMI handlers, if their code is modified
      
      Modifying code is something that needs special care. On SMP boxes,
      if code that is being modified is also being executed on another CPU,
      that CPU will have undefined results.
      
      The dynamic ftrace uses kstop_machine to make the system act like a
      uniprocessor system. But this does not address NMIs, that can still
      run on other CPUs.
      
      One approach to handle this is to make sure that no code used by NMIs
      is traced. But NMIs can call notifiers that spread throughout the
      kernel, so this would be very hard to maintain, and the chance of missing
      a function is very high.
      
      The approach that this patch takes is to have the NMIs modify the code
      if the modification is taking place. The way this works is that just
      writing to code executing on another CPU is not harmful if what is
      written is the same as what exists.
      
      Two buffers are used: an IP buffer and a "code" buffer.
      
      The steps that the patcher takes are:
      
       1) Put in the instruction pointer into the IP buffer
          and the new code into the "code" buffer.
       2) Set a flag that says we are modifying code
       3) Wait for any running NMIs to finish.
       4) Write the code
       5) clear the flag.
       6) Wait for any running NMIs to finish.
      
      If an NMI is executed, it will also write the pending code.
      Multiple writes are OK, because what is being written is the same.
      Then the patcher must wait for all running NMIs to finish before
      going to the next line that must be patched.
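The six steps above can be sketched as a user-space model with C11 atomics. All names are assumptions chosen for this illustration, the "NMI" is an ordinary function call, and the text is a plain byte array; the sketch only shows the ordering of the steps, not real code patching.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define SKETCH_INSN_SIZE 5

static unsigned char sketch_text[32];                      /* fake text */

static unsigned char *mod_code_ip;                         /* the IP buffer */
static unsigned char mod_code_newcode[SKETCH_INSN_SIZE];   /* the "code" buffer */
static atomic_int mod_code_write;                          /* "modifying" flag */
static atomic_int nmi_running;                             /* NMIs in flight */
static atomic_int nmi_writes;                              /* writes done by NMIs */

/* Write the pending code; safe to repeat because the bytes are identical. */
static void sketch_write_pending(void)
{
    memcpy(mod_code_ip, mod_code_newcode, SKETCH_INSN_SIZE);
}

/* NMI entry: if a modification is pending, perform the write as well. */
static void sketch_nmi_enter(void)
{
    atomic_fetch_add(&nmi_running, 1);
    if (atomic_load(&mod_code_write)) {
        sketch_write_pending();
        atomic_fetch_add(&nmi_writes, 1);
    }
}

static void sketch_nmi_exit(void)
{
    atomic_fetch_sub(&nmi_running, 1);
}

static void sketch_wait_for_nmi(void)
{
    while (atomic_load(&nmi_running))
        ;   /* spin until every running NMI has finished */
}

/* The patcher, following steps 1) to 6). */
static void sketch_patch(unsigned char *ip, const unsigned char *new_code)
{
    mod_code_ip = ip;                                       /* 1 */
    memcpy(mod_code_newcode, new_code, SKETCH_INSN_SIZE);
    atomic_store(&mod_code_write, 1);                       /* 2 */
    sketch_wait_for_nmi();                                  /* 3 */
    sketch_write_pending();                                 /* 4 */
    atomic_store(&mod_code_write, 0);                       /* 5 */
    sketch_wait_for_nmi();                                  /* 6 */
}
```

The final wait matters because an NMI that saw the flag set may still be in the middle of its own write when the patcher clears the flag; the buffers must not be reused for the next site until that NMI is done.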
      
      This is basically the RCU approach to code modification.
      
      Thanks to Ingo Molnar for suggesting the idea, and to Arjan van de Ven
      for his guidance on what is safe and what is not.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 27 Oct, 2008 1 commit
    • ftrace: use a real variable for ftrace_nop in x86 · 8115f3f0
      Steven Rostedt authored
      
      
      Impact: avoid section mismatch warning, clean up
      
      The dynamic ftrace determines which nop is safe to use at start up.
      When it finds a safe nop for patching, it sets a pointer called ftrace_nop
      to point to the code. All call sites are then patched to this nop.
      
      Later, when tracing is turned on, this ftrace_nop variable is again used
      to compare the location to make sure it is a nop before we update it to
      an mcount call. If this fails just once, a warning is printed and ftrace
      is disabled.
      
      Rakib Mullick noted that the code that sets up the nop is in a .init
      section whereas the nop itself is in the .text section. This is needed
      because the nop is used later on after boot up. The problem is that the
      test of the nop jumps back to the setup code and causes a "section
      mismatch" warning.
      
      Rakib first recommended converting the nop to .init.text, but as stated
      above, this would fail since that text is used later.
      
      The real solution is to extend Rakib's patch, and to make the ftrace_nop
      into an array, and just save the code from the assembly to this array.
      
      Now the section can stay as an init section, and we have a nop to use
      later on.
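The array approach can be sketched as follows: the chosen nop bytes are copied into real storage at init time, so later comparisons no longer refer back to init-only code. The names are hypothetical and the five-byte size is the x86 call-site size assumed for this sketch.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_INSN_SIZE 5

/* Real storage for the nop; it outlives whatever code selected it. */
static unsigned char sketch_ftrace_nop[SKETCH_INSN_SIZE];

/* Init-time step: save the bytes of the nop that was found to be safe. */
static void sketch_init_nop(const unsigned char *chosen)
{
    memcpy(sketch_ftrace_nop, chosen, SKETCH_INSN_SIZE);
}

/* Later check: does the call site still hold the saved nop? */
static int sketch_site_is_nop(const unsigned char *ip)
{
    return memcmp(ip, sketch_ftrace_nop, SKETCH_INSN_SIZE) == 0;
}
```

Once the bytes are saved, the source they were copied from can go away, which is the property that lets the setup code stay in an init section.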
      Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  21. 23 Oct, 2008 4 commits