1. 06 Feb, 2009 3 commits
    • Arnaldo Carvalho de Melo's avatar
      trace: Call tracing_reset_online_cpus before tracer->init() · b6f11df2
      Arnaldo Carvalho de Melo authored
      
      
      Impact: cleanup
      
      To make it easy for ftrace plugin writers, as this was open coded in
      the existing plugins
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarFrédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b6f11df2
    • Arnaldo Carvalho de Melo's avatar
      tracing: Introduce trace_buffer_{lock_reserve,unlock_commit} · 51a763dd
      Arnaldo Carvalho de Melo authored
      
      
      Impact: new API
      
      These new functions do what previously was being open coded, reducing
      the number of details ftrace plugin writers have to worry about.
      
      It also standardizes the handling of stacktrace, userstacktrace and
      other trace options we may introduce in the future.
      
      With this patch, for instance, the blk tracer (and some others already
      in the tree) can use the "userstacktrace" /d/tracing/trace_options
      facility.
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |   -5
        trace_graph_return         |  -22
        trace_graph_entry          |  -26
        trace_function             |  -45
        __ftrace_trace_stack       |  -27
        ftrace_trace_userstack     |  -29
        tracing_sched_switch_trace |  -66
        tracing_stop               |   +1
        trace_seq_to_user          |   -1
        ftrace_trace_special       |  -63
        ftrace_special             |   +1
        tracing_sched_wakeup_trace |  -70
        tracing_reset_online_cpus  |   -1
       13 functions changed, 2 bytes added, 355 bytes removed, diff: -353
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |  -58
       1 function changed, 58 bytes removed, diff: -58
      
      linux-2.6-tip/kernel/trace/trace.c:
        trace_buffer_lock_reserve  |  +88
        trace_buffer_unlock_commit |  +86
       2 functions changed, 174 bytes added, diff: +174
      
      /tmp/vmlinux.after:
       16 functions changed, 176 bytes added, 413 bytes removed, diff: -237
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarFrédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      51a763dd
    • Arnaldo Carvalho de Melo's avatar
      ring_buffer: remove unused flags parameter · 0a987751
      Arnaldo Carvalho de Melo authored
      
      
      Impact: API change, cleanup
      
      >From ring_buffer_{lock_reserve,unlock_commit}.
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |  -14
        trace_graph_return         |  -14
        trace_graph_entry          |  -10
        trace_function             |   -8
        __ftrace_trace_stack       |   -8
        ftrace_trace_userstack     |   -8
        tracing_sched_switch_trace |   -8
        ftrace_trace_special       |  -12
        tracing_sched_wakeup_trace |   -8
       9 functions changed, 90 bytes removed, diff: -90
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |   -1
       1 function changed, 1 bytes removed, diff: -1
      
      /tmp/vmlinux.after:
       10 functions changed, 91 bytes removed, diff: -91
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: default avatarFrédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      0a987751
  2. 05 Feb, 2009 4 commits
  3. 04 Feb, 2009 2 commits
  4. 03 Feb, 2009 6 commits
    • Oleg Nesterov's avatar
      ftrace: do_each_pid_task() needs rcu lock · 229c4ef8
      Oleg Nesterov authored
      "ftrace: use struct pid" commit 978f3a45
      
      
      converted ftrace_pid_trace to "struct pid*".
      
      But we can't use do_each_pid_task() without rcu_read_lock() even if
      we know the pid itself can't go away (it was pinned in ftrace_pid_write).
      The exiting task can detach itself from this pid at any moment.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      229c4ef8
    • Arnaldo Carvalho de Melo's avatar
      trace: Change struct trace_event callbacks parameter list · 2c9b238e
      Arnaldo Carvalho de Melo authored
      
      
      Impact: API change
      
      The trace_seq and trace_entry are in trace_iterator, where there are
      more fields that may be needed by tracers, so just pass the
      tracer_iterator as is already the case for struct tracer->print_line.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      2c9b238e
    • Frederic Weisbecker's avatar
      trace: better manage the context info for events · c4a8e8be
      Frederic Weisbecker authored
      
      
      Impact: make trace_event more convenient for tracers
      
      All tracers (for the moment) that use the struct trace_event want to
      have the context info printed before their own output: the pid/cmdline,
      cpu, and timestamp.
      
      But some other tracers that want to implement their trace_event
      callbacks will not necessary need these information or they may want to
      format them as they want.
      
      This patch adds a new default-enabled trace option:
      TRACE_ITER_CONTEXT_INFO When disabled through:
      
      echo nocontext-info > /debugfs/tracing/trace_options
      
      The pid, cpu and timestamps headers will not be printed.
      
      IE with the sched_switch tracer with context-info (default):
      
           bash-2935 [001] 100.356561: 2935:120:S ==> [001]  0:140:R <idle>
         <idle>-0    [000] 100.412804:    0:140:R   + [000] 11:115:S events/0
         <idle>-0    [000] 100.412816:    0:140:R ==> [000] 11:115:R events/0
       events/0-11   [000] 100.412829:   11:115:S ==> [000]  0:140:R <idle>
      
      Without context-info:
      
       2935:120:S ==> [001]  0:140:R <idle>
          0:140:R   + [000] 11:115:S events/0
          0:140:R ==> [000] 11:115:R events/0
         11:115:S ==> [000]  0:140:R <idle>
      
      A tracer can disable it at runtime by clearing the bit
      TRACE_ITER_CONTEXT_INFO in trace_flags.
      
      The print routines were renamed to trace_print_context and
      trace_print_lat_context, so that they can be used by tracers if they
      want to use them for one of the trace_event callbacks.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c4a8e8be
    • Steven Rostedt's avatar
      trace: let boot trace be chosen by command line · 79fb0768
      Steven Rostedt authored
      
      
      Now that we have a working ftrace=<tracer> function, make the boot
      tracer get activated by it. This way we can turn it on or off without
      recompiling the kernel, as well as keeping the selftests on. The
      selftests are disabled whenever a default tracer starts running.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      79fb0768
    • Steven Rostedt's avatar
      trace: fix default boot up tracer · b2821ae6
      Steven Rostedt authored
      
      
      Peter Zijlstra started the functionality to start up a default
      tracing at bootup. This patch finishes the work.
      
      Now if you add 'ftrace=<tracer>' to the command line, when that tracer
      is registered on bootup, that tracer is selected and starts tracing.
      
      Note, all selftests for tracers that are registered after this tracer
      is disabled. This prevents the selftests from disturbing the running
      tracer, or the running tracer from disturbing the selftest.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      b2821ae6
    • Eric Dumazet's avatar
      modules: Use a better scheme for refcounting · 720eba31
      Eric Dumazet authored
      
      
      Current refcounting for modules (done if CONFIG_MODULE_UNLOAD=y) is
      using a lot of memory.
      
      Each 'struct module' contains an [NR_CPUS] array of full cache lines.
      
      This patch uses existing infrastructure (percpu_modalloc() &
      percpu_modfree()) to allocate percpu space for the refcount storage.
      
      Instead of wasting NR_CPUS*128 bytes (on i386), we now use
      nr_cpu_ids*sizeof(local_t) bytes.
      
      On a typical distro, where NR_CPUS=8, shiping 2000 modules, we reduce
      size of module files by about 2 Mbytes. (1Kb per module)
      
      Instead of having all refcounters in the same memory node - with TLB misses
      because of vmalloc() - this new implementation permits to have better
      NUMA properties, since each  CPU will use storage on its preferred node,
      thanks to percpu storage.
      Signed-off-by: default avatarEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: default avatarRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      720eba31
  5. 01 Feb, 2009 5 commits
  6. 30 Jan, 2009 9 commits
  7. 29 Jan, 2009 2 commits
  8. 28 Jan, 2009 2 commits
  9. 27 Jan, 2009 1 commit
    • Rafael J. Wysocki's avatar
      Hibernation: Introduce system_entering_hibernation · abfe2d7b
      Rafael J. Wysocki authored
      
      
      Introduce boolean function system_entering_hibernation() returning
      'true' during the last phase of hibernation, in which devices are
      being put into low power states and the sleep state (for example,
      ACPI S4) is finally entered.
      
      Some device drivers need such a function to check if the system is
      in the final phase of hibernation.  In particular, some SATA drivers
      are going to use it for blacklisting systems in which the disks
      should not be spun down during the last phase of hibernation (the
      BIOS will do that anyway).
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: default avatarJeff Garzik <jgarzik@redhat.com>
      abfe2d7b
  10. 26 Jan, 2009 4 commits
    • Ed Swierk's avatar
      signals, debug: fix BUG: using smp_processor_id() in preemptible code in print_fatal_signal() · 3a9f84d3
      Ed Swierk authored
      
      
      With print-fatal-signals=1 on a kernel with CONFIG_PREEMPT=y, sending an
      unexpected signal to a process causes a BUG: using smp_processor_id() in
      preemptible code.
      
      get_signal_to_deliver() releases the siglock before calling
      print_fatal_signal(), which calls show_regs(), which calls
      smp_processor_id(), which is not supposed to be called from a
      preemptible thread.
      
      Make sure show_regs() runs with preemption disabled.
      Signed-off-by: default avatarEd Swierk <eswierk@aristanetworks.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3a9f84d3
    • Arnaldo Carvalho de Melo's avatar
      blktrace: add ftrace plugin · c71a8961
      Arnaldo Carvalho de Melo authored
      
      
      Impact: New way of using the blktrace infrastructure
      
      This drops the requirement of userspace utilities to use the blktrace
      facility.
      
      Configuration is done thru sysfs, adding a "trace" directory to the
      partition directory where blktrace can be enabled for the associated
      request_queue.
      
      The same filters present in the IOCTL interface are present as sysfs
      device attributes.
      
      The /sys/block/sdX/sdXN/trace/enable file allows tracing without any
      filters.
      
      The other files in this directory: pid, act_mask, start_lba and end_lba
      can be used with the same meaning as with the IOCTL interface.
      
      Using the sysfs interface will only setup the request_queue->blk_trace
      fields, tracing will only take place when the "blk" tracer is selected
      via the ftrace interface, as in the following example:
      
      To see the trace, one can use the /d/tracing/trace file or the
      /d/tracign/trace_pipe file, with semantics defined in the ftrace
      documentation in Documentation/ftrace.txt.
      
      [root@f10-1 ~]# cat /t/trace
             kjournald-305   [000]  3046.491224:   8,1    A WBS 6367 + 8 <- (8,1) 6304
             kjournald-305   [000]  3046.491227:   8,1    Q   R 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491236:   8,1    G  RB 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491239:   8,1    P  NS [kjournald]
             kjournald-305   [000]  3046.491242:   8,1    I RBS 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491251:   8,1    D  WB 6367 + 8 [kjournald]
             kjournald-305   [000]  3046.491610:   8,1    U  WS [kjournald] 1
                <idle>-0     [000]  3046.511914:   8,1    C  RS 6367 + 8 [6367]
      [root@f10-1 ~]#
      
      The default line context (prefix) format is the one described in the ftrace
      documentation, with the blktrace specific bits using its existing format,
      described in blkparse(8).
      
      If one wants to have the classic blktrace formatting, this is possible by
      using:
      
      [root@f10-1 ~]# echo blk_classic > /t/trace_options
      [root@f10-1 ~]# cat /t/trace
        8,1    0  3046.491224   305  A WBS 6367 + 8 <- (8,1) 6304
        8,1    0  3046.491227   305  Q   R 6367 + 8 [kjournald]
        8,1    0  3046.491236   305  G  RB 6367 + 8 [kjournald]
        8,1    0  3046.491239   305  P  NS [kjournald]
        8,1    0  3046.491242   305  I RBS 6367 + 8 [kjournald]
        8,1    0  3046.491251   305  D  WB 6367 + 8 [kjournald]
        8,1    0  3046.491610   305  U  WS [kjournald] 1
        8,1    0  3046.511914     0  C  RS 6367 + 8 [6367]
      [root@f10-1 ~]#
      
      Using the ftrace standard format allows more flexibility, such
      as the ability of asking for backtraces via trace_options:
      
      [root@f10-1 ~]# echo noblk_classic > /t/trace_options
      [root@f10-1 ~]# echo stacktrace > /t/trace_options
      
      [root@f10-1 ~]# cat /t/trace
             kjournald-305   [000]  3318.826779:   8,1    A WBS 6375 + 8 <- (8,1) 6312
             kjournald-305   [000]  3318.826782:
       <= submit_bio
       <= submit_bh
       <= sync_dirty_buffer
       <= journal_commit_transaction
       <= kjournald
       <= kthread
       <= child_rip
             kjournald-305   [000]  3318.826836:   8,1    Q   R 6375 + 8 [kjournald]
             kjournald-305   [000]  3318.826837:
       <= generic_make_request
       <= submit_bio
       <= submit_bh
       <= sync_dirty_buffer
       <= journal_commit_transaction
       <= kjournald
       <= kthread
      
      Please read the ftrace documentation to use aditional, standardized
      tracing filters such as /d/tracing/trace_cpumask, etc.
      
      See also /d/tracing/trace_mark to add comments in the trace stream,
      that is equivalent to the /d/block/sdaN/msg interface.
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c71a8961
    • Arnaldo Carvalho de Melo's avatar
      ftrace: add ftrace_vprintk · 9011262a
      Arnaldo Carvalho de Melo authored
      
      
      Impact: new helper function
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9011262a
    • Randy Dunlap's avatar
      kmemtrace: fix printk format warnings · cc2f6d90
      Randy Dunlap authored
      
      
      Fix kmemtrace printk warnings:
      
        kernel/trace/kmemtrace.c:142: warning: format '%4ld' expects type 'long int', but argument 3 has type 'size_t'
        kernel/trace/kmemtrace.c:147: warning: format '%4ld' expects type 'long int', but argument 3 has type 'size_t'
      Signed-off-by: default avatarRandy Dunlap <randy.dunlap@oracle.com>
      Acked-by: default avatarEduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cc2f6d90
  11. 23 Jan, 2009 2 commits
    • Frederic Weisbecker's avatar
      tracing/function-graph-tracer: various fixes and features · 9005f3eb
      Frederic Weisbecker authored
      
      
      This patch brings various bugfixes:
      
      - Drop the first irrelevant task switch on the very beginning of a trace.
      
      - Drop the OVERHEAD word from the headers, the DURATION word is sufficient
        and will not overlap other columns.
      
      - Make the headers fit well their respective columns whatever the
        selected options.
      
      Ie, default options:
      
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
      
        1)   0.646 us    |                    }
        1)               |                    mem_cgroup_del_lru_list() {
        1)   0.624 us    |                      lookup_page_cgroup();
        1)   1.970 us    |                    }
      
       echo funcgraph-proc > trace_options
      
       # tracer: function_graph
       #
       # CPU  TASK/PID        DURATION                  FUNCTION CALLS
       # |    |    |           |   |                     |   |   |   |
      
        0)   bash-2937    |   0.895 us    |                }
        0)   bash-2937    |   0.888 us    |                __rcu_read_unlock();
        0)   bash-2937    |   0.864 us    |                conv_uni_to_pc();
        0)   bash-2937    |   1.015 us    |                __rcu_read_lock();
      
       echo nofuncgraph-cpu > trace_options
       echo nofuncgraph-proc > trace_options
      
       # tracer: function_graph
       #
       #   DURATION                  FUNCTION CALLS
       #    |   |                     |   |   |   |
      
         3.752 us    |                  native_pud_val();
         0.616 us    |                  native_pud_val();
         0.624 us    |                  native_pmd_val();
      
      About features, one can now disable the duration (this will hide the
      overhead too for convenient reasons and because on  doesn't need
      overhead if it hasn't the duration):
      
       echo nofuncgraph-duration > trace_options
      
       # tracer: function_graph
       #
       #                FUNCTION CALLS
       #                |   |   |   |
      
                 cap_vm_enough_memory() {
                   __vm_enough_memory() {
                     vm_acct_memory();
                   }
                 }
               }
      
      And at last, an option to print the absolute time:
      
       //Restart from default options
       echo funcgraph-abstime > trace_options
      
       # tracer: function_graph
       #
       #      TIME       CPU  DURATION                  FUNCTION CALLS
       #       |         |     |   |                     |   |   |   |
      
         261.339774 |   1) + 42.823 us   |    }
         261.339775 |   1)   1.045 us    |    _spin_lock_irq();
         261.339777 |   1)   0.940 us    |    _spin_lock_irqsave();
         261.339778 |   1)   0.752 us    |    _spin_unlock_irqrestore();
         261.339780 |   1)   0.857 us    |    _spin_unlock_irq();
         261.339782 |   1)               |    flush_to_ldisc() {
         261.339783 |   1)               |      tty_ldisc_ref() {
         261.339783 |   1)               |        tty_ldisc_try() {
         261.339784 |   1)   1.075 us    |          _spin_lock_irqsave();
         261.339786 |   1)   0.842 us    |          _spin_unlock_irqrestore();
         261.339788 |   1)   4.211 us    |        }
         261.339788 |   1)   5.662 us    |      }
      
      The format is seconds.usecs.
      
      I guess no one needs the nanosec precision here, the main goal is to have
      an overview about the general timings of events, and to see the place when
      the trace switches from one cpu to another.
      
      ie:
      
         274.874760 |   1)   0.676 us    |      _spin_unlock();
         274.874762 |   1)   0.609 us    |      native_load_sp0();
         274.874763 |   1)   0.602 us    |      native_load_tls();
         274.878739 |   0)   0.722 us    |                  }
         274.878740 |   0)   0.714 us    |                  native_pmd_val();
         274.878741 |   0)   0.730 us    |                  native_pmd_val();
      
      Here there is a 4000 usecs difference when we switch the cpu.
      
      Changes in V2:
      
      - Completely fix the first pointless task switch.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      9005f3eb
    • Steven Rostedt's avatar
      trace, lockdep: manual preempt count adding for local_bh_disable · 7e49fcce
      Steven Rostedt authored
      
      
      Impact: fix to preempt trace triggering lockdep check_flag failure
      
      In local_bh_disable, the use of add_preempt_count causes the
      preempt tracer to start recording the time preemption is off.
      But because it already modified the preempt_count to show
      softirqs disabled, and before it called the lockdep code to
      handle this, it causes a state that lockdep can not handle.
      
      The preempt tracer will reset the ring buffer on start of a trace,
      and the ring buffer reset code does a spin_lock_irqsave. This
      calls into lockdep and lockdep will fail when it detects the
      invalid state of having softirqs disabled but the internal
      current->softirqs_enabled is still set.
      
      The fix is to manually add the SOFTIRQ_OFFSET to preempt count
      and call the preempt tracer code outside the lockdep critical
      area.
      
      Thanks to Peter Zijlstra for suggesting this solution.
      Signed-off-by: default avatarSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      7e49fcce