1. 21 Jan, 2011 1 commit
  2. 09 Dec, 2010 1 commit
    • powerpc/time: printk time stamp init not correct · 364a1246
      Heiko Schocher authored
      
      
      problem:
      
      I sometimes see printk timing information like this on my
      mpc5200-based board:
      
      [    0.000000] NR_IRQS:512 nr_irqs:512 16
      [    0.000000] MPC52xx PIC is up and running!
      [    0.000000] clocksource: timebase mult[79364d9] shift[22] registered
      [    0.000000] console [ttyPSC0] enabled
      [  130.300633] pid_max: default: 32768 minimum: 301
      [  130.305647] Mount-cache hash table entries: 512
      [  130.315818] NET: Registered protocol family 16
      
      reason:
      If the tbu does not start from 0 when Linux boots, boot_tb
      may not be able to store the real 64-bit timebase value, because
      boot_tb is only a 32-bit unsigned long.
      
      solution:
      change boot_tb to u64
      
      [BenH: Made it u64 instead of unsigned long long]
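
As a rough illustration of the truncation described above, the following Python sketch models a boot timestamp that is kept in either 32 or 64 bits. The helper names (`cyc2ns`, `elapsed_ticks`) and the sample timebase values are assumptions made for this example, not kernel code. The mult and shift values come from the clocksource line in the log above; applying them to printk timestamps is a simplification.

```python
# Toy model of the boot_tb truncation; not the kernel implementation.
MULT = 0x79364D9   # from "timebase mult[79364d9]" in the log above
SHIFT = 22         # from "shift[22]" in the log above


def cyc2ns(cycles, mult=MULT, shift=SHIFT):
    """Convert timebase ticks to nanoseconds with a mult/shift pair."""
    return (cycles * mult) >> shift


def elapsed_ticks(tb_now, tb_at_boot, boot_tb_bits):
    """Ticks since boot when boot_tb can only hold boot_tb_bits bits."""
    boot_tb = tb_at_boot & ((1 << boot_tb_bits) - 1)  # high bits are lost
    return tb_now - boot_tb


# Hypothetical values: the upper timebase word (tbu) is already 1 at boot.
tb_at_boot = (1 << 32) + 1000
tb_now = tb_at_boot + 500

ok = elapsed_ticks(tb_now, tb_at_boot, 64)    # 500 ticks
bad = elapsed_ticks(tb_now, tb_at_boot, 32)   # 2**32 + 500 ticks
print(cyc2ns(ok), cyc2ns(bad))
```

With these clocksource parameters, 2^32 ticks correspond to about 130.15 seconds, which is in the same range as the jump to 130.3 s in the log. That match is an observation from the numbers, not something the commit message states.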
      
      Signed-off-by: Heiko Schocher <hs@denx.de>
      cc: Wolfgang Denk <wd@denx.de>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  3. 18 Oct, 2010 1 commit
    • irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra authored
      
      
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system -- like waking up a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this have to make do with
      a callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
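
The queue-then-run pattern described above can be pictured with a small Python toy. This is only a sketch under assumptions: the classes `IrqWork` and `Cpu` and the `irq_raised` flag are hypothetical, the method names merely echo the kernel's `irq_work_queue()` and `irq_work_run()`, and the real implementation is lock-free C rather than anything like this.

```python
class IrqWork:
    """A callback that should run later, in (simulated) IRQ context."""

    def __init__(self, func):
        self.func = func
        self.pending = False


class Cpu:
    """Toy per-CPU state: a work list plus a 'raise an interrupt' flag."""

    def __init__(self):
        self.queue = []
        self.irq_raised = False  # stands in for a self-IPI or decrementer kick

    def irq_work_queue(self, work):
        """Enqueue work from a restricted context; refuse double queueing."""
        if work.pending:
            return False
        work.pending = True
        self.queue.append(work)
        self.irq_raised = True
        return True

    def irq_work_run(self):
        """Run everything queued so far; called from the IRQ handler."""
        self.irq_raised = False
        pending, self.queue = self.queue, []
        for work in pending:
            work.pending = False  # allow the callback to requeue itself
            work.func()


log = []
cpu = Cpu()
drain = IrqWork(lambda: log.append("drain buffers"))
cpu.irq_work_queue(drain)   # e.g. from NMI context
cpu.irq_work_run()          # later, from the interrupt or timer tick
print(log)
```

On an architecture without a self-interrupt facility, the toy's `irq_work_run()` would simply be called from the timer tick, which is the fallback the commit message describes.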
      
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 14 Oct, 2010 1 commit
  5. 02 Sep, 2010 2 commits
    • powerpc/pseries: Re-enable dispatch trace log userspace interface · 872e439a
      Paul Mackerras authored
      
      
      Since the cpu accounting code uses the hypervisor dispatch trace log
      now when CONFIG_VIRT_CPU_ACCOUNTING = y, the previous commit disabled
      access to it via files in the /sys/kernel/debug/powerpc/dtl/ directory
      in that case.  This restores those files.
      
      To do this, we now have a hook that the cpu accounting code will call
      as it processes each entry from the hypervisor dispatch trace log.
      The code in dtl.c now uses that to fill up its ring buffer, rather
      than having the hypervisor fill the ring buffer directly.
      
      This also fixes dtl_file_read() to handle overflow conditions a bit
      better and adds a spinlock to ensure that race conditions (multiple
      processes opening or reading the file concurrently) are handled
      correctly.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Account time using timebase rather than PURR · cf9efce0
      Paul Mackerras authored
      
      
      Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
      PURR register for measuring the user and system time used by
      processes, as well as other related times such as hardirq and
      softirq times.  This turns out to be quite confusing for users
      because it means that a program will often be measured as taking
      less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
      than it does when run on a single-threaded processor (ST mode), even
      though the program takes longer to finish.  The discrepancy is
      accounted for as stolen time, which is also confusing, particularly
      when there are no other partitions running.
      
      This changes the accounting to use the timebase instead, meaning that
      the reported user and system times are the actual number of real-time
      seconds that the program was executing on the processor thread,
      regardless of which SMT mode the processor is in.  Thus a program will
      generally show greater user and system times when run on a
      multi-threaded processor than on a single-threaded processor.
      
      On pSeries systems on POWER5 or later processors, we measure the
      stolen time (time when this partition wasn't running) using the
      hypervisor dispatch trace log.  We check for new entries in the
      log on every entry from user mode and on every transition from
      kernel process context to soft or hard IRQ context (i.e. when
      account_system_vtime() gets called).  So that we can correctly
      distinguish time stolen from user time and time stolen from system
      time, without having to check the log on every exit to user mode,
      we store separate timestamps for exit to user mode and entry from
      user mode.
      
      On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
      in account_system_vtime() (as before), and then apportion the SPURR
      ticks since the last time we read it between scaled user time and
      scaled system time according to the relative proportions of user
      time and system time over the same interval.  This avoids having to
      read the SPURR on every kernel entry and exit.  On systems that have
      PURR but not SPURR (i.e., POWER5), we do the same using the PURR
      rather than the SPURR.
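
The apportioning step described above can be sketched in a few lines of Python. The function name `apportion_scaled` and the zero-interval fallback are assumptions for illustration; the kernel code works on per-CPU state in C and is not reproduced here.

```python
def apportion_scaled(delta_spurr, user_ticks, system_ticks):
    """Split SPURR ticks between scaled user and scaled system time.

    The split follows the ratio of timebase ticks spent in user and
    system mode over the same interval.
    """
    total = user_ticks + system_ticks
    if total == 0:
        # Assumption for this sketch: with no elapsed time recorded,
        # charge everything to system time.
        return 0, delta_spurr
    user_scaled = delta_spurr * user_ticks // total
    system_scaled = delta_spurr - user_scaled  # keeps the sum exact
    return user_scaled, system_scaled


# 900 SPURR ticks over an interval that was 300 user and 100 system ticks.
print(apportion_scaled(900, 300, 100))  # prints (675, 225)
```

Computing the system share as the remainder means the two parts always add up to the SPURR delta, even when the division truncates.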
      
      This disables the DTL user interface in /sys/kernel/debug/powerpc/dtl
      for now since it conflicts with the use of the dispatch trace log
      by the time accounting code.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  6. 31 Aug, 2010 1 commit
    • powerpc/perf_event: Reduce latency of calling perf_event_do_pending · b0d278b7
      Paul Mackerras authored
      
      
      Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to
      perf_event_do_pending call") moved the call to perf_event_do_pending
      in timer_interrupt() down so that it was after the irq_enter() call.
      Unfortunately this moved it after the code that checks whether it
      is time for the next decrementer clock event.  The result is that
      the call to perf_event_do_pending() won't happen until the next
      decrementer clock event is due.  This was pointed out by Milton
      Miller.
      
      This fixes it by moving the check for whether it's time for the
      next decrementer clock event down to the point where we're about
      to call the event handler, after we've called perf_event_do_pending.
      
      This has the side effect that on old pre-Core99 Powermacs where we
      use the ppc_n_lost_interrupts mechanism to replay interrupts, a
      replayed interrupt will incur a little more latency since it will
      now run the code from irq_enter down to irq_exit, which it
      used to skip.  However, these machines are now old and rare enough
      that this doesn't matter.  To make it clear that ppc_n_lost_interrupts
      is only used on Powermacs, and to speed up the code slightly on
      non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts
      is now conditional on CONFIG_PMAC as well as CONFIG_PPC32.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: stable@kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  7. 28 Jul, 2010 2 commits
    • powerpc: Clean up obsolete code relating to decrementer and timebase · d75d68cf
      Paul Mackerras authored
      
      
      Since the decrementer and timekeeping code was moved over to using
      the generic clockevents and timekeeping infrastructure, several
      variables and functions have been obsolete and effectively unused.
      This deletes them.
      
      In particular, wakeup_decrementer() is no longer needed since the
      generic code reprograms the decrementer as part of the process of
      resuming the timekeeping code, which happens during sysdev resume.
      Thus the wakeup_decrementer calls in the suspend_enter methods for
      52xx platforms have been removed.  The call in the powermac cpu
      frequency change code has been replaced by set_dec(1), which will
      cause a timer interrupt as soon as interrupts are enabled, and the
      generic code will then reprogram the decrementer with the correct
      value.
      
      This also simplifies the generic_suspend_en/disable_irqs functions
      and makes them static since they are not referenced outside time.c.
      The preempt_enable/disable calls are removed because the generic
      code has disabled all but the boot cpu at the point where these
      functions are called, so we can't be moved to another cpu.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Rework VDSO gettimeofday to prevent time going backwards · 0e469db8
      Paul Mackerras authored
      
      
      Currently it is possible for userspace to see the result of
      gettimeofday() going backwards by 1 microsecond, assuming that
      userspace is using the gettimeofday() in the VDSO.  The VDSO
      gettimeofday() algorithm computes the time in "xsecs", which are
      units of 2^-20 seconds, or approximately 0.954 microseconds,
      using the algorithm
      
      	now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec
      
      and then converts the time in xsecs to seconds and microseconds.
      
      The kernel updates the tb_orig_stamp and stamp_xsec values every
      tick in update_vsyscall().  If the length of the tick is not an
      integer number of xsecs, then some precision is lost in converting
      the current time to xsecs.  For example, with CONFIG_HZ=1000, the
      tick is 1ms long, which is 1048.576 xsecs.  That means that
      stamp_xsec will advance by either 1048 or 1049 on each tick.
      With the right conditions, it is possible for userspace to get
      (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
      slightly late in updating the vdso_datapage, and then for stamp_xsec
      to advance by 1048 when the kernel does update it, and for userspace
      to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
      integer truncation.  The result is that time appears to go backwards
      by 1 microsecond.
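
The backwards step can be reproduced with integer arithmetic alone. The following Python sketch uses the numbers from the paragraph above; the helper names `xsec_to_usec` and `old_now` are made up for this example, and the scaled delta is passed in already truncated rather than computed from a real timebase.

```python
XSEC_SHIFT = 20  # one xsec is 2**-20 seconds


def xsec_to_usec(xsec):
    """Convert a count of xsecs to whole microseconds (truncating)."""
    return (xsec * 1_000_000) >> XSEC_SHIFT


def old_now(delta_xsec, stamp_xsec):
    """Old formula, with (timebase - tb_orig_stamp) * tb_to_xs already
    truncated to a whole number of xsecs in delta_xsec."""
    return delta_xsec + stamp_xsec


TICK_XSEC = (1 << XSEC_SHIFT) / 1000   # a 1 ms tick is 1048.576 xsecs

# Just before a late update: the stamp is old, the delta has reached 1049.
before = old_now(1049, 0)
# Just after the update: the stamp advanced by 1048, the delta is back to 0.
after = old_now(0, 1048)

print(xsec_to_usec(before), xsec_to_usec(after))  # prints "1000 999"
```

The second reading is taken later in real time but reports one microsecond less, which is the anomaly the commit sets out to remove.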
      
      To fix this we change the VDSO gettimeofday to use a new field in the
      VDSO datapage which stores the nanoseconds part of the time as a
      fractional number of seconds in a 0.32 binary fraction format.
      (Or put another way, as a 32-bit number in units of 0.23283 ns.)
      This is convenient because we can use the mulhwu instruction to
      convert it to either microseconds or nanoseconds.
      
      Since it turns out that computing the time of day using this new field
      is simpler than either using stamp_xsec (as gettimeofday does) or
      stamp_xtime.tv_nsec (as clock_gettime does), this converts both
      gettimeofday and clock_gettime to use the new field.  The existing
      __do_get_tspec function is converted to use the new field and take
      a parameter in r7 that indicates the desired resolution, 1,000,000
      for microseconds or 1,000,000,000 for nanoseconds.  The __do_get_xsec
      function is then unused and is deleted.
      
      The new algorithm is
      
      	now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
      		+ (stamp_xtime_seconds << 32) + stamp_sec_fraction
      
      with 'now' in units of 2^-32 seconds.  That is then converted to
      seconds and either microseconds or nanoseconds with
      
      	seconds = now >> 32
      	partseconds = ((now & 0xffffffff) * resolution) >> 32
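
The final conversion step can be tried out directly. This Python sketch mirrors the two formulas above; the function name `split_now` and the sample value of `now` are assumptions for illustration, and the real code is PowerPC assembly using mulhwu.

```python
def split_now(now, resolution):
    """Split a time in units of 2**-32 s into (seconds, partseconds).

    resolution is 1_000_000 for microseconds or 1_000_000_000 for
    nanoseconds, as described for the r7 parameter above.
    """
    seconds = now >> 32
    partseconds = ((now & 0xFFFFFFFF) * resolution) >> 32
    return seconds, partseconds


# One unit of the 0.32 fraction, expressed in nanoseconds.
FRACTION_UNIT_NS = 1e9 / 2**32   # about 0.23283 ns

# Hypothetical sample: 5 seconds plus exactly half a second.
now = (5 << 32) | 0x80000000
print(split_now(now, 1_000_000))      # prints (5, 500000)
print(split_now(now, 1_000_000_000))  # prints (5, 500000000)
```

Because the fractional part is multiplied by the resolution before the shift, the same stored field serves both gettimeofday and clock_gettime without a separate microsecond or nanosecond stamp.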
      
      The 32-bit VDSO code also makes a further simplification: it ignores
      the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
      fraction.  Doing so gets rid of 4 multiply instructions.  Assuming
      a timebase frequency of 1GHz or less and an update interval of no
      more than 10ms, the upper 32 bits of tb_to_xs will be at least
      4503599, so the error from ignoring the low 32 bits will be at most
      2.2ns, which is more than an order of magnitude less than the time
      taken to do gettimeofday or clock_gettime on our fastest processors,
      so there is no possibility of seeing inconsistent values due to this.
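
The two figures quoted above (4503599 and 2.2ns) can be checked with a short calculation. This Python sketch assumes that tb_to_xs is simply 2^20 divided by the timebase frequency, stored as a 0.64 binary fraction, and ignores any NTP adjustment; the function names are invented for the example.

```python
def tb_to_xs_hi(tb_freq_hz):
    """Upper 32 bits of a 0.64 fraction holding xsecs per timebase tick.

    xsecs per tick is 2**20 / freq; scaling by 2**64 and keeping the top
    32 bits leaves floor(2**52 / freq).
    """
    return (1 << 52) // tb_freq_hz


def max_error_ns(tb_freq_hz, interval_ns):
    """Bound on the error from dropping the low 32 bits of tb_to_xs.

    Dropping them lowers the multiplier by less than one part in
    tb_to_xs_hi, so the error over an interval is below interval / hi.
    """
    return interval_ns / tb_to_xs_hi(tb_freq_hz)


print(tb_to_xs_hi(10**9))                 # prints 4503599
print(max_error_ns(10**9, 10_000_000))    # roughly 2.2 (nanoseconds)
```

A lower timebase frequency makes the upper word larger, so 1GHz is the worst case for this bound under the stated assumptions.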
      
      This also moves update_gtod() down next to its only caller, and makes
      update_vsyscall use the time passed in via the wall_time argument rather
      than accessing xtime directly.  At present, wall_time always points to
      xtime, but that could change in future.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  8. 27 Jul, 2010 3 commits
    • timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset · 7615856e
      John Stultz authored
      
      
      update_vsyscall() did not provide the wall_to_monotonic offset,
      so arch specific implementations tend to reference wall_to_monotonic
      directly. This limits future cleanups in the timekeeping core, so
      this patch fixes the update_vsyscall interface to provide
      wall_to_monotonic, allowing wall_to_monotonic to be made static
      as planned in Documentation/feature-removal-schedule.txt.
      
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1279068988-21864-7-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • powerpc: Cleanup xtime usage · 06d518e3
      John Stultz authored
      
      
      This removes powerpc's direct xtime usage, allowing for further
      generic timekeeping cleanups.
      
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      LKML-Reference: <1279068988-21864-6-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • powerpc: Simplify update_vsyscall · b0797b60
      John Stultz authored
      
      
      Currently powerpc's update_vsyscall calls an inline update_gtod.
      However, both are straightforward, and there are no other users,
      so this patch merges update_gtod into update_vsyscall.
      
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1279068988-21864-5-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  9. 09 Jul, 2010 2 commits
    • powerpc: Clean up obsolete code relating to decrementer and timebase · c1aa687d
      Paul Mackerras authored
      
      
      Since the decrementer and timekeeping code was moved over to using
      the generic clockevents and timekeeping infrastructure, several
      variables and functions have been obsolete and effectively unused.
      This deletes them.
      
      In particular, wakeup_decrementer() is no longer needed since the
      generic code reprograms the decrementer as part of the process of
      resuming the timekeeping code, which happens during sysdev resume.
      Thus the wakeup_decrementer calls in the suspend_enter methods for
      52xx platforms have been removed.  The call in the powermac cpu
      frequency change code has been replaced by set_dec(1), which will
      cause a timer interrupt as soon as interrupts are enabled, and the
      generic code will then reprogram the decrementer with the correct
      value.
      
      This also simplifies the generic_suspend_en/disable_irqs functions
      and makes them static since they are not referenced outside time.c.
      The preempt_enable/disable calls are removed because the generic
      code has disabled all but the boot cpu at the point where these
      functions are called, so we can't be moved to another cpu.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Rework VDSO gettimeofday to prevent time going backwards · 8fd63a9e
      Paul Mackerras authored
      
      
      Currently it is possible for userspace to see the result of
      gettimeofday() going backwards by 1 microsecond, assuming that
      userspace is using the gettimeofday() in the VDSO.  The VDSO
      gettimeofday() algorithm computes the time in "xsecs", which are
      units of 2^-20 seconds, or approximately 0.954 microseconds,
      using the algorithm
      
      	now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec
      
      and then converts the time in xsecs to seconds and microseconds.
      
      The kernel updates the tb_orig_stamp and stamp_xsec values every
      tick in update_vsyscall().  If the length of the tick is not an
      integer number of xsecs, then some precision is lost in converting
      the current time to xsecs.  For example, with CONFIG_HZ=1000, the
      tick is 1ms long, which is 1048.576 xsecs.  That means that
      stamp_xsec will advance by either 1048 or 1049 on each tick.
      With the right conditions, it is possible for userspace to get
      (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
      slightly late in updating the vdso_datapage, and then for stamp_xsec
      to advance by 1048 when the kernel does update it, and for userspace
      to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
      integer truncation.  The result is that time appears to go backwards
      by 1 microsecond.
      
      To fix this we change the VDSO gettimeofday to use a new field in the
      VDSO datapage which stores the nanoseconds part of the time as a
      fractional number of seconds in a 0.32 binary fraction format.
      (Or put another way, as a 32-bit number in units of 0.23283 ns.)
      This is convenient because we can use the mulhwu instruction to
      convert it to either microseconds or nanoseconds.
      
      Since it turns out that computing the time of day using this new field
      is simpler than either using stamp_xsec (as gettimeofday does) or
      stamp_xtime.tv_nsec (as clock_gettime does), this converts both
      gettimeofday and clock_gettime to use the new field.  The existing
      __do_get_tspec function is converted to use the new field and take
      a parameter in r7 that indicates the desired resolution, 1,000,000
      for microseconds or 1,000,000,000 for nanoseconds.  The __do_get_xsec
      function is then unused and is deleted.
      
      The new algorithm is
      
      	now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
      		+ (stamp_xtime_seconds << 32) + stamp_sec_fraction
      
      with 'now' in units of 2^-32 seconds.  That is then converted to
      seconds and either microseconds or nanoseconds with
      
      	seconds = now >> 32
      	partseconds = ((now & 0xffffffff) * resolution) >> 32
      
      The 32-bit VDSO code also makes a further simplification: it ignores
      the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
      fraction.  Doing so gets rid of 4 multiply instructions.  Assuming
      a timebase frequency of 1GHz or less and an update interval of no
      more than 10ms, the upper 32 bits of tb_to_xs will be at least
      4503599, so the error from ignoring the low 32 bits will be at most
      2.2ns, which is more than an order of magnitude less than the time
      taken to do gettimeofday or clock_gettime on our fastest processors,
      so there is no possibility of seeing inconsistent values due to this.
      
      This also moves update_gtod() down next to its only caller, and makes
      update_vsyscall use the time passed in via the wall_time argument rather
      than accessing xtime directly.  At present, wall_time always points to
      xtime, but that could change in future.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  10. 12 May, 2010 1 commit
    • powerpc/perf_event: Fix oops due to perf_event_do_pending call · 0fe1ac48
      Paul Mackerras authored
      
      
      Anton Blanchard found that large POWER systems would occasionally
      crash in the exception exit path when profiling with perf_events.
      The symptom was that an interrupt would occur late in the exit path
      when the MSR[RI] (recoverable interrupt) bit was clear.  Interrupts
      should be hard-disabled at this point but they were enabled.  Because
      the interrupt was not recoverable the system panicked.
      
      The reason is that the exception exit path was calling
      perf_event_do_pending after hard-disabling interrupts, and
      perf_event_do_pending will re-enable interrupts.
      
      The simplest and cleanest fix for this is to use the same mechanism
      that 32-bit powerpc does, namely to cause a self-IPI by setting the
      decrementer to 1.  This means we can remove the tests in the exception
      exit path and raw_local_irq_restore.
      
      This also makes sure that the call to perf_event_do_pending from
      timer_interrupt() happens within irq_enter/irq_exit.  (Note that
      calling perf_event_do_pending from timer_interrupt does not mean that
      there is a possible 1/HZ latency; setting the decrementer to 1 ensures
      that the timer interrupt will happen immediately, i.e. within one
      timebase tick, which is a few nanoseconds or 10s of nanoseconds.)
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: stable@kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 17 Feb, 2010 1 commit
  12. 09 Feb, 2010 1 commit
  13. 03 Feb, 2010 1 commit
  14. 15 Jan, 2010 1 commit
    • powerpc: Fix decrementer setup on 1GHz boards · 3e7b4843
      Stefan Roese authored
      
      
      We noticed that recent kernels didn't boot on our 1GHz Canyonlands 460EX
      boards anymore. As it seems, patch 8d165db1 [powerpc: Improve
      decrementer accuracy] introduced this problem. The routine div_sc()
      overflows with shift = 32 resulting in this incorrect setup:
      
      time_init: decrementer frequency = 1000.000012 MHz
      time_init: processor frequency   = 1000.000012 MHz
      clocksource: timebase mult[400000] shift[22] registered
      clockevent: decrementer mult[33] shift[32] cpu[0]
      
      This patch now introduces a local div_sc64() version of this function
      so that this overflow doesn't happen anymore.
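
The overflow can be reproduced numerically. The Python sketch below assumes that the clockevent multiplier is computed as (ticks << shift) / NSEC_PER_SEC and that the result is returned in a 32-bit unsigned long on this board; the `result_bits` parameter is an invention of the example to contrast the narrow and wide return types.

```python
NSEC_PER_SEC = 1_000_000_000


def div_sc(ticks, nsec, shift, result_bits):
    """Scaled division (ticks << shift) / nsec, truncated to result_bits."""
    full = (ticks << shift) // nsec
    return full & ((1 << result_bits) - 1)


# "decrementer frequency = 1000.000012 MHz" from the log above.
DEC_FREQ_HZ = 1_000_000_012

narrow = div_sc(DEC_FREQ_HZ, NSEC_PER_SEC, 32, 32)
wide = div_sc(DEC_FREQ_HZ, NSEC_PER_SEC, 32, 64)
print(hex(narrow), wide)  # prints "0x33 4294967347"
```

The truncated value 0x33 matches the `mult[33]` printed in the log, which supports the reading that the quotient just above 2^32 lost its top bit.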
      
      Signed-off-by: Stefan Roese <sr@denx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Detlev Zundel <dzu@denx.de>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  15. 17 Nov, 2009 1 commit
    • timekeeping: Fix clock_gettime vsyscall time warp · 0696b711
      Lin Ming authored
      
      
      Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
      to struct timekeeper" the clock multiplier of vsyscall is updated with
      the unmodified clock multiplier of the clock source and not with the
      NTP adjusted multiplier of the timekeeper.
      
      This causes time warps that are observable from user space:
      new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf
      
      Add a new argument "mult" to update_vsyscall() and hand in the
      timekeeping internal NTP adjusted multiplier.
      
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  16. 13 Nov, 2009 1 commit
    • clocksource/events: Fix fallout of generic code changes · a362c638
      Thomas Gleixner authored
      
      
      powerpc grew a new warning due to the type change of clockevent->mult.
      
      The architectures which use parts of the generic time keeping
      infrastructure tripped over my wrong assumption that
      clocksource_register is only used when GENERIC_TIME=y.
      
      I should have looked and also I should have known better. These
      renitent Gaul villages are racking my nerves. Some serious deprecating
      is due.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  17. 05 Nov, 2009 2 commits
  18. 28 Oct, 2009 1 commit
    • powerpc: tracing: Add powerpc tracepoints for timer entry and exit · 6795b85c
      Anton Blanchard authored
      
      
      We can monitor the effectiveness of our power management of both the
      kernel and hypervisor by probing the timer interrupt. For example, on
      this box we see timer interrupts every 10.37s on an idle core:
      
      <idle>-0     [010]  3900.671297: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
      <idle>-0     [010]  3900.671302: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
      
      <idle>-0     [010]  3911.042963: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
      <idle>-0     [010]  3911.042968: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
      
      <idle>-0     [010]  3921.414630: timer_interrupt_entry: pt_regs=c0000000ce1e7b10
      <idle>-0     [010]  3921.414635: timer_interrupt_exit: pt_regs=c0000000ce1e7b10
      
      Since we have a 207MHz decrementer it will go negative and fire every 10.37s
      even if Linux is completely idle.
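
The 10.37s figure follows from the decrementer width. The Python sketch below assumes a 32-bit decrementer that raises an interrupt once it counts below zero, so the longest programmable interval is about 2^31 ticks; the function name and the `usable_bits` parameter are illustrative only.

```python
def max_decrementer_interval(dec_freq_hz, usable_bits=31):
    """Longest time, in seconds, before the decrementer goes negative."""
    return (1 << usable_bits) / dec_freq_hz


# 207MHz decrementer, as stated in the commit message above.
print(round(max_decrementer_interval(207_000_000), 2))  # prints 10.37
```

The spacing of the trace entries above (about 10.3717s) is close to this estimate; the small difference would be consistent with the real frequency not being exactly 207MHz.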
      
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  19. 21 Sep, 2009 1 commit
    • perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar authored
      
      
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has outgrown its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in all, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks go to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      
      Suggested-by: default avatarStephane Eranian <eranian@google.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Reviewed-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cdd6c482
  20. 28 Aug, 2009 1 commit
  21. 23 Aug, 2009 1 commit
  22. 20 Aug, 2009 1 commit
  23. 15 Aug, 2009 1 commit
  24. 03 Aug, 2009 1 commit
  25. 18 Jun, 2009 1 commit
    • perf_counter: powerpc: Enable use of software counters on 32-bit powerpc · 105988c0
      Paul Mackerras authored
      
      
      This enables the perf_counter subsystem on 32-bit powerpc.  Since we
      don't have any support for hardware counters on 32-bit powerpc yet,
      only software counters can be used.
      
      Besides selecting HAVE_PERF_COUNTERS for 32-bit powerpc as well as
      64-bit, the main thing this does is add an implementation of
      set_perf_counter_pending().  This needs to arrange for
      perf_counter_do_pending() to be called when interrupts are enabled.
      Rather than add code to local_irq_restore as 64-bit does, the 32-bit
      set_perf_counter_pending() generates an interrupt by setting the
      decrementer to 1 so that a decrementer interrupt will become pending
      in 1 or 2 timebase ticks (if a decrementer interrupt isn't already
      pending).  When interrupts are enabled, timer_interrupt() will be
      called, and some new code in there calls perf_counter_do_pending().
      We use a per-cpu array of flags to indicate whether we need to call
      perf_counter_do_pending() or not.
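As a very loose illustration of the 32-bit scheme described above, the following Python toy model mimics the flow: setting the pending flag also arms the decrementer, and the timer interrupt handler runs the pending work once interrupts are enabled. The class, its fields, and the tick model are all assumptions made for this sketch; none of it is kernel code.

```python
DEC_MAX = 0x7FFFFFFF  # assumed "far in the future" decrementer value

class ToyCpu:
    """Toy single-CPU model; the real code uses a per-cpu array of flags."""

    def __init__(self):
        self.irqs_enabled = True
        self.decrementer = DEC_MAX
        self.dec_irq_pending = False
        self.perf_pending = False   # stands in for the per-cpu flag
        self.pending_runs = 0       # how often the pending work ran

    def set_perf_counter_pending(self):
        self.perf_pending = True
        if not self.dec_irq_pending:
            self.decrementer = 1    # interrupt becomes pending within 2 ticks

    def timebase_tick(self):
        self.decrementer -= 1
        if self.decrementer < 0:    # decrementer went negative
            self.dec_irq_pending = True
        self._deliver()

    def local_irq_enable(self):
        self.irqs_enabled = True
        self._deliver()

    def _deliver(self):
        # A pending decrementer interrupt is taken only with interrupts enabled.
        if self.dec_irq_pending and self.irqs_enabled:
            self.dec_irq_pending = False
            self.decrementer = DEC_MAX
            self.timer_interrupt()

    def timer_interrupt(self):
        if self.perf_pending:       # the "new code" in timer_interrupt()
            self.perf_pending = False
            self.pending_runs += 1  # stands in for perf_counter_do_pending()

cpu = ToyCpu()
cpu.irqs_enabled = False            # e.g. inside a critical section
cpu.set_perf_counter_pending()
cpu.timebase_tick()
cpu.timebase_tick()
ran_while_disabled = cpu.pending_runs
cpu.local_irq_enable()
ran_after_enable = cpu.pending_runs
```

The point of the model is only the ordering: nothing runs while interrupts are disabled, and the deferred work runs exactly once after they are re-enabled.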
      
      This introduces a couple of new Kconfig symbols: PPC_HAVE_PMU_SUPPORT,
      which is selected by processor families for which we have hardware PMU
      support (currently only PPC64), and PPC_PERF_CTRS, which enables the
      powerpc-specific perf_counter back-end.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linuxppc-dev@ozlabs.org
      Cc: benh@kernel.crashing.org
      LKML-Reference: <19000.55404.103840.393470@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      105988c0
  26. 15 Jun, 2009 1 commit
  27. 21 May, 2009 1 commit
    • powerpc: Improve decrementer accuracy · 8d165db1
      Anton Blanchard authored
      
      
      I have been looking at sources of OS jitter and noticed that after a long
      NO_HZ idle period we wake up too early:
      
      relative time (us)    event
                            timer irq exit
          999946.405        timer irq entry
               4.835        timer irq exit
              21.685        timer irq entry
               3.540          timer (tick_sched_timer) entry
      
      Here we slept for just under a second then took a timer interrupt that did
      nothing. 21.685 us later we wake up again and do the work.
      
      We set a rather low shift value of 16 for the decrementer clockevent, which I
      think is causing this issue. On this box we have a 207MHz decrementer and see:
      
      clockevent: decrementer mult[3501] shift[16] cpu[0]
      
      For calculations of large intervals this mult/shift combination could be
      off by a significant amount. I notice the sparc code has a loop that iterates
      to find a mult/shift combination that maximises the shift value while
      keeping mult within 32 bits. With the patch below we get:
      
      clockevent: decrementer mult[35015c20] shift[32] cpu[15]
      
      And we no longer see the spurious wakeups.
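The effect of the two mult/shift pairs can be checked with a little integer arithmetic. The Python sketch below is not the patch itself: the helper names are illustrative (`div_sc` is only modelled on the kernel helper of that name), the search loop is a guess at the sparc-style approach described above, and the 207,052,000 Hz frequency is an assumption inferred from the logged mult values.

```python
NSEC_PER_SEC = 1_000_000_000

def div_sc(ticks_per_sec, nsec, shift):
    """Scaled division: (ticks_per_sec << shift) / nsec, truncated."""
    return (ticks_per_sec << shift) // nsec

def find_mult_shift(ticks_per_sec):
    """Pick the largest shift (at most 32) whose mult still fits in 32 bits."""
    for shift in range(32, 0, -1):
        mult = div_sc(ticks_per_sec, NSEC_PER_SEC, shift)
        if mult < (1 << 32):
            return mult, shift
    raise ValueError("no usable mult/shift pair")

def ns_to_ticks(delta_ns, mult, shift):
    """Convert a nanosecond interval to decrementer ticks."""
    return (delta_ns * mult) >> shift

ASSUMED_HZ = 207_052_000  # assumption: inferred from the logged mult values

# Ticks programmed for a one-second sleep with each logged pair.
coarse_ticks = ns_to_ticks(NSEC_PER_SEC, 0x3501, 16)
fine_ticks = ns_to_ticks(NSEC_PER_SEC, 0x35015c20, 32)

# How much earlier the shift-16 pair fires, in microseconds.
early_us = (fine_ticks - coarse_ticks) * 1_000_000 / ASSUMED_HZ
```

Under these assumptions the shift-16 pair programs roughly 5,500 fewer ticks for a one-second sleep, which is about 26.5 us early. That is in line with the 4.835 + 21.685 us gap between the spurious wakeup and the real one in the trace above.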
      
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      8d165db1
  28. 21 Apr, 2009 1 commit
  29. 02 Apr, 2009 1 commit
  30. 31 Dec, 2008 2 commits
    • [PATCH] idle cputime accounting · 79741dd3
      Martin Schwidefsky authored
      
      
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time. This is plain wrong; the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work. The first
      is accounted with account_idle_time and the second with account_system_time.
      The architectures that use the account_xxx_time interface directly and not
      the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code. In particular to improve the system vs true
      idle time accounting the arch code needs to measure the true idle time
      instead of just testing for the idle process.
      To improve the tick based accounting as well we would need an architecture
      primitive that can tell us if the pt_regs of the interrupted context
      points to the magic instruction that halts the cpu.
      
      In addition, idle time is no longer added to the stime of the idle process.
      This field now contains the system time of the idle process as it should
      be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
      every tick that occurs while idle is running will be accounted as idle
      time.
      
      This patch contains the necessary common code changes to be able to
      distinguish idle system time and true idle time. The architectures with
      support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
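The split between true idle time and idle-task system time can be pictured with a small Python model. This is only a sketch of the idea, not the kernel interface: the dataclasses, the `cpu_was_halted` flag, and the single `account` function are assumptions for illustration, and user time is left out entirely.

```python
from dataclasses import dataclass

@dataclass
class CpuStat:
    idle: int = 0     # time the cpu really did nothing
    system: int = 0   # time spent doing kernel work

@dataclass
class Task:
    is_idle_task: bool
    stime: int = 0    # system time charged to this task

def account(cpustat, task, delta, cpu_was_halted):
    """Charge `delta` time units to idle or system time."""
    if task.is_idle_task and cpu_was_halted:
        # Corresponds to the account_idle_time case: nothing was done.
        cpustat.idle += delta
    else:
        # Corresponds to the account_system_time case: work was done,
        # even if it was the idle process doing it.
        cpustat.system += delta
        task.stime += delta

stat = CpuStat()
idle_task = Task(is_idle_task=True)
account(stat, idle_task, 900, cpu_was_halted=True)    # cpu halted
account(stat, idle_task, 100, cpu_was_halted=False)   # idle task doing work
```

After these two calls the model reports 900 units of idle time and 100 units of system time, and the idle task's stime holds only the 100 units of real work.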
      
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      79741dd3
    • [PATCH] fix scaled & unscaled cputime accounting · 457533a7
      Martin Schwidefsky authored
      
      
      The utimescaled / stimescaled fields in the task structure and the
      global cpustat should be set on all architectures. On s390 the calls
      to account_user_time_scaled and account_system_time_scaled were never
      added. In addition system time that is accounted as guest time
      to the user time of a process is accounted to the scaled system time
      instead of the scaled user time.
      To fix the bugs and to prevent future forgetfulness this patch merges
      account_system_time_scaled into account_system_time and
      account_user_time_scaled into account_user_time.
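A minimal Python sketch of the merged interface follows, assuming each accounting call now takes both the unscaled and the scaled value so that neither can be forgotten. The `Task` fields mirror the names in the text, but the `is_vcpu` flag and the function bodies are illustrative assumptions rather than the kernel implementation.

```python
from dataclasses import dataclass

@dataclass
class Task:
    is_vcpu: bool = False   # hypothetical stand-in for "running a guest"
    utime: int = 0
    stime: int = 0
    utimescaled: int = 0
    stimescaled: int = 0

def account_user_time(task, cputime, cputime_scaled):
    """One call updates both the unscaled and the scaled user time."""
    task.utime += cputime
    task.utimescaled += cputime_scaled

def account_system_time(task, cputime, cputime_scaled):
    """One call updates both fields; guest time goes to the user side."""
    if task.is_vcpu:
        # Guest time is charged to user time, so the scaled value
        # must land in utimescaled as well, not in stimescaled.
        task.utime += cputime
        task.utimescaled += cputime_scaled
    else:
        task.stime += cputime
        task.stimescaled += cputime_scaled

guest = Task(is_vcpu=True)
account_system_time(guest, 10, 8)

plain = Task()
account_system_time(plain, 10, 8)
account_user_time(plain, 4, 3)
```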
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      457533a7
  31. 13 Dec, 2008 1 commit
  32. 05 Nov, 2008 2 commits
    • powerpc: Eliminate unused do_gtod variable · 3cc69878
      Paul Mackerras authored
      
      
      Since we started using the generic timekeeping code, we haven't had a
      powerpc-specific version of do_gettimeofday, and hence there is now
      nothing that reads the do_gtod variable in arch/powerpc/kernel/time.c.
      This therefore removes it and the code that sets it.
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      3cc69878
    • powerpc: Improve resolution of VDSO clock_gettime · 597bc5c0
      Paul Mackerras authored
      
      
      Currently the clock_gettime implementation in the VDSO produces a
      result with microsecond resolution for the cases that are handled
      without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC.  The
      nanoseconds field of the result is obtained by computing a
      microseconds value and multiplying by 1000.
      
      This changes the code in the VDSO to do the computation for
      clock_gettime with nanosecond resolution.  That means that the
      resolution of the result will ultimately depend on the timebase
      frequency.
      
      Because the timestamp in the VDSO datapage (stamp_xsec, the real time
      corresponding to the timebase count in tb_orig_stamp) is in units of
      2^-20 seconds, it doesn't have sufficient resolution for computing a
      result with nanosecond resolution.  Therefore this adds a copy of
      xtime to the VDSO datapage and updates it in update_gtod() along with
      the other time-related fields.
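The resolution argument can be checked numerically. The Python sketch below uses made-up helper names and a made-up sample timestamp. It compares the old microseconds-times-1000 result with a round trip through units of 2^-20 seconds, the unit of stamp_xsec:

```python
NSEC_PER_SEC = 1_000_000_000
XSEC_SHIFT = 20  # stamp_xsec counts units of 2^-20 seconds

def old_vdso_nsec(true_ns):
    """Old behaviour: compute microseconds, then multiply by 1000."""
    return (true_ns // 1000) * 1000

def via_xsec(true_ns):
    """Round-trip a sub-second value through 2^-20 second units."""
    xsec = (true_ns << XSEC_SHIFT) // NSEC_PER_SEC
    return (xsec * NSEC_PER_SEC) >> XSEC_SHIFT

# One 2^-20 second unit expressed in nanoseconds (about 953.67).
XSEC_STEP_NS = NSEC_PER_SEC / (1 << XSEC_SHIFT)

sample_ns = 123_456_789            # arbitrary sub-second value
lost_old = sample_ns - old_vdso_nsec(sample_ns)
lost_xsec = sample_ns - via_xsec(sample_ns)
```

One unit of 2^-20 seconds is just under a microsecond, so a timestamp stored that way cannot carry nanosecond detail, which is consistent with the need for a separate copy of xtime in the datapage.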
      
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      597bc5c0