1. 28 May, 2009 1 commit
  2. 12 Nov, 2008 3 commits
    • trace: rename unlikely profiler to branch profiler · 2ed84eeb
      Steven Rostedt authored
      
      
      Impact: name change of unlikely tracer and profiler
      
      Ingo Molnar suggested changing the config from UNLIKELY_PROFILE
      to BRANCH_PROFILING. I never did like the "unlikely" name so I
      went one step farther, and renamed all the unlikely configurations
      to a "BRANCH" variant.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: branch tracer, fix vdso crash · 2b7d0390
      Ingo Molnar authored
      
      
      Impact: fix bootup crash
      
      the branch tracer did not disable tracing in arch/x86/vdso/vclock_gettime.c,
      which caused bootup crashes such as:
      
        [  201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077
      
      also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
      creating a DISABLE_UNLIKELY_PROFILE facility that lets code turn off
      instrumentation on a per-file basis.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt authored
      
      
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. Each time the condition is evaluated, its hits and misses are
      recorded. These numbers can later be retrieved from:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see from the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 entries
      that were more than 25% incorrect.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, from which I picked up
        the following ideas:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feedback on this patch.
      
      (*) Not every unlikely is recorded; those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 08 Nov, 2008 1 commit
  4. 08 Jul, 2008 1 commit
  5. 26 Jun, 2008 2 commits
  6. 23 May, 2008 1 commit
  7. 24 Apr, 2008 1 commit
  8. 29 Feb, 2008 1 commit
  9. 26 Feb, 2008 1 commit
    • x86: fix vsyscall wreckage · ce28b986
      Thomas Gleixner authored
      
      
      Based on a report from Arne Georg Gleditsch about user-space apps
      misbehaving after toggling /proc/sys/kernel/vsyscall64, a review
      of the code revealed that the "NOP patching" done there is
      fundamentally unsafe for a number of reasons:
      
      1) the patching code runs without synchronizing other CPUs
      
      2) it inserts NOPs even if there is no clock source which provides vread
      
      3) when the clock source changes to one without vread we run into
         exactly the same problem as in #2

      4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
         the syscall is not patched out

      As a result it is possible to break user-space via this patching.
      The only safe thing for now is to remove the patching.

      This code has been broken since v2.6.21.
      Reported-by: Arne Georg Gleditsch <arne.gleditsch@dolphinics.no>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 30 Jan, 2008 4 commits
  11. 19 Oct, 2007 3 commits
    • spelling fixes: arch/x86_64/ · 676b1855
      Simon Arlott authored
      
      
      Spelling fixes in arch/x86_64/.
      Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
    • x86: convert cpuinfo_x86 array to a per_cpu array · 92cb7612
      Mike Travis authored
      
      
      cpu_data is currently an array defined using NR_CPUS.  This means that
      we overallocate, since we will rarely use the maximum number of configured
      cpus.  When the NR_CPUS count is raised to 4096, the size of cpu_data
      becomes 3,145,728 bytes.
      
      These changes were adopted from the sparc64 (and ia64) code.  An
      additional field was added to cpuinfo_x86 to be a non-ambiguous cpu
      index.  This corresponds to the index into a cpumask_t as well as the
      per_cpu index.  It's used in various places like show_cpuinfo().
      
      cpu_data is defined to be the boot_cpu_data structure for the NON-SMP
      case.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Dmitry Torokhov <dtor@mail.ru>
      Cc: "Antonino A. Daplas" <adaplas@pol.net>
      Cc: Mark M. Hoffman <mhoffman@lightlink.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, vsyscall: fix the oops crash with __pa_vsymbol() · 957ff882
      Siddha, Suresh B authored
      
      
      Appended patch fixes an oops while changing the vsyscall sysctl.
      I am sure no one tested this code before integrating into mainline :(
      
      BTW, using ioremap() in vsyscall_sysctl_change() to get the virtual
      address of a kernel symbol sounds like overkill. I wonder if we
      can define a simple __va_vsymbol() which directly returns the
      kernel direct mapping. The comment in the code which says gcc has trouble
      with __va(__pa()) sounds bogus to me. __pa() on a vsyscall address will
      not work anyhow :(
      
      Also, the whole infrastructure for NOPing out the syscall in the vsyscall
      page (vsyscall_sysctl_change()) was added to make some attacks difficult,
      and yet I don't see this NOP-out being done by default. Does this area
      need more cleanups?
      
      Fix an oops with __pa_vsymbol(). VSYSCALL_FIRST_PAGE is a fixmap index.
      We want the starting virtual address of the vsyscall page and not the index.
      
      [ mingo: arch/x86 adaptation ]
      Reported-by: Yanmin Zhang <yanmin.zhang@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  12. 18 Oct, 2007 3 commits
  13. 17 Oct, 2007 2 commits
    • x86: remove duplicated vsyscall nsec update · c861eff8
      Andi Kleen authored
      
      
      Spotted by Chuck Ebbert
      
      [ tglx: arch/x86 adaptation ]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: fix cpu_to_node references · 98c9e27a
      Mike Travis authored
      
      
      In x86_64 and i386 architectures most arrays that are sized using
      NR_CPUS lie in local memory on node 0.  Not only will most (99%?) of the
      systems not use all the slots in these arrays, particularly when NR_CPUS
      is increased to accommodate future very high cpu count systems, but a
      number of cache lines are passed unnecessarily on the system bus when
      these arrays are referenced by cpus on other nodes.
      
      Typically, the values in these arrays are referenced by the cpu
      accessing its own values, though when passing IPI interrupts, the cpu
      does access the data relevant to the targeted cpu/node.  Of course, if
      the referencing cpu is not on node 0, then the reference will still
      require cross node exchanges of cache lines.  A common use of this is
      for an interrupt service routine to pass the interrupt to other cpus
      local to that node.
      
      Ideally, all the elements in these arrays should be moved to the per_cpu
      data area.  In some cases (such as x86_cpu_to_apicid) the array is
      referenced before the per_cpu data areas are set up.  In this case, a
      static array is declared in the __initdata area and initialized by the
      booting cpu (BSP).  The values are then moved to the per_cpu area after
      it is initialized and the original static array is freed with the rest
      of the __initdata.
      
      This patch:
      
      Fix four instances where cpu_to_node is referenced by array instead of
      via the cpu_to_node macro.  This is preparation to moving it to the
      per_cpu data area.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  14. 13 Oct, 2007 1 commit
    • Delete filenames in comments. · 835c34a1
      Dave Jones authored
      
      
      Since the x86 merge, lots of files that referenced their own filenames
      are no longer correct.  Rather than keep them up to date, just delete
      them, as they add no real value.
      
      Additionally:
      - fix up comment formatting in scx200_32.c
      - Remove a credit from myself in setup_64.c from a time when we had no SCM
      - remove longwinded history from tsc_32.c which can be figured out from
        git.
      Signed-off-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 11 Oct, 2007 2 commits
  16. 22 Jul, 2007 1 commit
    • x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu · 2aae950b
      Andi Kleen authored
      
      
      This implements a new vDSO for x86-64.  The concept is similar
      to the existing vDSOs on i386 and PPC.  x86-64 has had static
      vsyscalls before, but these are not flexible enough anymore.
      
      A vDSO is an ELF shared library supplied by the kernel that is mapped into
      user address space.  The vDSO mapping is randomized for each process
      for security reasons.
      
      Doing this was needed for clock_gettime, because clock_gettime
      always needs a syscall fallback and having one at a fixed
      address would have made buffer overflow exploits too easy to write.
      
      The vDSO can be disabled with the vdso=0 boot parameter.
      
      It currently includes a new gettimeofday implementation and optimized
      clock_gettime(). The gettimeofday implementation is slightly faster
      than the one in the old vsyscall.  clock_gettime is significantly faster
      than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
      
      The new calls are generally faster than the old vsyscall.
      
      Advantages over the old x86-64 vsyscalls:
      - Extensible
      - Randomized
      - Cleaner
      - Easier to virtualize (the old static address range previously caused
        overhead e.g. for Xen because it has to create special page tables for it)
      
      Weak points:
      - glibc support still to be written
      
      The VM interface is partly based on Ingo Molnar's i386 version.
      
      Includes compile fix from Joachim Deguara
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 21 May, 2007 1 commit
    • x86_64: vsyscall time() fix · d0aff6e6
      john stultz authored
      
      
      The vsyscall time() function basically returns the second portion of
      xtime directly.  This however means that there is about a tick's worth of
      time each second where time() will return a second value less than what
      gettimeofday() does.
      
      Additionally, this window where vtime() is behind vgettimeofday() grows
      when dynticks is enabled, so it's probably good to get this in before
      dynticks lands.
      
      Big thanks to Sripathi for noticing this issue and creating a test case
      to work with!
      
      This patch changes the vtime() implementation to call vgettimeofday(),
      much as the syscall time() implementation calls gettimeofday().
      
      2.6.21 stable candidate too
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 09 May, 2007 1 commit
    • Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki authored
      
      
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 02 May, 2007 3 commits
    • [PATCH] x86-64: vsyscall_gtod_data diet and vgettimeofday() fix · c8118c6c
      Eric Dumazet authored
      
      
      The current vsyscall_gtod_data is large (3 or 4 cache lines dirtied at timer
      interrupt). We can shrink it to exactly 64 bytes (1 cache line on AMD64).

      Instead of copying a whole struct clocksource, we copy only the needed fields.

      I deleted an unused field: offset_base.

      This patch fixes one oddity in vgettimeofday(): it can return a timeval with
      tv_usec = 1000000. Maybe not a bug, but why not do the right thing?
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86-64: fix vtime() vsyscall · 272a3713
      Eric Dumazet authored
      
      
      There is a tiny probability that the return value from vtime(time_t *t) is
      different from the value stored in *t.

      Using a temporary variable solves the problem and gives faster code.

         17:   48 85 ff                test   %rdi,%rdi
         1a:   48 8b 05 00 00 00 00    mov    0(%rip),%rax        # __vsyscall_gtod_data.wall_time_tv.tv_sec
         21:   74 03                   je     26
         23:   48 89 07                mov    %rax,(%rdi)
         26:   c9                      leaveq
         27:   c3                      retq
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • [PATCH] x86: __pa and __pa_symbol address space separation · 0dbf7028
      Vivek Goyal authored
      
      
      Currently __pa_symbol is for use with symbols in the kernel address
      map and __pa is for use with pointers into the physical memory map.
      But the code is implemented so you can usually interchange the two.
      
      __pa, which is much more common, can be implemented much more cheaply
      if it doesn't have to worry about any other kernel address
      spaces.  This is especially true with a relocatable kernel as
      __pa_symbol needs to perform an extra variable read to resolve
      the address.
      
      A third macro, __pa_vsymbol, is added for the vsyscall data, for
      finding the physical addresses of vsyscall pages.
      
      Most of this patch is simply sorting through the references to
      __pa or __pa_symbol and using the proper one.  A little of
      it is continuing to use a physical address when we have it
      instead of recalculating it several times.
      
      swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
      and init_mm.pgd is initialized at boot (instead of compile time)
      to the physmem virtual mapping of init_level4_pgd.  The
      physical address changed.
      
      Except for the EMPTY_ZERO page all of the remaining references
      to __pa_symbol appear to be during kernel initialization.  So this
      should reduce the cost of __pa in the common case, even on a relocated
      kernel.
      
      As this is technically a semantic change we need to be on the lookout
      for anything I missed.  But it works for me (tm).
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
  20. 14 Mar, 2007 1 commit
  21. 16 Feb, 2007 1 commit
  22. 14 Feb, 2007 2 commits
  23. 10 Dec, 2006 1 commit
  24. 07 Dec, 2006 2 commits