1. 12 Dec, 2009 1 commit
  2. 30 Nov, 2009 2 commits
  3. 26 Nov, 2009 1 commit
    • block: add helpers to run flush_dcache_page() against a bio and a request's pages · 2d4dc890
      Ilya Loginov authored
      
      The mtdblock driver doesn't call flush_dcache_page() for the pages in a
      request.  This causes problems on architectures where the icache doesn't
      fill from the dcache, or where the dcache has aliases.  This patch fixes
      that.
      
      The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
      pointless empty cache-thrashing loops on architectures for which
      flush_dcache_page() is a no-op.  Every architecture now defines this
      symbol, and the helpers flush pages on architectures where
      ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE equals 1 and do nothing otherwise.
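
The compile-time gating described above can be sketched in user-space C. This is only an illustration under stated assumptions: `struct page` is a stand-in for the kernel type, `flush_pages()` is a hypothetical name rather than one of the helpers the patch adds, and `flush_dcache_page()` is stubbed so that it merely records that it ran.

```c
#include <stddef.h>

/* Assumption: 1 on architectures with a real flush_dcache_page(), else 0. */
#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
#endif

/* Hypothetical stand-in for the kernel's struct page. */
struct page {
    int flushed;
};

/* Stub: a real implementation would write back/invalidate cache lines. */
static void flush_dcache_page(struct page *pg)
{
    pg->flushed = 1;
}

#if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
/* Walk every page and flush it. */
static void flush_pages(struct page **pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        flush_dcache_page(pages[i]);
}
#else
/* No-op variant: no loop is emitted at all, so nothing touches the pages. */
static inline void flush_pages(struct page **pages, size_t n)
{
    (void)pages;
    (void)n;
}
#endif
```

Building with `-DARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE=0` selects the empty variant, which is the point of the symbol: callers stay unconditional while the loop disappears where it would do nothing.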
      
      See "fix mtd_blkdevs problem with caches on some architectures" discussion
      on LKML for more information.
      Signed-off-by: Ilya Loginov <isloginov@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Peter Horton <phorton@bitbox.co.uk>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  4. 25 Nov, 2009 1 commit
  5. 24 Nov, 2009 3 commits
    • sh: Minor optimisations to FPU handling · d3ea9fa0
      Stuart Menefy authored
      
      A number of small optimisations to FPU handling, in particular:
      
       - move the task USEDFPU flag from the thread_info flags field (which
         is accessed asynchronously to the thread) to a new status field,
         which is only accessed by the thread itself. This allows locking to
         be removed in most cases, or can be reduced to a preempt_lock().
         This mimics the i386 behaviour.
      
       - move the modification of regs->sr and thread_info->status flags out
         of save_fpu() to __unlazy_fpu(). This gives the compiler a better
         chance to optimise things, as well as making save_fpu() symmetrical
         with restore_fpu() and init_fpu().
      
       - implement prepare_to_copy(), so that when creating a thread, we can
         unlazy the FPU prior to copying the thread data structures.
      
      Also make sure that the FPU is disabled while in the kernel, in
      particular while booting and for newly created kernel threads.
      
      In a very artificial benchmark, the execution time for 2500000
      context switches was reduced from 50 to 45 seconds.
      Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: Improve performance of SH4 versions of copy/clear_user_highpage · 39ac11c1
      Stuart Menefy authored
      
      The previous implementations of clear_user_highpage and copy_user_highpage
      checked whether there was a D-cache aliasing issue between the user
      and kernel mappings of a page, but if there was, they always did a
      flush with writeback on the dirtied kernel alias.
      
      However, as we now have the ability to map a page into kernel space
      with the same cache colour as the user mapping, there is no need to
      write back this data.
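
The notion of cache colour used above can be sketched as follows. The constants are assumptions chosen for illustration (4 KiB pages and a 16 KiB cache way, giving 4 colours), and `page_colour()`, `mappings_alias()` and `same_colour_slot()` are hypothetical helpers, not kernel functions. The window base passed to `same_colour_slot()` is assumed to be aligned to the way size.

```c
#include <stdint.h>

#define PAGE_SHIFT     12u                 /* assumption: 4 KiB pages */
#define CACHE_WAY_SIZE (16u * 1024u)       /* assumption: 16 KiB per way */
#define N_COLOURS      (CACHE_WAY_SIZE >> PAGE_SHIFT)

/* Colour = the page-number bits that also index into the cache. */
static unsigned page_colour(uintptr_t vaddr)
{
    return (unsigned)((vaddr >> PAGE_SHIFT) & (N_COLOURS - 1u));
}

/* Two virtual mappings of one physical page alias if their colours differ. */
static int mappings_alias(uintptr_t a, uintptr_t b)
{
    return page_colour(a) != page_colour(b);
}

/*
 * Pick the slot inside a window of N_COLOURS pages that has the same
 * colour as the user address. window_base must be CACHE_WAY_SIZE aligned.
 */
static uintptr_t same_colour_slot(uintptr_t window_base, uintptr_t user_vaddr)
{
    return window_base + ((uintptr_t)page_colour(user_vaddr) << PAGE_SHIFT);
}
```

With a same-colour kernel mapping, both mappings index the same cache lines, which is why the writeback on the kernel alias becomes unnecessary.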
      
      Currently we also invalidate the kernel alias as a precaution, however
      I'm not sure if this is actually required.
      
      Also correct the definition of FIX_CMAP_END so that the mappings created
      by kmap_coherent() are actually at the correct colour.
      Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: add sleazy FPU optimization · a0458b07
      Giuseppe CAVALLARO authored
      
      SH port of the sLeAZY-fpu feature, currently implemented for some
      architectures such as i386.
      
      Right now the SH kernel has 100% lazy FPU behaviour.
      This is of course great for applications that use the FPU sporadically
      or not at all.
      However, very frequent FPU users take an extra trap on every context
      switch.
      This patch adds a simple heuristic to this code: after 5 consecutive
      context switches with FPU use, the lazy behaviour is disabled and the
      FPU context is restored on every context switch.
      After 256 switches, this is reset and the 100% lazy behaviour returns.
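
The heuristic can be modelled in a few lines of user-space C. This is a sketch under assumptions, not the patch itself: `struct task`, `switch_in()` and `switch_out()` are hypothetical names, the counter is assumed to be 8 bits wide so that it wraps to zero after 256 increments, and whether the threshold comparison is strict or not is a guess.

```c
#include <stdint.h>

#define EAGER_THRESHOLD 5   /* "5 consecutive context switches" from the text */

/* Hypothetical per-task state. */
struct task {
    uint8_t fpu_counter;    /* assumption: 8 bits, wraps 255 -> 0 */
    int used_fpu;           /* did the task touch the FPU in its last slice? */
};

/*
 * Called when the task is scheduled in.
 * Returns 1 to restore FPU state eagerly, 0 to stay lazy
 * (the first FPU instruction would then trap).
 */
static int switch_in(const struct task *t)
{
    return t->fpu_counter >= EAGER_THRESHOLD;
}

/* Called when the task is scheduled out. */
static void switch_out(struct task *t)
{
    if (t->used_fpu)
        t->fpu_counter++;   /* wraps to 0 after 256 increments */
    else
        t->fpu_counter = 0; /* streak broken: back to fully lazy */
    t->used_fpu = 0;
}
```

The wrap-around gives the periodic reset for free: a task that stops using the FPU without ever being observed idle still drops back to lazy mode eventually.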
      
      Tests with LMbench showed no regression.
      I saw a little improvement due to the prefetching (~2%).
      
      The tests below also show that, with this sLeAZY patch, the number of
      FPU exceptions is indeed reduced.
      To test this, I hacked the LMbench lat_ctx benchmark to use the FPU a
      little more.
      
         sLeAZY implementation
         ===========================================
         switch_to calls            |  79326
         sleasy calls               |  42577
         do_fpu_state_restore calls |  59232
         restore_fpu calls          |  59032

         Exceptions: 0x800 (FPU disabled): 16604

         100% lazy (default implementation)
         ===========================================
         switch_to calls            |  79690
         do_fpu_state_restore calls |  53299
         restore_fpu calls          |  53101

         Exceptions: 0x800 (FPU disabled): 53273
      Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  6. 12 Nov, 2009 2 commits
  7. 09 Nov, 2009 1 commit
  8. 30 Oct, 2009 6 commits
  9. 28 Oct, 2009 1 commit
    • sh: perf events: Add preliminary support for SH-4A counters. · ac44e669
      Paul Mundt authored
      
      This adds in preliminary support for the SH-4A performance counters.
      Presently only the first 2 counters are supported, as these are the ones
      of the most interest to the perf tool and end users. Counter chaining is
      not presently handled, so these are simply implemented as 32-bit
      counters.
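
One common way to accumulate a free-running 32-bit counter into a wider software total is shown below. It is a generic technique, not necessarily what this commit does, and `struct soft_counter` and `counter_update()` are hypothetical names. It stays correct across a wrap as long as the counter is sampled at least once per 2^32 events.

```c
#include <stdint.h>

/* Hypothetical software view of one 32-bit hardware counter. */
struct soft_counter {
    uint32_t prev_raw;  /* last raw hardware value that was read */
    uint64_t total;     /* accumulated 64-bit event count */
};

/* Fold a fresh raw reading into the running total. */
static void counter_update(struct soft_counter *c, uint32_t raw_now)
{
    /* Unsigned subtraction is modulo 2^32, so one wrap is absorbed. */
    uint32_t delta = raw_now - c->prev_raw;

    c->prev_raw = raw_now;
    c->total += delta;
}
```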
      
      This also establishes a perf event support framework for other hardware
      counters, which the existing SH-4 oprofile code will migrate over to as
      the SH-4A support evolves.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  10. 27 Oct, 2009 2 commits
  11. 26 Oct, 2009 2 commits
  12. 20 Oct, 2009 1 commit
  13. 18 Oct, 2009 1 commit
    • sh: Fix up smp_mb__xxx() memory barriers for SH-4A SMP. · 1c8db713
      Paul Mundt authored
      
      In the past these were simply wrapping to barrier() which was sufficient
      on SH SMP platforms predating SH-4A. Unfortunately due to ll/sc semantics
      an explicit synco is needed in these cases, which is sorted for us by
      just switching these over to smp_mb(). smp_mb() also has the benefit of
      being wrapped to barrier() in the UP and non-SH4A cases, so old behaviour
      is maintained for those parts.
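
The layering described above can be sketched with macros. This is a user-space approximation under assumptions: `CONFIG_SMP` and `CONFIG_CPU_SH4A` are plain build-time switches here, the GCC/Clang builtin `__sync_synchronize()` stands in for the SH-4A `synco` instruction, and `dec_with_barriers()` is a hypothetical caller.

```c
/* Assumption: simple 0/1 switches standing in for kernel config options. */
#ifndef CONFIG_SMP
#define CONFIG_SMP 1
#endif
#ifndef CONFIG_CPU_SH4A
#define CONFIG_CPU_SH4A 1
#endif

/* Compiler-only barrier: stops reordering by the compiler, emits no code. */
#define barrier() __asm__ __volatile__("" ::: "memory")

#if CONFIG_SMP && CONFIG_CPU_SH4A
/* Stand-in for an explicit hardware barrier (synco on real SH-4A). */
#define smp_mb() __sync_synchronize()
#else
/* UP and non-SH4A: the old behaviour, a compiler barrier only. */
#define smp_mb() barrier()
#endif

/* The smp_mb__xxx() variants simply defer to smp_mb(). */
#define smp_mb__before_atomic_dec() smp_mb()
#define smp_mb__after_atomic_dec()  smp_mb()

/* Hypothetical caller: a fully ordered atomic decrement. */
static int dec_with_barriers(int *counter)
{
    int v;

    smp_mb__before_atomic_dec();
    v = __sync_sub_and_fetch(counter, 1);
    smp_mb__after_atomic_dec();
    return v;
}
```

Defining the variants in terms of `smp_mb()` keeps the configuration decision in one place, so the UP and non-SH4A builds fall back to `barrier()` without further changes.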
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  14. 17 Oct, 2009 1 commit
  15. 16 Oct, 2009 3 commits
    • sh: Kill off legacy UBC wakeup cruft. · cae19b59
      Paul Mundt authored
      
      This code was added for some ancient SH-4 solution engines with peculiar
      boot ROMs that did silly things to the UBC MSTP bits. None of these have
      been in the wild for years, and these days the clock framework wraps up
      the MSTP bits, meaning that the UBC code is one of the few interfaces
      that is stomping MSTP bits underneath the clock framework. At this point
      the risks far outweigh any benefit this code provided, so just kill it
      off.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: Support SCHED_MC for SH-X3 multi-cores. · 896f0c0e
      Paul Mundt authored
      
      This enables SCHED_MC support for SH-X3 multi-cores. Presently this is
      just a simple wrapper around the possible map, but this allows for
      tying in support for some of the more exotic NUMA clusters where we can
      actually do something with the topology.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • sh: Idle loop chainsawing for SMP-based light sleep. · f533c3d3
      Paul Mundt authored
      
      This does a bit of chainsawing of the idle loop code to get light sleep
      working on SMP. Previously this was forcing secondary CPUs into sleep
      mode, and they did not come back if they didn't have their own local
      timers. Given that we use clockevents broadcasting by default, the CPU
      managing the clockevents can't have IRQs disabled before entering its
      sleep state.
      
      This unfortunately leaves us with the age-old need_resched() race in
      between local_irq_enable() and cpu_sleep(), but at present this is
      unavoidable. After some more experimentation it may be possible to layer
      on SR.BL bit manipulation over top of this scheme to inhibit the race
      condition, but given the current potential for missing wakeups, this is
      left as a future exercise.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  16. 14 Oct, 2009 3 commits
  17. 13 Oct, 2009 3 commits
  18. 11 Oct, 2009 3 commits
  19. 10 Oct, 2009 3 commits