1. 23 Jun, 2006 5 commits
  2. 22 Jun, 2006 1 commit
    • Richard Purdie's avatar
      [PATCH] zlib_inflate: Upgrade library code to a recent version · 4f3865fb
      Richard Purdie authored
      Upgrade the zlib_inflate implementation in the kernel from a patched
      version 1.1.3/4 to a patched 1.2.3.
      
      The code in the kernel is about seven years old and I noticed that the
      external zlib library's inflate performance was significantly faster (~50%)
      than the code in the kernel on ARM (and faster again on x86_32).
      
      For comparison the newer deflate code is 20% slower on ARM and 50% slower
      on x86_32 but gives an approx 1% compression ratio improvement.  I don't
      consider this to be an improvement for kernel use so have no plans to
      change the zlib_deflate code.
      
      Various changes have been made to the zlib code in the kernel, the most
      significant being the extra functions/flush option used by ppp_deflate.
      This update reimplements the features PPP needs to ensure it continues to
      work.
      
      This code has been tested on ARM under both JFFS2 (with zlib compression
      enabled) and ppp_deflate and on x86_32.  JFFS2 sees an approx.  10% real
      world file read speed improvement.
      
      This patch also removes ZLIB_VERSION as it no longer has a correct value.
      We don't need version checks anyway as the kernel's module handling will
      take care of that for us.  This removal is also more in keeping with the
      zlib author's wishes (http://www.zlib.net/zlib_faq.html#faq24) and I've
      added something to the zlib.h header to note it's a modified version.
      Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
      Acked-by: Joern Engel <joern@wh.fh-wedel.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4f3865fb
  3. 21 Jun, 2006 2 commits
  4. 05 Jun, 2006 1 commit
  5. 21 May, 2006 1 commit
  6. 12 May, 2006 1 commit
  7. 27 Apr, 2006 2 commits
  8. 21 Apr, 2006 2 commits
    • David Woodhouse's avatar
      [RBTREE] Merge colour and parent fields of struct rb_node. · 55a98102
      David Woodhouse authored
      
      
      We only used a single bit for colour information, so having a whole
      machine word of space allocated for it was a bit wasteful. Instead,
      store it in the lowest bit of the 'parent' pointer, since that was
      always going to be aligned anyway.
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      55a98102
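The packing described in this commit can be sketched in user-space C as follows. This is a minimal illustration, not the kernel's code: the struct layout, the `sk_` names and the colour constants are assumptions made for the example. It relies on nodes being at least 2-byte aligned, so that bit 0 of any node address is always zero and can hold the colour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_RED   0UL
#define SK_BLACK 1UL

/* Hypothetical node: parent pointer and colour share one machine word. */
struct sk_node {
	uintptr_t parent_colour;	/* parent address | colour bit */
	struct sk_node *left, *right;
};

/* Mask off the colour bit to recover the parent pointer. */
static inline struct sk_node *sk_parent(const struct sk_node *n)
{
	return (struct sk_node *)(n->parent_colour & ~(uintptr_t)1);
}

/* The colour is the lowest bit of the shared word. */
static inline unsigned long sk_colour(const struct sk_node *n)
{
	return (unsigned long)(n->parent_colour & 1);
}

/* Replace the parent while preserving the current colour bit. */
static inline void sk_set_parent(struct sk_node *n, struct sk_node *p)
{
	assert(((uintptr_t)p & 1) == 0);	/* alignment assumption */
	n->parent_colour = (uintptr_t)p | (n->parent_colour & 1);
}

/* Replace the colour while preserving the parent pointer. */
static inline void sk_set_colour(struct sk_node *n, unsigned long colour)
{
	n->parent_colour = (n->parent_colour & ~(uintptr_t)1) | (colour & 1);
}
```

The cost of the saved word is that every parent access needs a mask, so accessors like these would replace direct field reads.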
    • David Woodhouse's avatar
      [RBTREE] Remove dead code in rb_erase() · 1975e593
      David Woodhouse authored
      
      
      Observe rb_erase(), when the victim node 'old' has two children so
      neither of the simple cases at the beginning are taken.
      
      Observe that it effectively does an 'rb_next()' operation to find the
      next (by value) node in the tree. That is; we go to the victim's
      right-hand child and then follow left-hand pointers all the way
      down the tree as far as we can until we find the next node 'node'. We
      end up with 'node' being either the same immediate right-hand child of
      'old', or one of its descendants on the far left-hand side.
      
      For a start, we _know_ that 'node' has a parent. We can drop that check.
      
      We also know that if 'node's parent is 'old', then 'node' is the
      right-hand child of its parent. And that if 'node's parent is _not_
      'old', then 'node' is the left-hand child of its parent.
      
      So instead of checking for 'node->rb_parent == old' in one place and
      also checking 'node's heritage separately when we're trying to change
      its link from its parent, we can shuffle things around a bit and do
      it like this...
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      1975e593
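The observation in this commit can be checked with a small user-space sketch. The node type and the `sk_` helper names are hypothetical, not the kernel's. The walk goes right once and then left as far as possible, and the node it finds is the right-hand child of its parent exactly when that parent is 'old'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain tree node with an explicit parent pointer. */
struct sk_tnode {
	struct sk_tnode *parent, *left, *right;
};

/*
 * For a victim with two children, find the next node by value:
 * step to the right-hand child, then follow left-hand pointers down.
 */
static struct sk_tnode *sk_successor_two_children(struct sk_tnode *old)
{
	struct sk_tnode *node;

	assert(old != NULL && old->left != NULL && old->right != NULL);
	node = old->right;
	while (node->left)
		node = node->left;
	return node;
}

/* True if 'node' hangs off its parent's right-hand link. */
static int sk_is_right_child(const struct sk_tnode *node)
{
	assert(node->parent != NULL);	/* the successor always has a parent */
	return node->parent->right == node;
}
```

Because `sk_is_right_child(node)` and `node->parent == old` always agree for this node, one test is enough to cover both decisions.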
  9. 19 Apr, 2006 1 commit
    • Tim Chen's avatar
      [PATCH] Kconfig.debug: Set DEBUG_MUTEX to off by default · cca57c5b
      Tim Chen authored
      
      
      The DEBUG_MUTEX flag is on by default in the current kernel configuration.
      
      During performance testing, we saw that mutex debug functions like
      mutex_debug_check_no_locks_freed (called by kfree()) are expensive as they
      go through a global list of memory areas with mutex lock and do the
      checking.  For benchmarks such as Volanomark and Hackbench, we have seen
      more than 40% drop in performance on some platforms.  We suggest setting
      DEBUG_MUTEX off by default, or at least doing that later when we feel that
      the mutex changes in the current code have stabilized.
      Signed-off-by: Tim Chen <tim.c.chen@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cca57c5b
  10. 14 Apr, 2006 1 commit
    • NeilBrown's avatar
      [PATCH] sysfs: Allow sysfs attribute files to be pollable · 4508a7a7
      NeilBrown authored
      
      
      It works like this:
        Open the file.
        Read all the contents.
        Call poll requesting POLLERR or POLLPRI (so select/exceptfds works).
        When poll returns, either
           close the file and go to top of loop,
         or lseek to start of file and go back to the 'read'.
      
      Events are signaled by an object manager calling
         sysfs_notify(kobj, dir, attr);
      
      If the dir is non-NULL, it is used to find a subdirectory which
      contains the attribute (presumably created by sysfs_create_group).
      
      This has a cost of one int per attribute, one wait_queuehead per kobject,
      one int per open file.
      
      The name "sysfs_notify" may be confused with the inotify
      functionality.  Maybe it would be nice to support inotify for sysfs
      attributes as well?
      
      This patch also uses sysfs_notify to allow /sys/block/md*/md/sync_action
      to be pollable.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      4508a7a7
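From user space, the read/poll/lseek loop described in this commit might look like the sketch below. The helper names are hypothetical, and it uses the lseek variant rather than closing and reopening the file. Only standard POSIX calls (`lseek`, `read`, `poll`) are used.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Rewind the attribute file and read its whole contents into buf,
 * NUL-terminating the result.  Returns bytes read, or -1 on error.
 */
static ssize_t sk_read_attr(int fd, char *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL && len > 0);
	if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
		return -1;
	n = read(fd, buf, len - 1);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}

/*
 * Sleep until the attribute is signalled or timeout_ms elapses
 * (a negative timeout waits forever).  Returns >0 on an event,
 * 0 on timeout and -1 on error, exactly as poll() does.
 */
static int sk_wait_attr(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLERR | POLLPRI };

	return poll(&pfd, 1, timeout_ms);
}
```

A monitor would open an attribute such as /sys/block/md0/md/sync_action (md0 is an illustrative device name), then alternate `sk_read_attr()` and `sk_wait_attr(fd, -1)` in a loop. Reading first matters: the description above implies the contents must be consumed before poll is called.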
  11. 11 Apr, 2006 3 commits
  12. 30 Mar, 2006 1 commit
  13. 27 Mar, 2006 1 commit
  14. 26 Mar, 2006 5 commits
    • Akinobu Mita's avatar
      [PATCH] bitops: hweight() speedup · f9b41929
      Akinobu Mita authored
      <linux@horizon.com> wrote:
      
      This is an extremely well-known technique.  You can see a similar version that
      uses a multiply for the last few steps at
      http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel which
      refers to "Software Optimization Guide for AMD Athlon 64 and Opteron
      Processors"
      http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/25112.PDF
      
      It's section 8.6, "Efficient Implementation of Population-Count Function in
      32-bit Mode", pages 179-180.
      
      It uses the name that I am more familiar with, "popcount" (population count),
      although "Hamming weight" also makes sense.
      
      Anyway, the proof of correctness proceeds as follows:
      
      	b = a - ((a >> 1) & 0x55555555);
      	c = (b & 0x33333333) + ((b >> 2) & 0x33333333);
      	d = (c + (c >> 4)) & 0x0f0f0f0f;
      #if SLOW_MULTIPLY
      	e = d + (d >> 8);
      	f = e + (e >> 16);
      	return f & 63;
      #else
      	/* Useful if multiply takes at most 4 cycles */
      	return (d * 0x01010101) >> 24;
      #endif
      
      The input value a can be thought of as 32 1-bit fields each holding their own
      hamming weight.  Now look at it as 16 2-bit fields.  Each 2-bit field a1..a0
      has the value 2*a1 + a0.  This can be converted into the hamming weight of the
      2-bit field a1+a0 by subtracting a1.
      
      That's what the (a >> 1) & mask subtraction does.  Since there can be no
      borrows, you can just do it all at once.
      
      Enumerating the 4 possible cases:
      
      0b00 = 0  ->  0 - 0 = 0
      0b01 = 1  ->  1 - 0 = 1
      0b10 = 2  ->  2 - 1 = 1
      0b11 = 3  ->  3 - 1 = 2
      
      The next step consists of breaking up b (made of 16 2-bit fields) into
      even and odd halves and adding them into 4-bit fields.  Since the largest
      possible sum is 2+2 = 4, which will not fit into a 2-bit field, the 2-bit
      fields have to be masked before they are added.
      
      After this point, the masking can be delayed.  Each 4-bit field holds a
      population count from 0..4, taking at most 3 bits.  These numbers can be added
      without overflowing a 4-bit field, so we can compute c + (c >> 4), and only
      then mask off the unwanted bits.
      
      This produces d, a number of 4 8-bit fields, each in the range 0..8.  From
      this point, we can shift and add d multiple times without overflowing an 8-bit
      field, and only do a final mask at the end.
      
      The number to mask with has to be at least 63 (so that 32 won't be truncated),
      but can also be 127 or 255.  The x86 has a special encoding for signed
      immediate byte values -128..127, so the value of 255 is slower.  On other
      processors, a special "sign extend byte" instruction might be faster.
      
      On a processor with fast integer multiplies (Athlon but not P4), you can
      reduce the final few serially dependent instructions to a single integer
      multiply.  Consider d to be 4 8-bit values d3, d2, d1 and d0, each in the
      range 0..8.  The multiply forms the partial products:
      
      	           d3 d2 d1 d0
      	        d3 d2 d1 d0
      	     d3 d2 d1 d0
      	+ d3 d2 d1 d0
      	----------------------
      	           e3 e2 e1 e0
      
      Where e3 = d3 + d2 + d1 + d0.   e2, e1 and e0 obviously cannot generate
      any carries.
      Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f9b41929
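The two variants from the message above can be written out as complete functions and compared against a bit-by-bit count. This is a user-space sketch: the `sk_` names, the naive reference and the self-check are additions for illustration, while the arithmetic follows the steps quoted in the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Steps shared by both variants: 1-bit -> 2-bit -> 4-bit -> 8-bit sums. */
static uint32_t sk_byte_sums(uint32_t a)
{
	uint32_t b = a - ((a >> 1) & 0x55555555u);
	uint32_t c = (b & 0x33333333u) + ((b >> 2) & 0x33333333u);

	return (c + (c >> 4)) & 0x0f0f0f0fu;	/* four fields, each 0..8 */
}

/* Finish with shifts and adds, for CPUs with a slow multiplier. */
static unsigned int sk_hweight32_shift(uint32_t a)
{
	uint32_t d = sk_byte_sums(a);
	uint32_t e = d + (d >> 8);
	uint32_t f = e + (e >> 16);

	return f & 63;
}

/* Finish with one multiply: the top byte collects d3 + d2 + d1 + d0. */
static unsigned int sk_hweight32_mul(uint32_t a)
{
	return (sk_byte_sums(a) * 0x01010101u) >> 24;
}

/* Reference implementation: count one bit at a time. */
static unsigned int sk_hweight32_naive(uint32_t a)
{
	unsigned int n = 0;

	for (; a; a >>= 1)
		n += a & 1;
	return n;
}

/* Usage example: both fast variants must agree with the reference. */
static void sk_hweight32_selfcheck(void)
{
	static const uint32_t samples[] = {
		0, 1, 0x80000000u, 0xdeadbeefu, 0xffffffffu
	};

	for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
		assert(sk_hweight32_shift(samples[i]) == sk_hweight32_naive(samples[i]));
		assert(sk_hweight32_mul(samples[i]) == sk_hweight32_naive(samples[i]));
	}
}
```

The all-ones input is the interesting edge case: it yields 32, which is why the final mask must keep at least six bits.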
    • Akinobu Mita's avatar
      [PATCH] bitops: hweight() related cleanup · 37d54111
      Akinobu Mita authored
      
      
      By defining generic hweight*() routines
      
      - hweight64() will be defined on all architectures
      - hweight_long() will use architecture optimized hweight32() or hweight64()
      
      These changes make two cleanups possible.
      Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      37d54111
    • Akinobu Mita's avatar
      [PATCH] bitops: generic ext2_{set,clear,test,find_first_zero,find_next_zero}_bit() · 930ae745
      Akinobu Mita authored
      
      
      This patch introduces the C-language equivalents of the functions below:
      
      int ext2_set_bit(int nr, volatile unsigned long *addr);
      int ext2_clear_bit(int nr, volatile unsigned long *addr);
      int ext2_test_bit(int nr, const volatile unsigned long *addr);
      unsigned long ext2_find_first_zero_bit(const unsigned long *addr,
                                             unsigned long size);
      unsigned long ext2_find_next_zero_bit(const unsigned long *addr,
                                            unsigned long size);
      
      In include/asm-generic/bitops/ext2-non-atomic.h
      
      This code is largely copied from:
      
      include/asm-powerpc/bitops.h
      include/asm-parisc/bitops.h
      Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      930ae745
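As a rough illustration of what non-atomic set/clear/test helpers of this kind do, the sketch below works on a byte array. It assumes little-endian bit numbering (bit nr lives in byte nr / 8 at position nr % 8), which the commit message does not spell out. The `sk_le_` names are hypothetical, and the code is not taken from ext2-non-atomic.h.

```c
#include <assert.h>
#include <stddef.h>

/* Return the current value (0 or 1) of bit nr. */
static int sk_le_test_bit(unsigned int nr, const void *addr)
{
	const unsigned char *p = addr;

	assert(addr != NULL);
	return (p[nr >> 3] >> (nr & 7)) & 1;
}

/* Set bit nr and return its previous value.  Not atomic. */
static int sk_le_set_bit(unsigned int nr, void *addr)
{
	unsigned char *p = addr;
	unsigned char mask = (unsigned char)(1u << (nr & 7));
	int old;

	assert(addr != NULL);
	old = (p[nr >> 3] & mask) != 0;
	p[nr >> 3] |= mask;
	return old;
}

/* Clear bit nr and return its previous value.  Not atomic. */
static int sk_le_clear_bit(unsigned int nr, void *addr)
{
	unsigned char *p = addr;
	unsigned char mask = (unsigned char)(1u << (nr & 7));
	int old;

	assert(addr != NULL);
	old = (p[nr >> 3] & mask) != 0;
	p[nr >> 3] &= (unsigned char)~mask;
	return old;
}
```

Addressing bytes rather than words gives the same bit layout on big-endian and little-endian hosts, which is the point of a fixed on-disk bit order.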
    • Akinobu Mita's avatar
      [PATCH] bitops: generic hweight{64,32,16,8}() · 3b9ed1a5
      Akinobu Mita authored
      
      
      This patch introduces the C-language equivalents of the functions below:
      
      unsigned int hweight32(unsigned int w);
      unsigned int hweight16(unsigned int w);
      unsigned int hweight8(unsigned int w);
      unsigned long hweight64(__u64 w);
      
      In include/asm-generic/bitops/hweight.h
      
      This code is largely copied from: include/linux/bitops.h
      Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3b9ed1a5
    • Akinobu Mita's avatar
      [PATCH] bitops: generic find_{next,first}{,_zero}_bit() · c7f612cd
      Akinobu Mita authored
      
      
      This patch introduces the C-language equivalents of the functions below:
      
      unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
                                  unsigned long offset);
      unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
                                       unsigned long offset);
      unsigned long find_first_zero_bit(const unsigned long *addr,
                                        unsigned long size);
      unsigned long find_first_bit(const unsigned long *addr, unsigned long size);
      
      In include/asm-generic/bitops/find.h
      
      This code is largely copied from: arch/powerpc/lib/bitops.c
      Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c7f612cd
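A simplified user-space version of these search functions is sketched below. It is not the code copied from arch/powerpc/lib/bitops.c. The `sk_` names and the shared helper with an `invert` mask are assumptions made to keep the example short, and it returns `size` when no matching bit is found.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SK_BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Find the first bit at or after 'offset' that is set once each word
 * has been XORed with 'invert' (0 searches for ones, ~0UL for zeros).
 */
static unsigned long sk_find_next(const unsigned long *addr, unsigned long size,
				  unsigned long offset, unsigned long invert)
{
	assert(addr != NULL);
	while (offset < size) {
		unsigned long shift = offset % SK_BITS_PER_LONG;
		unsigned long word =
			(addr[offset / SK_BITS_PER_LONG] ^ invert) >> shift;

		if (word) {
			/* Walk up to the lowest set bit in this word. */
			while (!(word & 1)) {
				word >>= 1;
				offset++;
			}
			return offset < size ? offset : size;
		}
		/* Nothing here: skip to the start of the next word. */
		offset += SK_BITS_PER_LONG - shift;
	}
	return size;
}

static unsigned long sk_find_next_bit(const unsigned long *addr,
				      unsigned long size, unsigned long offset)
{
	return sk_find_next(addr, size, offset, 0UL);
}

static unsigned long sk_find_next_zero_bit(const unsigned long *addr,
					   unsigned long size,
					   unsigned long offset)
{
	return sk_find_next(addr, size, offset, ~0UL);
}

static unsigned long sk_find_first_bit(const unsigned long *addr,
				       unsigned long size)
{
	return sk_find_next_bit(addr, size, 0);
}

static unsigned long sk_find_first_zero_bit(const unsigned long *addr,
					    unsigned long size)
{
	return sk_find_next_zero_bit(addr, size, 0);
}
```

Whole zero words are skipped in one step, so the bit-by-bit walk only happens inside the word that contains the answer.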
  15. 25 Mar, 2006 8 commits
  16. 24 Mar, 2006 5 commits
    • Eric Sesterhenn's avatar
      BUG_ON() Conversion in lib/swiotlb.c · 34814545
      Eric Sesterhenn authored
      
      
      This changes if() BUG(); constructs to BUG_ON(), which is
      cleaner, contains unlikely() and can be better optimized away.
      Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      34814545
    • Jan Beulich's avatar
      [PATCH] CONFIG_UNWIND_INFO · 604bf5a2
      Jan Beulich authored
      
      
      As a foundation for reliable stack unwinding, this adds a config option
      (available to all architectures except IA64 and those where the module
      loader might have problems with the resulting relocations) to enable the
      generation of frame unwind information.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      604bf5a2
    • Paul Jackson's avatar
      [PATCH] bitmap: region restructuring · 3cf64b93
      Paul Jackson authored
      
      
      Restructure the bitmap_*_region() operations, to avoid code duplication.
      
      Also reduces binary text size by about 100 bytes (ia64 arch).  The original
      Bottomley bitmap_*_region patch added about 1000 bytes of compiled kernel text
      (ia64).  The Mundt multiword extension added another 600 bytes, and this
      restructuring patch gets back about 100 bytes.
      
      But the real motivation was the reduced amount of duplicated code.
      
      Tested by Paul Mundt using <= BITS_PER_LONG as well as power of
      2 aligned multiword spanning allocations.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3cf64b93
    • Paul Mundt's avatar
      [PATCH] bitmap: region multiword spanning support · 74373c6a
      Paul Mundt authored
      
      
      Add support to the lib/bitmap.c bitmap_*_region() routines for bitmap
      regions larger than one word (nbits > BITS_PER_LONG).  This removes
      a BUG_ON() in lib/bitmap.c.
      
      I have an updated store queue API for SH that is currently using this with
      relative success, and at first glance, it seems like this could be useful for
      x86 (arch/i386/kernel/pci-dma.c) as well.  Particularly for anything using
      dma_declare_coherent_memory() on large areas and that attempts to allocate
      large buffers from that space.
      
      Paul Jackson also did some cleanup to this patch.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      74373c6a
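To show what a region allocator that spans several words has to do, here is a deliberately simple user-space sketch. It is not the lib/bitmap.c implementation. The `sk_` names are hypothetical, it tests bits one at a time instead of using word masks, and it returns -1 on failure as an arbitrary choice for the example. Regions are 2^order bits long and aligned to their own size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SK_BITS_PER_LONG ((unsigned long)(sizeof(unsigned long) * CHAR_BIT))

static int sk_test_bit(const unsigned long *map, unsigned long nr)
{
	return (int)((map[nr / SK_BITS_PER_LONG] >> (nr % SK_BITS_PER_LONG)) & 1UL);
}

/*
 * Find a free, naturally aligned run of (1 << order) bits in a bitmap
 * of 'bits' bits, mark it busy and return its first bit, or -1 if no
 * such run exists.  The run may cover more than one word.
 */
static long sk_find_free_region(unsigned long *map, unsigned long bits,
				unsigned int order)
{
	unsigned long n, pos, i;

	assert(map != NULL && order < SK_BITS_PER_LONG - 1);
	n = 1UL << order;
	for (pos = 0; pos + n <= bits; pos += n) {
		for (i = 0; i < n; i++)
			if (sk_test_bit(map, pos + i))
				break;
		if (i < n)
			continue;	/* some bit in this run is busy */
		for (i = 0; i < n; i++)
			map[(pos + i) / SK_BITS_PER_LONG] |=
				1UL << ((pos + i) % SK_BITS_PER_LONG);
		return (long)pos;
	}
	return -1;
}

/* Mark a previously allocated region free again. */
static void sk_release_region(unsigned long *map, unsigned long pos,
			      unsigned int order)
{
	unsigned long i, n;

	assert(map != NULL && order < SK_BITS_PER_LONG - 1);
	n = 1UL << order;
	for (i = 0; i < n; i++)
		map[(pos + i) / SK_BITS_PER_LONG] &=
			~(1UL << ((pos + i) % SK_BITS_PER_LONG));
}
```

With a four-word bitmap, requesting a region of two words' worth of bits exercises exactly the case that the removed BUG_ON() used to reject.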
    • Paul Jackson's avatar
      [PATCH] bitmap: region cleanup · 87e24802
      Paul Jackson authored
      
      
      Paul Mundt <lethal@linux-sh.org> says:
      
      This patch set implements a number of patches to clean up and restructure the
      bitmap region code, in addition to extending the interface to support
      multiword spanning allocations.
      
      The current implementation (before this patch set) is limited by only being
      able to allocate pages <= BITS_PER_LONG, as noted by the strategically
      positioned BUG_ON() at lib/bitmap.c:752:
      
              /* We don't do regions of pages > BITS_PER_LONG.  The
               * algorithm would be a simple look for multiple zeros in the
               * array, but there's no driver today that needs this.  If you
               * trip this BUG(), you get to code it... */
              BUG_ON(pages > BITS_PER_LONG);
      
      As I seem to have been the first person to trigger this, the result ends up
      being the following patch set with the help of Paul Jackson.
      
      The final patch in the series eliminates quite a bit of code duplication, so
      the bitmap code size ends up being smaller than the current implementation as
      an added bonus.
      
      After these are applied, it should already be possible to do multiword
      allocations with dma_alloc_coherent() out of ranges established by
      dma_declare_coherent_memory() on x86 without having to change any of the code,
      and the SH store queue API will follow up on this as the other user that needs
      support for this.
      
      This patch:
      
      Some code cleanup on the lib/bitmap.c bitmap_*_region() routines:
      
       * spacing
       * variable names
       * comments
      
      No functional change.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      87e24802