1. 01 Aug, 2008 1 commit
    • [IA64] Move include/asm-ia64 to arch/ia64/include/asm · 7f30491c
      Tony Luck authored
      After moving the include files there were a few clean-ups:
      1) Some files used #include <asm-ia64/xyz.h>, changed to <asm/xyz.h>
      2) Some comments alerted maintainers to look at various header files to
      make matching updates if certain code were to be changed. Updated these
      comments to use the new include paths.
      3) Some header files mentioned their own names in initial comments. Just
      deleted these self references.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  2. 30 Jul, 2008 1 commit
    • GRU Driver: hardware data structures · 34d8a380
      Jack Steiner authored
      This series of patches adds a driver for the SGI UV GRU.  The driver is
      still in development but it currently compiles for both x86_64 & IA64.
      All simple regression tests pass on IA64.  Although features remain to be
      added, I'd like to start the process of getting the driver into the
      kernel.  Additional kernel drivers will depend on services provided by the
      GRU driver.
      The GRU is a hardware resource located in the system chipset.  The GRU
      contains memory that is mmaped into the user address space.  This memory
      is used to communicate with the GRU to perform functions such as
      load/store, scatter/gather, bcopy, AMOs, etc.  The GRU is directly
      accessed by user instructions using user virtual addresses.  GRU
      instructions (e.g., bcopy) use user virtual addresses for operands.
      The GRU contains a large TLB that is functionally very similar to
      processor TLBs.  Because this external TLB holds user virtual
      addresses, it requires callouts from the core VM system when certain types
      of changes are made to the process page tables.  There are several MMUOPS
      patches currently being discussed but none has been accepted into the
      kernel.  The GRU driver is built using version V18 from Andrea Arcangeli.
      This patch:
      Contains the definitions of the hardware GRU data structures that are used
      by the driver to manage the GRU.
      [akpm@linux-foundation.org: export hpage_shift]
      Signed-off-by: Jack Steiner <steiner@sgi.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 27 Jul, 2008 1 commit
    • KVM: ia64: Fix irq disabling leak in error handling code · cab7a1ee
      Julia Lawall authored
      There is a call to local_irq_restore in the normal exit case, so it would
      seem that there should be one on an error return as well.
      The semantic patch that finds this problem is as follows:
      // <smpl>
      expression l;
      expression E,E1,E2;
      ... when != local_irq_restore(l)
          when != spin_unlock_irqrestore(E,l)
          when any
          when strict
      if (...) { ... when != local_irq_restore(l)
                     when != spin_unlock_irqrestore(E1,l)
      +   local_irq_restore(l);
          return ...;
      if (...)
      +   {local_irq_restore(l);
          return ...;
      +   }
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
  4. 26 Jul, 2008 2 commits
    • tracehook: wait_task_inactive · 85ba2d86
      Roland McGrath authored
      This extends wait_task_inactive() with a new argument so it can be used in
      a "soft" mode where it will check for the task changing state unexpectedly
      and back off.  There is no change to existing callers.  This lays the
      groundwork to allow robust, noninvasive tracing that can try to sample a
      blocked thread but back off safely if it wakes up.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-mapping: add the device argument to dma_mapping_error() · 8d8bb39b
      FUJITA Tomonori authored
      Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
      architecture does:
      This enables us to cleanly fix the Calgary IOMMU issue that some devices
      are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
      I think that per-device dma_mapping_ops support would also be helpful for
      KVM people to support PCI passthrough but Andi thinks that this makes it
      difficult to support the PCI passthrough (see the above thread).  So I
      CC'ed this to KVM camp.  Comments are appreciated.
      A pointer to dma_mapping_ops to struct dev_archdata is added.  If the
      pointer is non NULL, DMA operations in asm/dma-mapping.h use it.  If it's
      NULL, the system-wide dma_ops pointer is used as before.
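      The lookup described above can be sketched in user-space C.  This is
      only an illustration, not the kernel code: the structures are reduced to
      the one field that matters, and the two mapping_error callbacks and
      their error sentinels (0 and ~0UL) are hypothetical.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct device;

struct dma_mapping_ops {
	/* Returns nonzero if dma_addr signals a failed mapping. */
	int (*mapping_error)(struct device *dev, unsigned long dma_addr);
};

struct dev_archdata {
	struct dma_mapping_ops *dma_ops;	/* NULL: use the system-wide ops */
};

struct device {
	struct dev_archdata archdata;
};

/* System-wide default; the error sentinel 0 is an assumption. */
static int default_mapping_error(struct device *dev, unsigned long dma_addr)
{
	(void)dev;
	return dma_addr == 0;
}

/* Ops for a hypothetical IOMMU that uses ~0UL as its error sentinel. */
static int iommu_mapping_error(struct device *dev, unsigned long dma_addr)
{
	(void)dev;
	return dma_addr == ~0UL;
}

static struct dma_mapping_ops default_ops = {
	.mapping_error = default_mapping_error,
};
static struct dma_mapping_ops iommu_ops = {
	.mapping_error = iommu_mapping_error,
};

/* The system-wide pointer, used when a device has no ops of its own. */
static struct dma_mapping_ops *dma_ops = &default_ops;

static struct dma_mapping_ops *get_dma_ops(struct device *dev)
{
	if (dev == NULL || dev->archdata.dma_ops == NULL)
		return dma_ops;
	return dev->archdata.dma_ops;
}

/* With a device argument, the check can dispatch per device. */
static int dma_mapping_error(struct device *dev, unsigned long dma_addr)
{
	return get_dma_ops(dev)->mapping_error(dev, dma_addr);
}
```

      A device whose archdata pointer is NULL is checked by the default
      callback, while a device pointing at iommu_ops is checked by the
      IOMMU-specific one.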
      If it's useful for KVM people, I plan to implement a mechanism to register
      a hook called when a new pci (or dma capable) device is created (it works
      with hot plugging).  It enables IOMMUs to set up an appropriate
      dma_mapping_ops per device.
      The major obstacle is that dma_mapping_error doesn't take a pointer to the
      device unlike other DMA operations.  So x86 can't have dma_mapping_ops per
      device.  Note all the POWER IOMMUs use the same dma_mapping_error function
      so this is not a problem for POWER but x86 IOMMUs use different
      dma_mapping_error functions.
      The first patch adds the device argument to dma_mapping_error.  The patch
      is trivial but large since it touches lots of drivers and dma-mapping.h in
      all the architectures.
      This patch:
      dma_mapping_error() doesn't take a pointer to the device unlike other DMA
      operations.  So we can't have dma_mapping_ops per device.
      Note that POWER already has dma_mapping_ops per device but all the POWER
      IOMMUs use the same dma_mapping_error function.  x86 IOMMUs use device
      [akpm@linux-foundation.org: fix sge]
      [akpm@linux-foundation.org: fix svc_rdma]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix bnx2x]
      [akpm@linux-foundation.org: fix s2io]
      [akpm@linux-foundation.org: fix pasemi_mac]
      [akpm@linux-foundation.org: fix sdhci]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix sparc]
      [akpm@linux-foundation.org: fix ibmvscsi]
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Avi Kivity <avi@qumranet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 25 Jul, 2008 2 commits
    • kprobes: improve kretprobe scalability with hashed locking · ef53d9c5
      Srinivasa D S authored
      Currently the list of kretprobe instances is stored in the kretprobe object
      (as used_instances and free_instances) and in the kretprobe hash table.  One
      global kretprobe lock serialises access to these lists, so only one
      kretprobe handler can execute at a time.  This hurts system performance,
      particularly on SMP systems and when return probes are set on many
      functions (for example, on all system calls).
      The solution proposed here uses fine-grained locks that perform better on
      SMP systems than the present kretprobe implementation.
       1) Instead of one global lock protecting the kretprobe instances in
          both the kretprobe object and the kretprobe hash table, we will have
          two locks: one protecting the kretprobe hash table and another
          for the kretprobe object.
       2) We hold the lock in the kretprobe object while we modify a kretprobe
          instance in that object, and we hold the per-hash-list lock while
          modifying kretprobe instances in that hash list.  To prevent
          deadlock, we never grab a per-hash-list lock while holding a kretprobe
          lock.
       3) We can remove used_instances from struct kretprobe, as we can
          track used kretprobe instances using the kretprobe hash
          table.
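      The per-hash-list locking in the list above can be sketched in
      user-space C.  This is a rough model under stated assumptions, not the
      kernel implementation: spinlocks are modelled with C11 atomic_flag, the
      hash function is a placeholder, and all names (struct instance,
      bucket_for, instance_add, instance_count) are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64u	/* number of hash lists; value chosen arbitrarily */

struct instance {
	struct instance *next;
	const void *task;	/* key: the task that owns this instance */
};

/* One lock per hash list instead of one global lock. */
static struct bucket {
	atomic_flag lock;
	struct instance *head;
} table[TABLE_SIZE];

static void table_init(void)
{
	for (size_t i = 0; i < TABLE_SIZE; i++) {
		atomic_flag_clear(&table[i].lock);
		table[i].head = NULL;
	}
}

static void bucket_lock(struct bucket *b)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Placeholder hash: drop low pointer bits, then reduce modulo table size. */
static struct bucket *bucket_for(const void *task)
{
	return &table[((uintptr_t)task >> 4) % TABLE_SIZE];
}

/* Only the list that the task hashes to is locked. */
static void instance_add(struct instance *ri)
{
	struct bucket *b = bucket_for(ri->task);

	bucket_lock(b);
	ri->next = b->head;
	b->head = ri;
	bucket_unlock(b);
}

/* Used instances are found through the hash table, as in point 3. */
static size_t instance_count(const void *task)
{
	struct bucket *b = bucket_for(task);
	size_t n = 0;

	bucket_lock(b);
	for (struct instance *ri = b->head; ri != NULL; ri = ri->next)
		if (ri->task == task)
			n++;
	bucket_unlock(b);
	return n;
}
```

      Handlers working on tasks that hash to different lists take different
      locks and so do not serialise against each other.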
      Time duration for kernel compilation ("make -j 8") on an 8-way ppc64 system
      with return probes set on all system calls looks like this:
              cacheline-aligned   non-cacheline-aligned   un-patched
              patch               patch                   kernel
      real    9m46.784s           9m54.412s               10m2.450s
      user    40m5.715s           40m7.142s               40m4.273s
      sys     2m57.754s           2m58.583s               3m17.430s
      Time duration for kernel compilation ("make -j 8") on the same system when
      the kernel is not probed:
      real    9m26.389s
      user    40m8.775s
      sys     2m7.283s
      Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
      Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
      Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • [IA64] Wire up new system calls · 3e4d0cab
      Tony Luck authored
      Six new system calls: signalfd4, eventfd2, epoll_create1,
      dup3, pipe2 and inotify_init1.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  6. 24 Jul, 2008 6 commits
    • flag parameters: pipe · ed8cae8b
      Ulrich Drepper authored
      This patch introduces the new syscall pipe2 which is like pipe but it also
      takes an additional flags parameter.  This patch implements
      the handling of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>

      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif

      int
      main (void)
      {
        int fd[2];
        /* With flags == 0 neither end may have close-on-exec set.  */
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exec for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);

        /* With O_CLOEXEC both ends must have close-on-exec set.  */
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exec for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);

        puts ("OK");
        return 0;
      }
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bootmem: replace node_boot_start in struct bootmem_data · 3560e249
      Johannes Weiner authored
      Almost all users of this field need a PFN instead of a physical address,
      so replace node_boot_start with node_min_pfn.
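      For readers unfamiliar with the term, a PFN (page frame number) is a
      physical address divided by the page size.  The sketch below shows the
      conversion that callers previously had to apply to node_boot_start; the
      4 KiB page size and the helper names are assumptions for illustration
      (ia64 kernels can be configured with other page sizes).

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Physical address to page frame number (drops the in-page offset). */
static uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/* Page frame number to the physical address of the start of that page. */
static uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << SKETCH_PAGE_SHIFT;
}
```

      Storing the PFN directly saves each user from repeating the shift.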
      [Lee.Schermerhorn@hp.com: fix spurious BUG_ON() in mark_bootmem()]
      Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: introduce pud_huge · ceb86879
      Andi Kleen authored
      Straightforward extensions for huge pages located in the PUD instead of
      the PMD.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: modular state for hugetlb page size · a5516438
      Andi Kleen authored
      The goal of this patchset is to support multiple hugetlb page sizes.  This
      is achieved by introducing a new struct hstate structure, which
      encapsulates the important hugetlb state and constants (eg.  huge page
      size, number of huge pages currently allocated, etc).
      The hstate structure is then passed around the code that requires these
      fields, so that the code does the right thing regardless of the exact
      hstate it is operating on.
      This patch adds the hstate structure, with a single global instance of it
      (default_hstate), and does the basic work of converting hugetlb to use the
      hstate.
      Future patches will add more hstate structures to allow for different
      hugetlbfs mounts to have different page sizes.
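      The idea of deriving page-size constants from a passed-in state object
      can be sketched as follows.  This is a simplified illustration, not the
      structure from the patch: the field set is reduced, the 4 KiB base page
      size is an assumption, and the helpers are written here for user space.

```c
#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Reduced, hypothetical version of the per-size state. */
struct hstate {
	unsigned int order;		/* huge page = 2^order base pages */
	unsigned long nr_huge_pages;	/* huge pages currently allocated */
};

/* Size in bytes of one huge page for this hstate. */
static unsigned long huge_page_size(const struct hstate *h)
{
	return 1UL << (h->order + SKETCH_PAGE_SHIFT);
}

/* Mask that rounds an address down to a huge page boundary. */
static unsigned long huge_page_mask(const struct hstate *h)
{
	return ~(huge_page_size(h) - 1);
}

/* Number of base pages covered by one huge page. */
static unsigned long pages_per_huge_page(const struct hstate *h)
{
	return 1UL << h->order;
}
```

      Code written against these helpers works unchanged whether it is handed
      a state with order 9 (2 MiB with 4 KiB base pages) or a larger order.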
      [akpm@linux-foundation.org: coding-style fixes]
      Acked-by: Adam Litke <agl@us.ibm.com>
      Acked-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove double indirection on tlb parameter to free_pgd_range() & Co · 42b77728
      Jan Beulich authored
      The double indirection here is not needed anywhere and hence (at least)
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: move bootmem descriptors definition to a single place · b61bfa3c
      Johannes Weiner authored
      There are a lot of places that define either a single bootmem descriptor or an
      array of them.  Use only one central array with MAX_NUMNODES items instead.
      Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Kyle McMartin <kyle@parisc-linux.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 22 Jul, 2008 1 commit
    • sysdev: Pass the attribute to the low level sysdev show/store function · 4a0b2b4d
      Andi Kleen authored
      This allows attributes to be generated dynamically and show/store
      functions to be shared between attributes. Right now most attributes are
      generated by special macros, with lots of duplicated code. With the
      attribute passed in, it is instead possible to attach some data to the
      attribute and then use that in shared low level functions to do different
      things.
      I need this for the dynamically generated bank attributes in the x86
      machine check code, but it'll allow some further cleanups.
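      A user-space sketch of why passing the attribute helps: one show
      function can serve several attributes by recovering per-attribute data
      from the enclosing structure.  The types here are simplified and partly
      hypothetical (struct bank_attribute, to_bank and the extra buffer-length
      argument are illustration only, not the kernel prototypes).

```c
#include <stddef.h>
#include <stdio.h>

struct sys_device {
	int id;
};

struct sysdev_attribute {
	const char *name;
	/* The attribute itself is now passed to the callback. */
	int (*show)(struct sys_device *dev, struct sysdev_attribute *attr,
		    char *buf, size_t len);
};

/* Hypothetical wrapper that attaches data to an attribute. */
struct bank_attribute {
	struct sysdev_attribute attr;
	unsigned long value;
};

/* Recover the enclosing bank_attribute from its embedded attr member. */
#define to_bank(a) \
	((struct bank_attribute *)((char *)(a) - offsetof(struct bank_attribute, attr)))

/* One shared show function for every bank attribute. */
static int show_bank(struct sys_device *dev, struct sysdev_attribute *attr,
		     char *buf, size_t len)
{
	struct bank_attribute *bank = to_bank(attr);

	(void)dev;
	return snprintf(buf, len, "%lx\n", bank->value);
}
```

      Two bank_attribute objects can both set .attr.show to show_bank and
      still print different values, so no per-attribute macro expansion is
      needed.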
      I converted all users in tree to the new show/store prototype. It's a single
      huge patch to avoid unbisectable sections.
      Runtime tested: x86-32, x86-64
      Compiled only: ia64, powerpc
      Not compile tested/only grep converted: sh, arm, avr32
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  8. 21 Jul, 2008 1 commit
  9. 20 Jul, 2008 4 commits
  10. 17 Jul, 2008 5 commits
  11. 16 Jul, 2008 2 commits
  12. 30 Jun, 2008 2 commits
  13. 26 Jun, 2008 3 commits
  14. 24 Jun, 2008 3 commits
  15. 20 Jun, 2008 1 commit
  16. 16 Jun, 2008 1 commit
  17. 11 Jun, 2008 3 commits
    • [IA64] Update check_sal_cache_flush to use platform_send_ipi() · 3463a93d
      Alex Chiang authored
      check_sal_cache_flush is used to detect broken firmware that drops
      pending interrupts.
      The old implementation schedules a timer interrupt for itself in
      the future by getting the current value of the Interval Timer
      Counter + 1000 cycles, waits for the interrupt to be pended, calls
      SAL_CACHE_FLUSH, and finally checks to see if the interrupt is
      still pending.
      This implementation can cause problems for virtual machine code if
      the process of scheduling the timer interrupt takes more than 1000
      cycles; the virtual machine can end up sleeping for several hundred
      years while waiting for the ITC to wrap around.
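      The "several hundred years" figure can be checked with a little
      arithmetic.  The sketch below assumes a 64-bit counter that raises the
      interrupt only when it equals the programmed match value, and uses a
      hypothetical 1 GHz counter frequency; both function names are made up
      for illustration.

```c
#include <stdint.h>

/*
 * Cycles until the counter next equals the match value.  Unsigned
 * subtraction wraps modulo 2^64, so a match value that is already in
 * the past yields almost a full trip around the counter.
 */
static uint64_t cycles_until_match(uint64_t counter_now, uint64_t match)
{
	return match - counter_now;
}

/* Time for a 64-bit counter to wrap once, in years, at the given rate. */
static double wrap_years(double counter_hz)
{
	const double cycles = 18446744073709551616.0;	/* 2^64 */
	const double seconds_per_year = 365.25 * 24.0 * 3600.0;

	return cycles / counter_hz / seconds_per_year;
}
```

      If the match value was computed as "now + 1000" but programming it
      took 1001 cycles, cycles_until_match returns 2^64 - 1, and at 1 GHz
      wrap_years gives roughly 584 years.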
      The fix is to use platform_send_ipi. The processor will still send
      an interrupt to itself, using the IA64_IPI_DM_INT delivery mode,
      which causes the IPI to look like an external interrupt. The rest
      of the SAL_CACHE_FLUSH + checking to see if the interrupt is still
      pending remains unchanged.
      This fix has been boot tested successfully on:
      	- intel tiger2
      	- hp rx6600
      	- hp rx5670
      The rx5670 has known buggy firmware, where SAL_CACHE_FLUSH drops
      pending interrupts. A boot test on this machine showed this message
      on the console:
      SAL: SAL_CACHE_FLUSH drops interrupts; PAL_CACHE_FLUSH will be used instead
      Which proves that the self-inflicted IPI approach is viable. And
      as expected, the other tested platforms correctly did not display
      the warning.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • ACPI: handle invalid ACPI SLIT table · 39b8931b
      Fenghua Yu authored
      This is a SLIT sanity checking patch.  It moves the slit_valid() function to
      generic ACPI code and does sanity checking for both x86 and ia64.  It sets up
      node_distance with LOCAL_DISTANCE and REMOTE_DISTANCE when it hits an invalid
      SLIT table on ia64.  It also cleans up the unused variable localities in
      acpi_parse_slit() on x86.
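      A user-space sketch of this kind of check and fallback follows.  The
      validity rule used here (diagonal entries equal LOCAL_DISTANCE,
      off-diagonal entries strictly greater) and the values 10 and 20 are
      assumptions for illustration, and the function signatures are
      simplified rather than taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define LOCAL_DISTANCE  10	/* assumed value */
#define REMOTE_DISTANCE 20	/* assumed value */

/* entry is an n x n matrix stored row by row. Returns 1 if it looks sane. */
static int slit_valid(const uint8_t *entry, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			uint8_t val = entry[n * i + j];

			if (i == j) {
				if (val != LOCAL_DISTANCE)
					return 0;
			} else if (val <= LOCAL_DISTANCE) {
				return 0;
			}
		}
	}
	return 1;
}

/* Fallback: local on the diagonal, one uniform remote distance elsewhere. */
static void slit_fallback(uint8_t *dist, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = 0; j < n; j++)
			dist[n * i + j] = (i == j) ? LOCAL_DISTANCE
						   : REMOTE_DISTANCE;
}
```

      A table that fails the check is replaced wholesale, so later code never
      sees a remote node reported as closer than the local one.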
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • [IA64] perfmon: fix async exit bug · 83014699
      stephane eranian authored
      Move the cleanup of the async queue to the close callback from the flush
      callback. This avoids losing asynchronous overflow notifications when
      the file descriptor is shared by multiple processes and one terminates.
      Signed-off-by: Stephane Eranian <eranian@gmail.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  18. 06 Jun, 2008 1 commit