1. 07 Dec, 2009 3 commits
    • Martin Schwidefsky's avatar
      [S390] Improve notify_page_fault implementation. · 7ecb344a
      Martin Schwidefsky authored
      
      
      notify_page_fault does a preempt_disable/preempt_enable for each
      fault generated by a kernel access to user space. If kprobes
      is not active that is unnecessary since the interrupts are not
      reenabled yet. To play safe repeat the kprobe_running check after
      preempt_disable().
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      7ecb344a
    • Martin Schwidefsky's avatar
      [S390] Improve address space mode selection. · b11b5334
      Martin Schwidefsky authored
      
      
      Introduce user_mode to replace the two variables switch_amode and
      s390_noexec. There are three valid combinations of the old values:
        1) switch_amode == 0 && s390_noexec == 0
        2) switch_amode == 1 && s390_noexec == 0
        3) switch_amode == 1 && s390_noexec == 1
      They get replaced by
        1) user_mode == HOME_SPACE_MODE
        2) user_mode == PRIMARY_SPACE_MODE
        3) user_mode == SECONDARY_SPACE_MODE
      The new kernel parameter user_mode=[primary,secondary,home] lets
      you choose the address space mode the user space processes should
      use. In addition the CONFIG_S390_SWITCH_AMODE config option
      is removed.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      b11b5334
    • Martin Schwidefsky's avatar
      [S390] Improve address space check. · 61365e13
      Martin Schwidefsky authored
      
      
      A data access in access-register mode always is a user mode access,
      the code to inspect the access-registers can be removed. The second
      change is to use a different test to check for no-execute fault.
      The third change is to pass the translation exception identification
      as parameter, in theory the trans_exc_code in the lowcore could have
      been overwritten by the time the call to check_space from do_no_context
      is done.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      61365e13
  2. 21 Sep, 2009 1 commit
    • Ingo Molnar's avatar
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar authored
      
      
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: default avatarStephane Eranian <eranian@google.com>
      Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Reviewed-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      cdd6c482
  3. 11 Sep, 2009 1 commit
  4. 12 Jul, 2009 1 commit
  5. 21 Jun, 2009 1 commit
  6. 12 Jun, 2009 1 commit
  7. 26 Mar, 2009 1 commit
  8. 30 Apr, 2008 1 commit
  9. 17 Apr, 2008 2 commits
  10. 09 Feb, 2008 1 commit
    • Martin Schwidefsky's avatar
      [S390] dynamic page tables. · 6252d702
      Martin Schwidefsky authored
      
      
      Add support for different number of page table levels dependent
      on the highest address used for a process. This will cause a 31 bit
      process to use a two level page table instead of the four level page
      table that is the default after the pud has been introduced. Likewise
      a normal 64 bit process will use three levels instead of four. Only
      if a process runs out of the 4 tera bytes which can be addressed with
      a three level page table the fourth level is dynamically added. Then
      the process can use up to 8 peta byte.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      6252d702
  11. 19 Oct, 2007 1 commit
    • Serge E. Hallyn's avatar
      pid namespaces: define is_global_init() and is_container_init() · b460cbc5
      Serge E. Hallyn authored
      
      
      is_init() is an ambiguous name for the pid==1 check.  Split it into
      is_global_init() and is_container_init().
      
      A cgroup init has it's tsk->pid == 1.
      
      A global init also has it's tsk->pid == 1 and it's active pid namespace
      is the init_pid_ns.  But rather than check the active pid namespace,
      compare the task structure with 'init_pid_ns.child_reaper', which is
      initialized during boot to the /sbin/init process and never changes.
      
      Changelog:
      
      	2.6.22-rc4-mm2-pidns1:
      	- Use 'init_pid_ns.child_reaper' to determine if a given task is the
      	  global init (/sbin/init) process. This would improve performance
      	  and remove dependence on the task_pid().
      
      	2.6.21-mm2-pidns2:
      
      	- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
      	  ppc,avr32}/traps.c for the _exception() call to is_global_init().
      	  This way, we kill only the cgroup if the cgroup's init has a
      	  bug rather than force a kernel panic.
      
      [akpm@linux-foundation.org: fix comment]
      [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
      [bunk@stusta.de: kernel/pid.c: remove unused exports]
      [sukadev@us.ibm.com: Fix capability.c to work with threaded init]
      Signed-off-by: default avatarSerge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Acked-by: default avatarPavel Emelianov <xemul@openvz.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Herbert Poetzel <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      b460cbc5
  12. 16 Oct, 2007 1 commit
    • Will Schmidt's avatar
      During VM oom condition, kill all threads in process group · dcca2bde
      Will Schmidt authored
      
      
      We have had complaints where a threaded application is left in a bad state
      after one of it's threads is killed when we hit a VM: out_of_memory
      condition.
      
      Killing just one of the process threads can leave the application in a bad
      state, whereas killing the entire process group would allow for the
      application to restart, or be otherwise handled, and makes it very obvious
      that something has gone wrong.
      
      This change allows the entire process group to be taken down, rather
      than just the one thread.
      Signed-off-by: default avatarWill Schmidt <will_schmidt@vnet.ibm.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      dcca2bde
  13. 12 Oct, 2007 1 commit
  14. 19 Jul, 2007 1 commit
    • Nick Piggin's avatar
      mm: fault feedback #2 · 83c54070
      Nick Piggin authored
      
      
      This patch completes Linus's wish that the fault return codes be made into
      bit flags, which I agree makes everything nicer.  This requires requires
      all handle_mm_fault callers to be modified (possibly the modifications
      should go further and do things like fault accounting in handle_mm_fault --
      however that would be for another patch).
      
      [akpm@linux-foundation.org: fix alpha build]
      [akpm@linux-foundation.org: fix s390 build]
      [akpm@linux-foundation.org: fix sparc build]
      [akpm@linux-foundation.org: fix sparc64 build]
      [akpm@linux-foundation.org: fix ia64 build]
      Signed-off-by: default avatarNick Piggin <npiggin@suse.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Acked-by: default avatarKyle McMartin <kyle@mcmartin.ca>
      Acked-by: default avatarHaavard Skinnemoen <hskinnemoen@atmel.com>
      Acked-by: default avatarRalf Baechle <ralf@linux-mips.org>
      Acked-by: default avatarAndi Kleen <ak@muc.de>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      [ Still apparently needs some ARM and PPC loving - Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      83c54070
  15. 10 May, 2007 1 commit
  16. 08 May, 2007 1 commit
    • Christoph Hellwig's avatar
      move die notifier handling to common code · 1eeb66a1
      Christoph Hellwig authored
      
      
      This patch moves the die notifier handling to common code.  Previous
      various architectures had exactly the same code for it.  Note that the new
      code is compiled unconditionally, this should be understood as an appel to
      the other architecture maintainer to implement support for it aswell (aka
      sprinkling a notify_die or two in the proper place)
      
      arm had a notifiy_die that did something totally different, I renamed it to
      arm_notify_die as part of the patch and made it static to the file it's
      declared and used at.  avr32 used to pass slightly less information through
      this interface and I brought it into line with the other architectures.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
      [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: default avatarBryan Wu <bryan.wu@analog.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      1eeb66a1
  17. 04 May, 2007 1 commit
  18. 27 Apr, 2007 2 commits
  19. 05 Mar, 2007 1 commit
    • Gerald Schaefer's avatar
      [S390] Fixed handling of access register mode faults. · 482b05dd
      Gerald Schaefer authored
      
      
      Replaced check_user_space() + __check_access_register with the new
      check_space(). The old functions made wrong assumptions about kernel
      and user space when the kernel and user address spaces are switched
      (kernel in home space, user in primary/secondary space).
      Secondly the user process can switch to the accress register mode if
      it is running in primary or secondary mode. In addition it can load
      an arbitrary value to the access registers. If any other value than
      0 for primary space or 1 for secondary space is loaded and memory
      is accessed using the base register related to the access register,
      the program should be terminated with a SIGSEGV. To achieve that the
      DUALD pointer in the DUCT and the PSALD pointer in the PASTE need
      to point to an array of 8 invalid access-list entries to get a
      ALEN-translation exception if an invalid alet is used.
      Signed-off-by: default avatarGerald Schaefer <geraldsc@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      482b05dd
  20. 11 Feb, 2007 1 commit
  21. 05 Feb, 2007 2 commits
    • Gerald Schaefer's avatar
      [S390] noexec protection · c1821c2e
      Gerald Schaefer authored
      
      
      This provides a noexec protection on s390 hardware. Our hardware does
      not have any bits left in the pte for a hw noexec bit, so this is a
      different approach using shadow page tables and a special addressing
      mode that allows separate address spaces for code and data.
      
      As a special feature of our "secondary-space" addressing mode, separate
      page tables can be specified for the translation of data addresses
      (storage operands) and instruction addresses. The shadow page table is
      used for the instruction addresses and the standard page table for the
      data addresses.
      The shadow page table is linked to the standard page table by a pointer
      in page->lru.next of the struct page corresponding to the page that
      contains the standard page table (since page->private is not really
      private with the pte_lock and the page table pages are not in the LRU
      list).
      Depending on the software bits of a pte, it is either inserted into
      both page tables or just into the standard (data) page table. Pages of
      a vma that does not have the VM_EXEC bit set get mapped only in the
      data address space. Any try to execute code on such a page will cause a
      page translation exception. The standard reaction to this is a SIGSEGV
      with two exceptions: the two system call opcodes 0x0a77 (sys_sigreturn)
      and 0x0aad (sys_rt_sigreturn) are allowed. They are stored by the
      kernel to the signal stack frame. Unfortunately, the signal return
      mechanism cannot be modified to use an SA_RESTORER because the
      exception unwinding code depends on the system call opcode stored
      behind the signal stack frame.
      
      This feature requires that user space is executed in secondary-space
      mode and the kernel in home-space mode, which means that the addressing
      modes need to be switched and that the noexec protection only works
      for user space.
      After switching the addressing modes, we cannot use the mvcp/mvcs
      instructions anymore to copy between kernel and user space. A new
      mvcos instruction has been added to the z9 EC/BC hardware which allows
      to copy between arbitrary address spaces, but on older hardware the
      page tables need to be walked manually.
      Signed-off-by: default avatarGerald Schaefer <geraldsc@de.ibm.com>
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      c1821c2e
    • Heiko Carstens's avatar
  22. 04 Dec, 2006 1 commit
  23. 06 Oct, 2006 1 commit
  24. 29 Sep, 2006 1 commit
    • Sukadev Bhattiprolu's avatar
      [PATCH] pidspace: is_init() · f400e198
      Sukadev Bhattiprolu authored
      This is an updated version of Eric Biederman's is_init() patch.
      (http://lkml.org/lkml/2006/2/6/280
      
      ).  It applies cleanly to 2.6.18-rc3 and
      replaces a few more instances of ->pid == 1 with is_init().
      
      Further, is_init() checks pid and thus removes dependency on Eric's other
      patches for now.
      
      Eric's original description:
      
      	There are a lot of places in the kernel where we test for init
      	because we give it special properties.  Most  significantly init
      	must not die.  This results in code all over the kernel test
      	->pid == 1.
      
      	Introduce is_init to capture this case.
      
      	With multiple pid spaces for all of the cases affected we are
      	looking for only the first process on the system, not some other
      	process that has pid == 1.
      Signed-off-by: default avatarEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: default avatarSukadev Bhattiprolu <sukadev@us.ibm.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: <lxc-devel@lists.sourceforge.net>
      Acked-by: default avatarPaul Mackerras <paulus@samba.org>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      f400e198
  25. 28 Sep, 2006 2 commits
  26. 20 Sep, 2006 1 commit
  27. 12 Jul, 2006 1 commit
  28. 30 Jun, 2006 1 commit
  29. 06 Jan, 2006 1 commit
  30. 09 Nov, 2005 1 commit
  31. 07 Nov, 2005 1 commit
    • Martin Schwidefsky's avatar
      [PATCH] s390: remove pagex support · d4b68996
      Martin Schwidefsky authored
      
      
      Remove pagex pseudo page fault code.  It does not work together with the
      system call speedup that makes the complete system call path enabled for
      interrupts.  To make pagex and the syscall speedup code work together we would
      have to add code to the program check handler to do a critical section cleanup
      like the asynchronous interrupt code.  This would make program checks slower.
      Not what we want.
      
      Newer versions of z/VM have the improved pfault pseudo page fault interface.
      This replaces the old pagex interface and does not have the problem.  So its
      better to just rip out the pagex code.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      d4b68996
  32. 05 Sep, 2005 1 commit
    • Martin Schwidefsky's avatar
      [PATCH] s390: pfault interrupt race · b6d09449
      Martin Schwidefsky authored
      
      
      There is a race in pfault_interrupt.  That function gets called two times for
      each pfault notification.  Once with a subcode of 0 to indicate that a real
      page is not available and once with a subcode of 0x80 to indicate that the
      page is present again.
      
      Since the two external interrupts can be delivered on two different cpus the
      order in which the two calls are made is unpredictable.  It is possible that
      the subcode 0x80 interrupt is completed before the subcode 0x00 interrupt has
      done the wake_up() call.
      
      To avoid calling wake_up() on an already removed task structure proper task
      structure reference counting is needed.  Increase the reference counter in the
      subcode 0x00 interrupt before setting pfault_wait to zero and return the
      reference after the wake_up call.
      Signed-off-by: default avatarMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
      b6d09449
  33. 05 Jun, 2005 1 commit
  34. 16 Apr, 2005 1 commit
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4