1. 30 Jan, 2008 1 commit
    • Roland McGrath's avatar
      x86: protect against sigaltstack wraparound · 83bd0102
      Roland McGrath authored
      cf http://lkml.org/lkml/2007/10/3/41
      To summarize: on Linux, SA_ONSTACK decides whether you are already on the
      signal stack based on the value of the SP at the time of a signal.  If
      you are not already inside the range, you are not "on the signal stack"
      and so the new signal handler frame starts over at the base of the signal
      sigaltstack (and sigstack before it) was invented in BSD.  There, the
      SA_ONSTACK behavior has always been different.  It uses a kernel state
      flag to decide, rather than the SP value.  When you first take an
      SA_ONSTACK signal and switch to the alternate signal stack, it sets the
      SS_ONSTACK flag in the thread's sigaltstack state in the kernel.
      Thereafter you are "on the signal stack" and don't switch SP before
      pushing a handler frame no matter what the SP value is.  Only when you
      sigreturn from the original handler context do you clear the SS_ONSTACK
      flag so that a new handler frame will start over at the base of the
      alternate signal stack.
      The undesireable effect of the Linux behavior is that an overflow of the
      alternate signal stack can not only go undetected, but lead to a ring
      buffer effect of clobbering the original handler frame at the base of the
      signal stack for each successive signal that comes just after the
      overflow.  This is what Shi Weihua's test case demonstrates.  Normally
      this does not come up because of the signal mask, but the test case uses
      SA_NODEFER for its SIGSEGV handler.
      The other subtle part of the existing Linux semantics is that a simple
      longjmp out of a signal handler serves to take you off the signal stack
      in a safe and reliable fashion without having used sigreturn (nor having
      just returned from the handler normally, which means the same).  After
      the longjmp (or even informal stack switching not via any proper libc or
      kernel interface), the alternate signal stack stands ready to be used
      A paranoid program would allocate a PROT_NONE red zone around its
      alternate signal stack.  Then a small overflow would trigger a SIGSEGV in
      handler setup, and be fatal (core dump) whether or not SIGSEGV is
      blocked.  As with thread stack red zones, that cannot catch all overflows
      (or underflows).  e.g., a local array as large as page size allocated in
      a function called from a handler, but not actually touched before more
      calls push more stack, could cause an overflow that silently pushes into
      some unrelated allocated pages.
      The BSD behavior does not do anything in particular about overflow.  But
      it does at least avoid the wraparound or "ring buffer effect", so you'll
      just get a straightforward all-out overflow down your address space past
      the low end of the alternate signal stack.  I don't know what the BSD
      behavior is for longjmp out of an SA_ONSTACK handler.
      The POSIX wording relating to sigaltstack is pretty minimal.  I don't
      think it speaks to this issue one way or another.  (The program that
      overflows its stack is clearly in undefined behavior territory of one
      sort or another anyhow.)
      Given the longjmp issue and the potential for highly subtle complications
      in existing programs relying on this in arcane ways deep in their code, I
      am very dubious about changing the behavior to the BSD style persistent
      flag.  I think Shi Weihua's patches have a similar effect by tracking the
      SP used in the last handler setup.
      I think it would be sensible for the signal handler setup code to detect
      when it would itself be causing a stack overflow.  Maybe something like
      the following patch (untested).  This issue exists in the same way on all
      machines, so ideally they would all do a similar check.
      When it's the handler function itself or its callees that cause the
      overflow, rather than the signal handler frame setup alone crossing the
      boundary, this still won't help.  But I don't see any way to distinguish
      that from the valid longjmp case.
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
  2. 25 Jan, 2008 1 commit
    • Peter Zijlstra's avatar
      sched: high-res preemption tick · 8f4d37ec
      Peter Zijlstra authored
      Use HR-timers (when available) to deliver an accurate preemption tick.
      The regular scheduler tick that runs at 1/HZ can be too coarse when nice
      level are used. The fairness system will still keep the cpu utilisation 'fair'
      by then delaying the task that got an excessive amount of CPU time but try to
      minimize this by delivering preemption points spot-on.
      The average frequency of this extra interrupt is sched_latency / nr_latency.
      Which need not be higher than 1/HZ, its just that the distribution within the
      sched_latency period is important.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
  3. 19 Oct, 2007 2 commits
  4. 17 Oct, 2007 1 commit
  5. 13 Oct, 2007 1 commit
    • Dave Jones's avatar
      Delete filenames in comments. · 835c34a1
      Dave Jones authored
      Since the x86 merge, lots of files that referenced their own filenames
      are no longer correct.  Rather than keep them up to date, just delete
      them, as they add no real value.
      - fix up comment formatting in scx200_32.c
      - Remove a credit from myself in setup_64.c from a time when we had no SCM
      - remove longwinded history from tsc_32.c which can be figured out from
      Signed-off-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  6. 11 Oct, 2007 3 commits
  7. 22 Jul, 2007 1 commit
    • Masoud Asgharifard Sharbiani's avatar
      x86: i386-show-unhandled-signals-v3 · abd4f750
      Masoud Asgharifard Sharbiani authored
      This patch makes the i386 behave the same way that x86_64 does when a
      segfault happens.  A line gets printed to the kernel log so that tools
      that need to check for failures can behave more uniformly between
      debug.show_unhandled_signals sysctl variable to 0 (or by doing echo 0 >
      Also, all of the lines being printed are now using printk_ratelimit() to
      deny the ability of DoS from a local user with a program like the
             while (1)
                     if (!fork()) *(int *)0 = 0;
      This new revision also includes the fix that Andrew did which got rid of
      new sysctl that was added to the system in earlier versions of this.
      Also, 'show-unhandled-signals' sysctl has been renamed back to the old
      'exception-trace' to avoid breakage of people's scripts.
      AK: Enabling by default for i386 will be likely controversal, but let's see what happens
      AK: Really folks, before complaining just fix your segfaults
      AK: I bet this will find a lot of silent issues
      Signed-off-by: default avatarMasoud Sharbiani <masouds@google.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      [ Personally, I've found the complaints useful on x86-64, so I'm all for
        this. That said, I wonder if we could do it more prettily..   -Linus ]
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
  8. 08 May, 2007 1 commit
  9. 13 Feb, 2007 2 commits
  10. 07 Dec, 2006 1 commit
    • Jeremy Fitzhardinge's avatar
      [PATCH] i386: Use %gs as the PDA base-segment in the kernel · f95d47ca
      Jeremy Fitzhardinge authored
      This patch is the meat of the PDA change.  This patch makes several related
      1: Most significantly, %gs is now used in the kernel.  This means that on
         entry, the old value of %gs is saved away, and it is reloaded with
      2: entry.S constructs the stack in the shape of struct pt_regs, and this
         is passed around the kernel so that the process's saved register
         state can be accessed.
         Unfortunately struct pt_regs doesn't currently have space for %gs
         (or %fs). This patch extends pt_regs to add space for gs (no space
         is allocated for %fs, since it won't be used, and it would just
         complicate the code in entry.S to work around the space).
      3: Because %gs is now saved on the stack like %ds, %es and the integer
         registers, there are a number of places where it no longer needs to
         be handled specially; namely context switch, and saving/restoring the
         register state in a signal context.
      4: And since kernel threads run in kernel space and call normal kernel
         code, they need to be created with their %gs == __KERNEL_PDA.
      Signed-off-by: default avatarJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
  11. 28 Jun, 2006 1 commit
    • Ingo Molnar's avatar
      [PATCH] vdso: randomize the i386 vDSO by moving it into a vma · e6e5494c
      Ingo Molnar authored
      Move the i386 VDSO down into a vma and thus randomize it.
      Besides the security implications, this feature also helps debuggers, which
      can COW a vma-backed VDSO just like a normal DSO and can thus do
      single-stepping and other debugging features.
      It's good for hypervisors (Xen, VMWare) too, which typically live in the same
      high-mapped address space as the VDSO, hence whenever the VDSO is used, they
      get lots of guest pagefaults and have to fix such guest accesses up - which
      slows things down instead of speeding things up (the primary purpose of the
      There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
      for older glibcs that still rely on a prelinked high-mapped VDSO.  Newer
      distributions (using glibc 2.3.3 or later) can turn this option off.  Turning
      it off is also recommended for security reasons: attackers cannot use the
      predictable high-mapped VDSO page as syscall trampoline anymore.
      There is a new vdso=[0|1] boot option as well, and a runtime
      /proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
      (This version of the VDSO-randomization patch also has working ELF
      coredumping, the previous patch crashed in the coredumping code.)
      This code is a combined work of the exec-shield VDSO randomization
      code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
      started this patch and i completed it.
      [akpm@osdl.org: cleanups]
      [akpm@osdl.org: compile fix]
      [akpm@osdl.org: compile fix 2]
      [akpm@osdl.org: compile fix 3]
      [akpm@osdl.org: revernt MAXMEM change]
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarArjan van de Ven <arjan@infradead.org>
      Cc: Gerd Hoffmann <kraxel@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  12. 23 Mar, 2006 2 commits
  13. 19 Jan, 2006 1 commit
    • David Howells's avatar
      [PATCH] Handle TIF_RESTORE_SIGMASK for i386 · 283828f3
      David Howells authored
      Handle TIF_RESTORE_SIGMASK as added by David Woodhouse's patch entitled:
              [PATCH] 2/3 Add TIF_RESTORE_SIGMASK support for arch/powerpc
              [PATCH] 3/3 Generic sys_rt_sigsuspend
      It does the following:
       (1) Declares TIF_RESTORE_SIGMASK for i386.
       (2) Invokes it over to do_signal() when TIF_RESTORE_SIGMASK is set.
       (3) Makes do_signal() support TIF_RESTORE_SIGMASK, using the signal mask saved
           in current->saved_sigmask.
       (4) Discards sys_rt_sigsuspend() from the arch, using the generic one instead.
       (5) Makes sys_sigsuspend() save the signal mask and set TIF_RESTORE_SIGMASK
           rather than attempting to fudge the return registers.
       (6) Makes sys_sigsuspend() return -ERESTARTNOHAND rather than looping
       (7) Makes setup_frame(), setup_rt_frame() and handle_signal() return 0 or
           -EFAULT rather than true/false to be consistent with the rest of the
      Due to the fact do_signal() is then only called from one place:
       (8) Makes do_signal() no longer have a return value is it was just being
           ignored; force_sig() takes care of this.
       (9) Discards the old sigmask argument to do_signal() as it's no longer
      (10) Makes do_signal() static.
      (11) Marks the second argument to do_notify_resume() as unused. The unused
           argument should remain in the middle as the arguments are passed in as
           registers, and the ordering is specific in entry.S
      Given the way do_signal() is now no longer called from sys_{,rt_}sigsuspend(),
      they no longer need access to the exception frame, and so can just take
      arguments normally.
      This patch depends on sys_rt_sigsuspend patch.
      Signed-off-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarDavid Woodhouse <dwmw2@infradead.org>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  14. 10 Oct, 2005 1 commit
  15. 05 Sep, 2005 2 commits
    • Zachary Amsden's avatar
      [PATCH] x86: privilege cleanup · 0998e422
      Zachary Amsden authored
      Privilege checking cleanup.  Originally, these diffs were much greater, but
      recent cleanups in Linux have already done much of the cleanup.  I added
      some explanatory comments in places where the reasoning behind certain
      tests is rather subtle.
      Also, in traps.c, we can skip the user_mode check in handle_BUG().  The
      reason is, there are only two call chains - one via die_if_kernel() and one
      via do_page_fault(), both entering from die().  Both of these paths already
      ensure that a kernel mode failure has happened.  Also, the original check
      here, if (user_mode(regs)) was insufficient anyways, since it would not
      rule out BUG faults from V8086 mode execution.
      Saving the %ss segment in show_regs() rather than assuming a fixed value
      also gives better information about the current kernel state in the
      register dump.
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Zachary Amsden's avatar
      [PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task register management · 4d37e7e3
      Zachary Amsden authored
      i386 inline assembler cleanup.
      This change encapsulates descriptor and task register management.  Also,
      it is possible to improve assembler generation in two cases; savesegment
      may store the value in a register instead of a memory location, which
      allows GCC to optimize stack variables into registers, and MOV MEM, SEG
      is always a 16-bit write to memory, making the casting in math-emu
      Signed-off-by: default avatarZachary Amsden <zach@vmware.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  16. 29 Aug, 2005 1 commit
    • Steven Rostedt's avatar
      [PATCH] convert signal handling of NODEFER to act like other Unix boxes. · 69be8f18
      Steven Rostedt authored
      It has been reported that the way Linux handles NODEFER for signals is
      not consistent with the way other Unix boxes handle it.  I've written a
      program to test the behavior of how this flag affects signals and had
      several reports from people who ran this on various Unix boxes,
      confirming that Linux seems to be unique on the way this is handled.
      The way NODEFER affects signals on other Unix boxes is as follows:
      1) If NODEFER is set, other signals in sa_mask are still blocked.
      2) If NODEFER is set and the signal is in sa_mask, then the signal is
      still blocked. (Note: this is the behavior of all tested but Linux _and_
      NetBSD 2.0 *).
      The way NODEFER affects signals on Linux:
      1) If NODEFER is set, other signals are _not_ blocked regardless of
      sa_mask (Even NetBSD doesn't do this).
      2) If NODEFER is set and the signal is in sa_mask, then the signal being
      handled is not blocked.
      The patch converts signal handling in all current Linux architectures to
      the way most Unix boxes work.
      Unix boxes that were tested:  DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
      3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.
      * NetBSD was the only other Unix to behave like Linux on point #2. The
      main concern was brought up by point #1 which even NetBSD isn't like
      Linux.  So with this patch, we leave NetBSD as the lonely one that
      behaves differently here with #2.
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  17. 26 Jun, 2005 2 commits
    • Linus Torvalds's avatar
      Fix up try_to_freeze() usage in arch/i386/kernel/signal.c · 6f0dcb72
      Linus Torvalds authored
      The parentheses were missing. Noted by Pavel Machek.
    • Christoph Lameter's avatar
      [PATCH] Cleanup patch for process freezing · 3e1d1d28
      Christoph Lameter authored
      1. Establish a simple API for process freezing defined in linux/include/sched.h:
         frozen(process)		Check for frozen process
         freezing(process)		Check if a process is being frozen
         freeze(process)		Tell a process to freeze (go to refrigerator)
         thaw_process(process)	Restart process
         frozen_process(process)	Process is frozen now
      2. Remove all references to PF_FREEZE and PF_FROZEN from all
         kernel sources except sched.h
      3. Fix numerous locations where try_to_freeze is manually done by a driver
      4. Remove the argument that is no longer necessary from two function calls.
      5. Some whitespace cleanup
      6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
         cleared before setting PF_FROZEN, recalc_sigpending does not check
      This patch does not address the problem of freeze_processes() violating the rule
      that a task may only modify its own flags by setting PF_FREEZE. This is not clean
      in an SMP environment. freeze(process) is therefore not SMP safe!
      Signed-off-by: default avatarChristoph Lameter <christoph@lameter.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
  18. 23 Jun, 2005 3 commits
  19. 16 Apr, 2005 2 commits
    • Roland McGrath's avatar
      [PATCH] i386: Use loaddebug macro consistently · ecd02ddd
      Roland McGrath authored
      This moves the macro loaddebug from asm-i386/suspend.h to
      asm-i386/processor.h, which is the place that makes sense for it to be
      defined, removes the extra copy of the same macro in
      arch/i386/kernel/process.c, and makes arch/i386/kernel/signal.c use the
      macro in place of its expansion.
      This is a purely cosmetic cleanup for the normal i386 kernel.  However, it
      is handy for Xen to be able to just redefine the loaddebug macro once
      instead of also changing the signal.c code.
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    • Linus Torvalds's avatar
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      Let it rip!