1. 07 Dec, 2006 4 commits
  2. 04 Dec, 2006 1 commit
    • [POWERPC] coredump: Add SPU elf notes to coredump. · bf1ab978
      Dwayne Grant McConnell authored
      
      
      This patch adds SPU elf notes to the coredump. It creates a separate note
      for each of /regs, /fpcr, /lslr, /decr, /decr_status, /mem, /signal1,
      /signal1_type, /signal2, /signal2_type, /event_mask, /event_status,
      /mbox_info, /ibox_info, /wbox_info, /dma_info, /proxydma_info, /object-id.
      
      A new macro, ARCH_HAVE_EXTRA_NOTES, was created for architectures to
      specify they have extra elf core notes.
      
      A new macro, ELF_CORE_EXTRA_NOTES_SIZE, was created so the size of the
      additional notes could be calculated and added to the notes phdr entry.
      
      A new macro, ELF_CORE_WRITE_EXTRA_NOTES, was created so the new notes
      would be written after the existing notes.
      
      The SPU coredump code resides in spufs. The kernel provides stub functions
      that are hooked up, via register_arch_coredump_calls(), to the spufs code
      that does the actual work.
      
      A new set of __spufs_<file>_read/get() functions was provided to allow the
      coredump code to read from the spufs files without having to lock the
      SPU context for each file read from.
      
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Dwayne Grant McConnell <decimal@us.ibm.com>
      Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
      bf1ab978
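The size bookkeeping that this commit describes for the extra notes can be illustrated with the standard ELF note layout: a 12-byte header, then the name and the descriptor, each padded to a 4-byte boundary. The following is a minimal user-space sketch, not the spufs code; `toy_note`, `toy_note_size()` and `toy_extra_notes_size()` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ELF note header: three 32-bit words. */
struct toy_note_hdr {
    uint32_t n_namesz;  /* length of the name, including the NUL */
    uint32_t n_descsz;  /* length of the descriptor payload */
    uint32_t n_type;    /* note type */
};

/* Hypothetical description of one note to be written. */
struct toy_note {
    const char *name;
    size_t descsz;
};

static size_t round_up4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Bytes one note occupies in the notes segment. */
static size_t toy_note_size(const struct toy_note *n)
{
    assert(n != NULL && n->name != NULL);
    return sizeof(struct toy_note_hdr)
         + round_up4(strlen(n->name) + 1)
         + round_up4(n->descsz);
}

/* Total that a macro like ELF_CORE_EXTRA_NOTES_SIZE would have to report. */
static size_t toy_extra_notes_size(const struct toy_note *notes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += toy_note_size(&notes[i]);
    return total;
}
```

The total has to be known before the notes program header is written, which is presumably why the commit introduces a size macro separately from the write macro.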
  3. 15 Oct, 2006 1 commit
  4. 13 Oct, 2006 1 commit
    • [PATCH] Get core dump code to work... · 7f14daa1
      Petr Vandrovec authored
      
      
      The file based core dump code was broken by pipe changes - a relative
      llseek returns the absolute file position on success, not the relative
      one, so dump_seek() always failed when invoked with non-zero current
      position.
      
      Only success or failure can be tested with a relative lseek; we have to
      trust the kernel that on success we got the right file offset.  With this
      fix in place I finally get real core files instead of 1KB fragments...
      Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
      [ Cleaned it up a bit while here - use SEEK_CUR instead of hardcoding 1 ]
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7f14daa1
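The behaviour behind this fix can be reproduced from user space with lseek(2), which also returns the resulting absolute offset for a relative seek. The sketch below is an illustration only, not the kernel's dump_seek(); `toy_dump_seek()` and `toy_relative_seek_result()` are hypothetical helpers, and the temporary file path is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Relative seek that tests only success or failure.
 * Comparing the return value against `off` would be wrong: the return
 * value is the new absolute position. */
static int toy_dump_seek(int fd, off_t off)
{
    return lseek(fd, off, SEEK_CUR) != (off_t)-1;
}

/* Writes 10 bytes, seeks 5 bytes forward, and returns what lseek reported. */
static off_t toy_relative_seek_result(void)
{
    char path[] = "/tmp/toy_seek_XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);                       /* file disappears once fd is closed */

    ssize_t n = write(fd, "0123456789", 10);
    assert(n == 10);

    off_t pos = lseek(fd, 5, SEEK_CUR); /* relative seek from offset 10 */
    close(fd);
    return pos;
}
```

With a non-zero current position the returned offset (15 here) differs from the requested delta (5), so a check of the form "result equals delta" fails even though the seek succeeded.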
  5. 01 Oct, 2006 1 commit
    • [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern · d025c9db
      Andi Kleen authored
      
      
      Using the infrastructure created in previous patches, implement support to
      pipe core dumps into programs.
      
      This is done by overloading the existing core_pattern sysctl
      with a new syntax:
      
      |program
      
      When the first character of the pattern is a '|', the kernel will instead
      treat the rest of the pattern as a command to run.  The core dump will be
      written to the standard input of that program instead of to a file.
      
      This is useful for having automatic core dump analysis without filling up
      disks.  The program can do some simple analysis and save only a summary of
      the core dump.
      
      The core dump process will run with the privileges and in the namespace of
      the process that caused the core dump.
      
      I also increased the core pattern size to 128 bytes so that longer command
      lines fit.
      
      Most of the changes come from allowing core dumps without seeks.  They are
      fairly straightforward though.
      
      One small incompatibility is that if someone previously had a core pattern
      that started with '|', they will suddenly get new behaviour.  I think that's
      unlikely to be a real problem though.
      
      Additional background:
      
      > Very nice, do you happen to have a program that can accept this kind of
      > input for crash dumps?  I'm guessing that the embedded people will
      > really want this functionality.
      
      I had a cheesy demo/prototype.  Basically it wrote the dump to a file again,
      ran gdb on it to get a backtrace and wrote the summary to a shared directory.
      Then there was a simple CGI script to generate a "top 10" crashes HTML
      listing.
      
      Unfortunately this still had the disadvantage of needing full disk space for
      a dump, except that it was deleted afterwards (in fact it was worse, because
      holes don't work over the pipe, so a holey address map would require more
      space).
      
      Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as
      cores (at least it worked with zsh's =(cat core) syntax), so it would
      likely be possible to do it without temporary space with a simple wrapper
      that calls it in the right way.  I ran out of time before doing that though.
      
      The demo prototype scripts weren't very good.  If there is really interest I
      can dig them out (they are currently on a laptop disk on the desk with the
      laptop itself being in service), but I would recommend to rewrite them for any
      serious application of this and fix the disk space problem.
      
      Also to be really useful it should probably find a way to automatically fetch
      the debuginfos (I cheated and just installed them in advance).  If nobody else
      does it I can probably do the rewrite myself again at some point.
      
      My hope at some point was that desktops would support it in their builtin
      crash reporters, but at least the KDE people I talked to seemed to be happy
      with their user space only solution.
      
      Alan sayeth:
      
        I don't believe that piping as such is necessarily the right model, but
        the ability to intercept and process core dumps from user space is asked
        for by many enterprise users as well.  They want to know about, capture,
        analyse and process core dumps, often centrally and in automated form.
      
      [akpm@osdl.org: loff_t != unsigned long]
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d025c9db
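The `|program` syntax described in this commit can be sketched from both sides: deciding whether a pattern names a command, and a helper that consumes the dump from standard input without seeking. This is a hedged user-space illustration, not the kernel implementation; `toy_core_pipe_command()` and `toy_drain_dump()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* If the pattern starts with '|', return the command that follows it.
 * Otherwise return NULL: the pattern is an ordinary core file name. */
static const char *toy_core_pipe_command(const char *pattern)
{
    assert(pattern != NULL);
    return pattern[0] == '|' ? pattern + 1 : NULL;
}

/* Consume a dump from a stream strictly sequentially and return its size.
 * A pipe cannot seek, so the reader only ever moves forward. */
static size_t toy_drain_dump(FILE *in)
{
    char buf[4096];
    size_t total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        total += n;
    return total;
}
```

A real handler would call something like `toy_drain_dump(stdin)` and store only a summary, which is the use case the commit message has in mind. Because a pipe carries no holes, the byte count seen by the reader can be larger than the on-disk size of a sparse core file.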
  6. 30 Sep, 2006 1 commit
  7. 29 Sep, 2006 2 commits
  8. 26 Sep, 2006 2 commits
  9. 10 Jul, 2006 1 commit
  10. 03 Jul, 2006 1 commit
    • [PATCH] binfmt_elf: fix checks for bad address · ce51059b
      Chuck Ebbert authored
      
      
      Fix check for bad address; use macro instead of open-coding two checks.
      
      Taken from RHEL4 kernel update.
      
      From: Ernie Petrides <petrides@redhat.com>
      
        For background, the BAD_ADDR() macro should return TRUE if the address is
        TASK_SIZE, because that's the lowest address that is *not* valid for
        user-space mappings.  The macro was correct in binfmt_aout.c but was wrong
        for the "equal to" case in binfmt_elf.c.  There were two in-line validations
        of user-space addresses in binfmt_elf.c, which have been appropriately
        converted to use the corrected BAD_ADDR() macro in the patch you posted
        yesterday.  Note that the size checks against TASK_SIZE are okay as coded.
      
        The additional changes that I propose are below.  These are in the error
        paths for bad ELF entry addresses once load_elf_binary() has already
        committed to exec'ing the new image (following the tearing down of the
        task's original address space).
      
        The 1st hunk deals with the interp-side of the outer "if".  There were two
        problems here.  The printk() should be removed because this path can be
        triggered at will by a bogus interpreter image created and used by a
        malicious user.  Further, the error code should not be ENOEXEC, because that
        causes the loop in search_binary_handler() to continue trying other exec
        handlers (twice, in fact).  But it's too late for this to work correctly,
        because the user address space has already been torn down, and an exec()
        failure cannot be returned to the user code because the code no longer
        exists.  The only recovery is to force a SIGSEGV, but it's best to terminate
        the search loop immediately.  I somewhat arbitrarily chose EINVAL as a
        fallback error code, but any error returned by load_elf_interp() will
        override that (but this value will never be seen by user-space).
      
        The 2nd hunk deals with the non-interp-side of the outer "if".  There were
        two problems here as well.  The SIGSEGV needs to be forced, because a prior
        sigaction() syscall might have set the associated disposition to SIG_IGN.
        And the ENOEXEC should be changed to EINVAL as described above.
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: Ernie Petrides <petrides@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ce51059b
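The "equal to" case discussed in this commit comes down to a single comparison operator. The sketch below contrasts a check that misses the boundary with one that catches it. `TOY_TASK_SIZE` is a made-up limit chosen for illustration (the real TASK_SIZE is architecture specific), and the macro names are hypothetical stand-ins rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space limit: the lowest address that is NOT valid
 * for user mappings in this toy model. */
#define TOY_TASK_SIZE 0xC0000000UL

/* Misses the boundary: an address equal to the limit passes as valid. */
#define TOY_BAD_ADDR_OLD(x) ((unsigned long)(x) >  TOY_TASK_SIZE)

/* Treats the limit itself as bad, as the commit message says it should. */
#define TOY_BAD_ADDR(x)     ((unsigned long)(x) >= TOY_TASK_SIZE)

/* Example use: validate an ELF entry point before jumping to it. */
static bool toy_entry_is_valid(unsigned long entry)
{
    return !TOY_BAD_ADDR(entry);
}
```

Using one macro everywhere, instead of open-coding the comparison in several places, keeps the boundary handling consistent, which is the other half of what the patch does.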
  11. 23 Jun, 2006 2 commits
  12. 22 Jun, 2006 1 commit
    • [PATCH] remove steal_locks() · c89681ed
      Miklos Szeredi authored
      
      
      This patch removes the steal_locks() function.
      
      steal_locks() doesn't work correctly with any filesystem that does its own
      lock management, including NFS, CIFS, etc.
      
      In addition it has weird semantics on local filesystems in case tasks
      sharing file-descriptor tables are doing POSIX locking operations in
      parallel to execve().
      
      The steal_locks() function has an effect on applications doing:
      
      clone(CLONE_FILES)
        /* in child */
        lock
        execve
        lock
      
      POSIX locks acquired before execve (by "child", "parent" or any further
      task sharing files_struct) will after the execve be owned exclusively by
      "child".
      
      According to Chris Wright, some LSB/LTP kind of test suite triggers a
      failure without the stealing behavior, but there's no known real-world
      application that would also fail.
      
      Apps using NPTL are not affected, since all other threads are killed before
      execve.
      
      Apps using LinuxThreads are only affected if they
      
        - have multiple threads during exec (LinuxThreads doesn't kill other
          threads, the app may do it with pthread_kill_other_threads_np())
        - rely on POSIX locks being inherited across exec
      
      Both conditions are documented, but not their interaction.
      
      Apps using clone() natively are affected if they
      
        - use clone(CLONE_FILES)
        - rely on POSIX locks being inherited across exec
      
      The above scenarios are unlikely, but possible.
      
      If the patch is vetoed, there's a plan B, that involves mostly keeping the
      weird stealing semantics, but changing the way lock ownership is handled so
      that network and local filesystems work consistently.
      
      That would add more complexity though, so this solution seems to be
      preferred by most people.
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Steven French <sfrench@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c89681ed
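Lock ownership is the concept this commit revolves around. The following user-space sketch shows the simplest case of it, under the assumption of an ordinary fork() rather than the clone(CLONE_FILES) scenario from the commit: a POSIX record lock taken by a parent is not owned by its forked child, which instead sees it as a conflicting lock held by the parent. `toy_child_sees_parent_lock()` is a hypothetical helper and the temporary path is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child reports the parent's write lock as a
 * conflict owned by the parent, 0 if not, -1 on setup failure. */
static int toy_child_sees_parent_lock(void)
{
    char path[] = "/tmp/toy_lock_XXXXXX";
    int fd = mkstemp(path);
    assert(fd >= 0);
    unlink(path);

    /* Write-lock the whole file (l_len == 0 means "to end of file"). */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;
    }

    pid_t parent = getpid();
    pid_t child = fork();
    if (child == -1) {
        close(fd);
        return -1;
    }
    if (child == 0) {
        /* F_GETLK asks: could I place this lock? If not, who blocks me? */
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                               .l_start = 0, .l_len = 0 };
        int ok = fcntl(fd, F_GETLK, &probe) == 0
              && probe.l_type == F_WRLCK
              && probe.l_pid == parent;
        _exit(ok ? 0 : 1);
    }

    int status = 0;
    pid_t w = waitpid(child, &status, 0);
    close(fd);
    return w == child && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The cases listed in the commit message are harder to demonstrate because the tasks share a file-descriptor table, but the question is the same: which task counts as the owner of a lock after execve.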
  13. 25 Mar, 2006 3 commits
  14. 26 Feb, 2006 1 commit
  15. 15 Jan, 2006 1 commit
  16. 11 Jan, 2006 1 commit
  17. 10 Jan, 2006 1 commit
  18. 09 Jan, 2006 2 commits
  19. 07 Nov, 2005 1 commit
  20. 31 Oct, 2005 1 commit
    • [PATCH] Don't uselessly export task_struct to userspace in core dumps · a9289728
      Eric W. Biederman authored
      
      
      task_struct is a kernel-internal structure with a lot of good information
      that is probably interesting in core dumps.  However, there is no way for
      user space to know what format that information is in, making it
      useless.
      
      I grepped the GDB 6.3 source code and NT_TASKSTRUCT while defined is not
      used anywhere else.  So I would be surprised if anyone notices it is
      missing.
      
      In addition exporting kernel pointers to all the interesting kernel data
      structures sounds like the very definition of an information leak.  I
      haven't a clue what someone with evil intentions could do with that
      information, but in any attack against the kernel it looks like this is the
      perfect tool for aiming that attack.
      
      So since NT_TASKSTRUCT is useless as currently defined and is potentially
      dangerous, let's just not export it.
      
      (akpm: Daniel Jacobowitz <dan@debian.org> "would be amazed" if anything was
      using NT_TASKSTRUCT).
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a9289728
  21. 30 Oct, 2005 1 commit
  22. 11 Oct, 2005 1 commit
  23. 22 Jun, 2005 1 commit
    • [PATCH] Avoiding mmap fragmentation · 1363c3cd
      Wolfgang Wander authored
      
      
      Ingo recently introduced a great speedup for allocating new mmaps using the
      free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
      causes huge performance increases in thread creation.
      
      The downside of this patch is that it does lead to fragmentation in the
      mmap-ed areas (visible via /proc/self/maps), such that some applications
      that work fine under 2.4 kernels quickly run out of memory on any 2.6
      kernel.
      
      The problem is twofold:
      
        1) the free_area_cache is used to continue a search for memory where
           the last search ended.  Before the change new areas were always
           searched from the base address on.
      
           So now new small areas are cluttering holes of all sizes
           throughout the whole mmap-able region, whereas before small requests
           tended to close holes near the base, leaving holes far from the base
           large and available for larger requests.
      
        2) the free_area_cache is also set to the location of the last
           munmap-ed area.  So in scenarios where we allocate e.g. five regions
           of 1K each, then free regions 4, 2, 3 in this order, the next request
           for 1K will be placed in the position of the old region 3, whereas
           before we appended it to the still active region 1, placing it at the
           location of the old region 2.  Before we had 1 free region of 2K, now
           we only get two free regions of 1K -> fragmentation.
      
      The patch addresses these issues by introducing yet another cache
      descriptor, cached_hole_size, that contains the largest known hole size
      below the current free_area_cache.  If a new request comes in, the size is
      compared against the cached_hole_size, and if the request can be filled with
      a hole below free_area_cache the search is started from the base instead.
      
      The results look promising.  2.6.12-rc4 fragments quickly: my (earlier
      posted) leakme.c test program terminates after 50000+ iterations with 96
      distinct and fragmented maps in /proc/self/maps.  It does perform nicely
      (as expected) with thread creation: Ingo's test_str02 with 20000 threads
      requires 0.7s system time.
      
      Taking out Ingo's patch (un-patch available on request) by basically
      deleting all mentions of free_area_cache from the kernel and always starting
      the search for new memory at the respective bases, we observe: leakme
      terminates successfully with 11 distinct, hardly fragmented areas in
      /proc/self/maps, but thread creation is grindingly slow: 30+s(!) system
      time for Ingo's test_str02 with 20000 threads.
      
      Now - drumroll ;-) the appended patch works fine with leakme: it ends with
      only 7 distinct areas in /proc/self/maps and also thread creation seems
      sufficiently fast with 0.71s for 20000 threads.
      Signed-off-by: Wolfgang Wander <wwc@rentec.com>
      Credit-to: "Richard Purdie" <rpurdie@rpsys.net>
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu> (partly)
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1363c3cd
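The decision rule from this commit (start from the base when the request fits a known hole below free_area_cache, otherwise continue from the cache) can be tried out on a toy address space. The sketch below is a user-space model over 16 one-unit slots, not the kernel's get_unmapped_area code. All `toy_` names are hypothetical, and the way `toy_free()` records a freed hole is a simplification chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SPACE 16  /* hypothetical address space, in 1K units */

struct toy_mm {
    bool used[TOY_SPACE];
    size_t free_area_cache;   /* where the last successful search ended */
    size_t cached_hole_size;  /* largest hole known below free_area_cache */
};

/* First-fit allocation of `len` units; returns the start unit or -1. */
static long toy_alloc(struct toy_mm *mm, size_t len)
{
    assert(len > 0 && len <= TOY_SPACE);
    /* A request that fits a known lower hole restarts from the base. */
    size_t start = (len > mm->cached_hole_size) ? mm->free_area_cache : 0;

    for (;;) {
        size_t addr = start;
        if (start == 0)
            mm->cached_hole_size = 0;  /* re-learned during the full scan */

        while (addr + len <= TOY_SPACE) {
            size_t run = 0;
            while (run < len && !mm->used[addr + run])
                run++;
            if (run == len) {
                for (size_t i = 0; i < len; i++)
                    mm->used[addr + i] = true;
                mm->free_area_cache = addr + len;
                return (long)addr;
            }
            /* Hole too small: remember the largest one skipped. */
            if (run > mm->cached_hole_size)
                mm->cached_hole_size = run;
            addr += run;
            while (addr < TOY_SPACE && mm->used[addr])
                addr++;
        }
        if (start == 0)
            return -1;  /* a full scan found nothing */
        start = 0;      /* retry once from the base */
    }
}

/* Release `len` units at `addr` and note the hole if it lies below the cache. */
static void toy_free(struct toy_mm *mm, size_t addr, size_t len)
{
    assert(addr + len <= TOY_SPACE);
    for (size_t i = 0; i < len; i++)
        mm->used[addr + i] = false;
    if (addr < mm->free_area_cache && len > mm->cached_hole_size)
        mm->cached_hole_size = len;
}
```

Replaying the scenario from the commit message (five 1K regions, then freeing regions 4, 2 and 3) places the next 1K request at the slot of the old region 2 and leaves one contiguous 2K hole, instead of splitting the free space into two 1K holes.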
  24. 16 Jun, 2005 1 commit
  25. 17 May, 2005 1 commit
  26. 28 Apr, 2005 1 commit
  27. 16 Apr, 2005 2 commits
    • [PATCH] ppc64: Improve mapping of vDSO · 547ee84c
      Benjamin Herrenschmidt authored
      
      
      This patch reworks the way the ppc64 vDSO is mapped in user memory by the
      kernel to make it more robust against possible collisions with executable
      segments.  Instead of just whacking a VMA at 1MB, I now use
      get_unmapped_area() with a hint, and I moved the mapping of the vDSO to
      after the mapping of the various ELF segments and of the interpreter, so
      that conflicts get caught properly (it still has to be before
      create_elf_tables, since the latter will fill the AT_SYSINFO_EHDR with the
      proper address).
      
      While I was at it, I also changed the 32-bit and 64-bit vDSOs to link at
      their "natural" address of 1MB instead of 0.  This is the address where
      they are normally mapped in the absence of conflict.  By doing so, it should
      be possible to properly prelink once it's been verified to work on glibc.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      547ee84c
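The difference between forcing a mapping at a fixed address and asking for an address "with a hint" has a close user-space analogue in mmap(2) without MAP_FIXED. The sketch below is an illustration under that analogy, not the vDSO mapping code; `toy_map_with_hint()` is a hypothetical helper and the 1MB hint simply mirrors the address mentioned in the commit.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `len` bytes of anonymous memory, preferring `hint`.
 * Without MAP_FIXED the hint is only a preference: if the range is
 * taken or unsuitable, the kernel picks another address instead of
 * clobbering whatever is already mapped there. */
static void *toy_map_with_hint(void *hint, size_t len)
{
    void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Requesting two mappings with the same hint yields two distinct addresses, because the second request cannot reuse a range the first one occupies. That is the collision-avoiding behaviour the patch relies on.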
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4