1. 02 Jul, 2009 1 commit
  2. 01 Jul, 2009 1 commit
    • kprobes: No need to unlock kprobe_insn_mutex · 4a2bb6fc
      Masami Hiramatsu authored
      
      
      Remove the needless kprobe_insn_mutex unlocking during the safety check
      in garbage collection. If someone releases a dirty slot during the
      safety check (which ensures that other CPUs are not executing any of
      the dirty slots), the safety check must fail. So we need to hold the
      mutex while checking safety.
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      LKML-Reference: <20090630210809.17851.28781.stgit@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 29 Jun, 2009 1 commit
  4. 26 Jun, 2009 3 commits
  5. 25 Jun, 2009 2 commits
    • ring-buffer: Make it generally available · 1155de47
      Paul Mundt authored
      
      
      In hunting down the cause for the hwlat_detector ring buffer spew in
      my failed -next builds it became obvious that folks are now treating
      ring_buffer as something that is generic, independent of tracing, and
      thus suitable for public driver consumption.
      
      Given that there are only a few minor areas in ring_buffer that have any
      reliance on CONFIG_TRACING or CONFIG_FUNCTION_TRACER, provide stubs for
      those and make it generally available.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Cc: Jon Masters <jcm@jonmasters.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <20090625053012.GB19944@linux-sh.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: Remove duplicate newline · 00e54d08
      Li Zefan authored
      
      
      Before:
        # echo 'sys_open:traceon:' > set_ftrace_filter
        # echo 'sys_close:traceoff:5' > set_ftrace_filter
        # cat set_ftrace_filter
        #### all functions enabled ####
        sys_open:traceon:unlimited
      
        sys_close:traceoff:count=0
      
      After:
        # cat set_ftrace_filter
        #### all functions enabled ####
        sys_open:traceon:unlimited
        sys_close:traceoff:count=0
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4A4313A7.7030105@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 24 Jun, 2009 8 commits
  7. 23 Jun, 2009 1 commit
  8. 20 Jun, 2009 3 commits
    • perf_counter: Push perf_sample_data through the swcounter code · 92bf309a
      Peter Zijlstra authored
      
      
      Push the perf_sample_data further outwards to the swcounter interface,
      to abstract it away some more.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/urgent: warn in case of ftrace_start_up imbalance · 9ea1a153
      Frederic Weisbecker authored
      
      
      Prevent further ftrace_start_up imbalances so that we avoid
      future nop patching omissions with dynamic ftrace.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • tracing/urgent: fix unbalanced ftrace_start_up · c85a17e2
      Frederic Weisbecker authored
      
      
      Perfcounter reports the following stats for a system-wide
      profile:
      
       #
       # (2364 samples)
       #
       # Overhead  Symbol
       # ........  ......
       #
          15.40%  [k] mwait_idle_with_hints
           8.29%  [k] read_hpet
           5.75%  [k] ftrace_caller
           3.60%  [k] ftrace_call
           [...]
      
      This snapshot has been taken while neither the function tracer nor
      the function graph tracer was running.
      With dynamic ftrace, such results show a wrong ftrace behaviour
      because all calls to ftrace_caller or ftrace_graph_caller (the patched
      calls to mcount) are supposed to be patched into nop if none of those
      tracers are running.
      
      The problem occurs after the first run of the function tracer. Once we
      launch it a second time, the callsites will never be nopped back,
      unless you set custom filters.
      For example it happens during the self tests at boot time.
      The function tracer selftest runs, and then the dynamic tracing is
      tested too. After that, the callsites are left un-nopped.
      
      This is because the reset callback of the function tracer tries to
      unregister two ftrace callbacks at once: the common function tracer
      and the function tracer with stack backtrace, regardless of which
      one is currently in use.
      This creates an imbalance in the ftrace_start_up value, which is
      expected to be zero when the last ftrace callback is unregistered.
      When it reaches zero, FTRACE_DISABLE_CALLS is set on the next ftrace
      command, triggering the patching into nops. But since it becomes
      unbalanced, i.e. drops below zero, if the kernel functions are
      patched again (as in every further function tracer run), they
      won't ever be nopped back.

      Note that ftrace_call and ftrace_graph_call are still patched back
      to ftrace_stub in the off case, but not the callers of ftrace_call
      and ftrace_graph_caller. This means that tracing is properly
      deactivated, but we waste a useless call in every kernel function.

      This patch just unregisters the right ftrace_ops for the function
      tracer in its reset callback and ignores the other one, which is
      not registered, fixing the imbalance. The problem also happens
      in .30.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: stable@kernel.org
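
      The counter imbalance described in this commit can be modelled in a few
      lines of userspace C. This is only a sketch under stated assumptions, not
      the kernel code: `struct ops`, `register_ops()`, `unregister_ops()`,
      `unregister_ops_unchecked()`, `tracer_run()`, `start_up` and
      `calls_enabled` are hypothetical names standing in for ftrace_ops,
      ftrace_start_up and the patched state of the call sites.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical stand-in for an ftrace_ops entry. */
      struct ops {
      	bool registered;
      };

      /* Models ftrace_start_up: number of registered callbacks. */
      static int start_up;
      /* Models whether call sites are currently patched to call the tracer. */
      static bool calls_enabled;

      static void register_ops(struct ops *o)
      {
      	assert(!o->registered);
      	o->registered = true;
      	start_up++;
      	calls_enabled = true;		/* call sites get patched in */
      }

      /* Unbalanced variant: decrements even for an ops never registered. */
      static void unregister_ops_unchecked(struct ops *o)
      {
      	o->registered = false;
      	if (--start_up == 0)
      		calls_enabled = false;	/* only an exact zero nops them */
      }

      /* Balanced variant: only a registered ops may drop the count. */
      static void unregister_ops(struct ops *o)
      {
      	if (!o->registered)
      		return;
      	o->registered = false;
      	if (--start_up == 0)
      		calls_enabled = false;
      }

      /* One tracer run: register one ops, then "reset" by dropping both. */
      static void tracer_run(struct ops *used, struct ops *unused, bool balanced)
      {
      	register_ops(used);
      	if (balanced) {
      		unregister_ops(used);
      		unregister_ops(unused);
      	} else {
      		unregister_ops_unchecked(used);
      		unregister_ops_unchecked(unused);
      	}
      }
      ```

      In this model the first unbalanced run leaves the counter at -1, so the
      second run never passes through zero on the way down and the call sites
      stay enabled, which mirrors the symptom reported above.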
  9. 19 Jun, 2009 3 commits
    • ptrace: wait_task_zombie: do not account traced sub-threads · befca967
      Oleg Nesterov authored
      
      
      The bug is ancient.
      
      If we trace the sub-thread of our natural child and this sub-thread exits,
      we update parent->signal->cxxx fields.  But we should not do this until
      the whole thread-group exits, otherwise we account this thread (and all
      other live threads) twice.
      
      Add the task_detached() check.  No need to check thread_group_empty(),
      wait_consider_task()->delay_group_leader() already did this.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • perf_counter: Close race in perf_lock_task_context() · b49a9e7e
      Peter Zijlstra authored
      
      
      perf_lock_task_context() is buggy because it can return a dead
      context.
      
      The RCU read lock in perf_lock_task_context() only guarantees that
      the memory won't get freed; it doesn't guarantee that the object is
      valid (in our case refcount > 0).
      
      Therefore we can return a locked object that can get freed the
      moment we release the rcu read lock.
      
      perf_pin_task_context() then increases the refcount and does an
      unlock on freed memory.
      
      That increased refcount will cause a double free, in case it
      started out with 0.
      
      Amend this by including the get_ctx() functionality in
      perf_lock_task_context() (all users already did this later
      anyway), and return a NULL context when the found one is
      already dead.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
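
      The idea of only handing out a context whose refcount is still above zero
      can be sketched with C11 atomics. This is a userspace illustration, not
      the kernel implementation: `struct ctx`, `ctx_get_unless_zero()`,
      `ctx_lock_and_get()` and `ctx_put()` are hypothetical names, and the
      kernel's locking around the lookup is left out.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical stand-in for a refcounted context object. */
      struct ctx {
      	atomic_int refcount;
      };

      /* Take a reference only if the object is still live (refcount > 0). */
      static bool ctx_get_unless_zero(struct ctx *c)
      {
      	int old = atomic_load(&c->refcount);

      	while (old > 0) {
      		/* On failure, old is updated to the current value. */
      		if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
      			return true;
      	}
      	return false;
      }

      /* Lookup helper: return NULL rather than hand out a dead context. */
      static struct ctx *ctx_lock_and_get(struct ctx *found)
      {
      	if (found == NULL || !ctx_get_unless_zero(found))
      		return NULL;
      	return found;
      }

      /* Drop a reference; returns true when the caller dropped the last one. */
      static bool ctx_put(struct ctx *c)
      {
      	int old = atomic_fetch_sub(&c->refcount, 1);

      	assert(old > 0);	/* dropping an unheld reference is a bug */
      	return old == 1;
      }
      ```

      Because a count of zero is never incremented again, a dead object cannot
      be revived and later freed a second time.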
    • perf_counter: Simplify and fix task migration counting · e5289d4a
      Peter Zijlstra authored
      The task migrations counter was causing rare and hard to decipher
      memory corruptions under load. After a day of debugging and bisection
      we found that the problem was introduced with:

        3f731ca6: perf_counter: Fix cpu migration counter

      Turning them off fixes the crashes. Incidentally, the whole
      perf_counter_task_migration() logic can be done more simply as well,
      by injecting a proper sw-counter event.
      
      This cleanup also fixed the crashes. The precise failure mode is
      not completely clear yet, but we are clearly not unhappy about
      having a fix ;-)
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 18 Jun, 2009 17 commits
    • function-graph: add stack frame test · 71e308a2
      Steven Rostedt authored
      
      
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86, where it passes in the stack
      frame of the parent function and tests that frame on exit.
      
      There was a case in x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the function, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) returned to the location that the child function
      was supposed to return to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I notice that the parisc arch implements its own push_return_trace.
      This is now a generic function and ftrace_push_return_trace should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
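
      The entry/exit check described in this commit can be sketched as a small
      return stack that stores a frame tag next to each saved return address.
      This is a simplified userspace model under stated assumptions:
      `struct ret_entry`, `struct ret_stack`, `push_return()`, `pop_return()`
      and `RET_STACK_DEPTH` are hypothetical names, not the kernel's
      ftrace_push_return_trace interface.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      #define RET_STACK_DEPTH 16

      /* Saved return address plus the frame tag seen on function entry. */
      struct ret_entry {
      	unsigned long ret;
      	unsigned long fp;
      };

      struct ret_stack {
      	struct ret_entry entries[RET_STACK_DEPTH];
      	int top;		/* number of entries in use */
      };

      /* On entry: remember the real return address and the frame tag. */
      static bool push_return(struct ret_stack *s, unsigned long ret,
      			unsigned long fp)
      {
      	assert(s != NULL);
      	if (s->top >= RET_STACK_DEPTH)
      		return false;
      	s->entries[s->top].ret = ret;
      	s->entries[s->top].fp = fp;
      	s->top++;
      	return true;
      }

      /*
       * On exit: pop the entry and compare the frame tag. Returns true and
       * stores the original return address when the tags match; returns
       * false when the stack is empty or the frame is not the expected one.
       */
      static bool pop_return(struct ret_stack *s, unsigned long fp,
      		       unsigned long *ret)
      {
      	assert(s != NULL && ret != NULL);
      	if (s->top <= 0)
      		return false;
      	s->top--;
      	*ret = s->entries[s->top].ret;
      	return s->entries[s->top].fp == fp;
      }
      ```

      A mismatch on exit is the signal that the compiler did something
      unexpected with the frame, which is the situation the test is meant to
      detect.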
    • function-graph: disable when both x86_32 and optimize for size are configured · eb4a0378
      Steven Rostedt authored
      
      
      On x86_32, when optimize for size is set, gcc may align the frame pointer
      and make a copy of the return address inside the stack frame.
      The return address that is located in the stack frame may not be
      the one used to return to the calling function. This will break the
      function graph tracer.
      
      The function graph tracer replaces the return address with a jump to a hook
      function that can trace the exit of the function. If it only replaces
      a copy, then the hook will not be called when the function returns.
      Worse yet, when the parent function returns, the function graph tracer
      will return back to the location of the child function which will
      easily crash the kernel with weird results.
      
      To see the problem, when i386 is compiled with -Os we get:
      
      c106be03:       57                      push   %edi
      c106be04:       8d 7c 24 08             lea    0x8(%esp),%edi
      c106be08:       83 e4 e0                and    $0xffffffe0,%esp
      c106be0b:       ff 77 fc                pushl  0xfffffffc(%edi)
      c106be0e:       55                      push   %ebp
      c106be0f:       89 e5                   mov    %esp,%ebp
      c106be11:       57                      push   %edi
      c106be12:       56                      push   %esi
      c106be13:       53                      push   %ebx
      c106be14:       81 ec 8c 00 00 00       sub    $0x8c,%esp
      c106be1a:       e8 f5 57 fb ff          call   c1021614 <mcount>
      
      When it is compiled with -O2 instead we get:
      
      c10896f0:       55                      push   %ebp
      c10896f1:       89 e5                   mov    %esp,%ebp
      c10896f3:       83 ec 28                sub    $0x28,%esp
      c10896f6:       89 5d f4                mov    %ebx,0xfffffff4(%ebp)
      c10896f9:       89 75 f8                mov    %esi,0xfffffff8(%ebp)
      c10896fc:       89 7d fc                mov    %edi,0xfffffffc(%ebp)
      c10896ff:       e8 d0 08 fa ff          call   c1029fd4 <mcount>
      
      The compile with -Os aligns the stack pointer and then sets up the
      frame pointer (%ebp), and it copies the return address back into
      the stack frame. The change to the return address in mcount is made
      to the copy and not to the real location of the return address.

      The compile with -O2 sets up the frame pointer first, so the change
      to the return address by mcount affects where the function will
      jump on exit.
      Reported-by: Jake Edge <jake@lwn.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • gcov: enable GCOV_PROFILE_ALL for x86_64 · 7bf99fb6
      Peter Oberparleiter authored
      
      
      Enable gcov profiling of the entire kernel on x86_64. Required changes
      include disabling profiling for:
      
      * arch/kernel/acpi/realmode and arch/kernel/boot/compressed:
        not linked to main kernel
      * arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet:
        profiling causes segfaults during boot (incompatible context)
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • gcov: add gcov profiling infrastructure · 2521f2c2
      Peter Oberparleiter authored
      Enable the use of GCC's coverage testing tool gcov [1] with the Linux
      kernel.  gcov may be useful for:
      
       * debugging (has this code been reached at all?)
       * test improvement (how do I change my test to cover these lines?)
       * minimizing kernel configurations (do I need this option if the
         associated code is never run?)
      
      The profiling patch incorporates the following changes:
      
       * change kbuild to include profiling flags
       * provide functions needed by profiling code
       * present profiling data as files in debugfs
      
      Note that on some architectures, enabling gcc's profiling option
      "-fprofile-arcs" for the entire kernel may trigger compile/link/
      run-time problems, some of which are caused by toolchain bugs and
      others which require adjustment of architecture code.
      
      For this reason profiling the entire kernel is initially restricted
      to those architectures for which it is known to work without changes.
      This restriction can be lifted once an architecture has been tested
      and found compatible with gcc's profiling. Profiling of single files
      or directories is still available on all platforms (see config help
      text).
      
      [1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.html
      
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel: constructor support · b99b87f7
      Peter Oberparleiter authored
      
      
      Call constructors (gcc-generated initcall-like functions) during kernel
      start and module load.  Constructors are used, for example, for gcov
      data initialization.
      
      Disable constructor support for usermode Linux to prevent conflicts with
      host glibc.
      Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
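
      For readers unfamiliar with constructors, the following userspace sketch
      shows the GCC/Clang `constructor` function attribute, which marks a
      function to run before main() in a hosted program. It only illustrates
      the concept; the kernel calls its constructors explicitly during start
      and module load, and the names `coverage_ready`, `init_coverage()`,
      `hits` and `record_hit()` are hypothetical.

      ```c
      #include <assert.h>

      /* Set by the constructor; stands in for coverage data set-up. */
      static int coverage_ready;
      /* Counter that is only meaningful once set-up has happened. */
      static int hits;

      /*
       * GCC/Clang extension: in a hosted program, functions carrying the
       * constructor attribute run before main() is entered.
       */
      static void __attribute__((constructor)) init_coverage(void)
      {
      	coverage_ready = 1;
      }

      /* Later code can rely on the constructor having run already. */
      static void record_hit(void)
      {
      	assert(coverage_ready);
      	hits++;
      }
      ```

      In an environment without a C runtime doing this for you, something has
      to walk the list of constructors and call each one, which is the role
      this commit adds to kernel start-up and module loading.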
    • nsproxy: extract create_nsproxy() · 90af90d7
      Alexey Dobriyan authored
      
      
      clone_nsproxy() does useless copying of the old nsproxy: every pointer
      will be rewritten to a new ns or to the old ns.  Remove the copying and
      rename clone_nsproxy() to create_nsproxy(), which will be used by C/R
      code to create a fresh nsproxy on restart.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • utsns: extract create_uts_ns() · 4c2a7e72
      Alexey Dobriyan authored
      
      
      create_uts_ns() will be used by C/R to create fresh uts_ns.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pidns: rewrite copy_pid_ns() · dca4a979
      Alexey Dobriyan authored
      
      
      copy_pid_ns() is a perfect example of a case where unwinding leads to more
      code and makes it less clear.  Watch the diffstat.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Reviewed-by: Serge Hallyn <serue@us.ibm.com>
      Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pidns: make create_pid_namespace() accept parent pidns · ed469a63
      Alexey Dobriyan authored
      
      
      create_pid_namespace() creates everything, but caller has to assign parent
      pidns by hand, which is unnatural.  At the moment of call new ->level has
      to be taken from somewhere and parent pidns is already available.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Acked-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
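
      The interface change described here, where the creation function takes
      the parent and derives the level itself, can be sketched as follows.
      This is a much-reduced userspace model: `struct pid_ns` and
      `create_pid_ns()` are hypothetical names, and the real
      create_pid_namespace() sets up far more state than shown.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical, much-reduced stand-in for a pid namespace. */
      struct pid_ns {
      	unsigned int level;
      	struct pid_ns *parent;
      };

      /*
       * Create a namespace below parent. The level is derived from the
       * parent, so the caller has nothing left to assign by hand.
       * Returns NULL on allocation failure.
       */
      static struct pid_ns *create_pid_ns(struct pid_ns *parent)
      {
      	struct pid_ns *ns;

      	assert(parent != NULL);
      	ns = malloc(sizeof(*ns));
      	if (ns == NULL)
      		return NULL;
      	ns->level = parent->level + 1;
      	ns->parent = parent;
      	return ns;
      }
      ```

      Passing the parent in keeps the two related fields consistent, since
      they are filled in from the same source in one place.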
    • pids: clean up find_task_by_pid variants · 17f98dcf
      Christoph Hellwig authored
      
      
      find_task_by_pid_type_ns is only used to implement find_task_by_vpid and
      find_task_by_pid_ns, but both of them pass PIDTYPE_PID as first argument.
      So just fold find_task_by_pid_type_ns into find_task_by_pid_ns and use
      find_task_by_pid_ns to implement find_task_by_vpid.
      
      While we're at it also remove the exports for find_task_by_pid_ns and
      find_task_by_vpid - we don't have any modular callers left as the only
      modular caller of the old pre-pid-namespace find_task_by_pid (gfs2) was
      switched to pid_task, which operates on a struct pid pointer instead of a
      pid_t.  Given the confusion about pid_t values vs namespace that's
      generally the better option anyway and I think we're better off restricting
      modules to do it that way.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sysctl.c: remove unused variable · 7338f299
      Sukanto Ghosh authored
      
      
      Remove the unused variable 'val' from __do_proc_dointvec().

      The integer has been declared and used as 'val = -val' and there is no
      reference to it anywhere.
      Signed-off-by: Sukanto Ghosh <sukanto.cse.iitb@gmail.com>
      Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Cc: Sukanto Ghosh <sukanto.cse.iitb@gmail.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthreads: simplify migration_thread() exit path · 371cbb38
      Oleg Nesterov authored
      
      
      Now that kthread_stop() can be used even if the task has already exited,
      we can kill the "wait_to_die:" loop in migration_thread().  But we must
      pin rq->migration_thread after creation.
      
      Actually, I don't think CPU_UP_CANCELED or CPU_DEAD should wait for
      ->migration_thread exit.  Perhaps we can simplify this code a bit more.
      migration_call() can set ->should_stop and forget about this thread.  But
      we need a new helper in kthread.c for that.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Vitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthreads: rework kthread_stop() · 63706172
      Oleg Nesterov authored
      
      
      Based on Eric's patch which in turn was based on my patch.
      
      kthread_stop() has several nasty problems:
      
      - it runs unpredictably long with the global semaphore held.
      
      - it deadlocks if kthread itself does kthread_stop() before it obeys
        the kthread_should_stop() request.
      
      - it is not useable if kthread exits on its own, see for example the
        ugly "wait_to_die:" hack in migration_thread()
      
      - it is not possible to just tell kthread it should stop, we must always
        wait for its exit.
      
      With this patch kthread() allocates all necessary data (struct kthread) on
      its own stack, and the globals kthread_stop_xxx are deleted.  ->vfork_done
      is used as a pointer into "struct kthread", which means kthread_stop() can
      easily wait for kthread's exit.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Vitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthreads: simplify the startup synchronization · cdd140bd
      Oleg Nesterov authored
      
      
      We use two completions to create the kernel thread, which is a bit ugly.
      kthread() wakes up create_kthread() via ->started, then create_kthread()
      wakes up the caller kthread_create() via ->done.  But create_kthread() does
      not need to wait for kthread(), it can just return.  Instead kthread()
      itself can wake up the caller of kthread_create().

      Kill kthread_create_info->started, ->done is enough.  This improves the
      scalability a bit and simplifies the code.

      The only problem is if kernel_thread() fails; in that case create_kthread()
      must do complete(&create->done).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Vitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: exit.c reorder wait_opts to remove padding on 64 bit builds · e1eb1ebc
      Richard Kennedy authored
      
      
      Reorder struct wait_opts to remove 8 bytes of alignment padding on 64 bit
      builds.
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
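
      The effect of member ordering on alignment padding can be seen with two
      small structs. This sketch does not reproduce the real struct wait_opts:
      `struct opts_interleaved`, `struct opts_grouped` and `bytes_saved()` are
      hypothetical, and the exact sizes depend on the target ABI.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* 4-byte members interleaved with pointers: padding after each int. */
      struct opts_interleaved {
      	int type;
      	void *info;
      	int flags;
      	void *stat;
      };

      /* Same members, pointers first, 4-byte members grouped at the end. */
      struct opts_grouped {
      	void *info;
      	void *stat;
      	int type;
      	int flags;
      };

      /* Grouping members of equal size should never make the struct larger. */
      static_assert(sizeof(struct opts_grouped) <=
      	      sizeof(struct opts_interleaved),
      	      "grouped layout must not be larger");

      /* Number of bytes the reordering saves on the current target. */
      static size_t bytes_saved(void)
      {
      	return sizeof(struct opts_interleaved) - sizeof(struct opts_grouped);
      }
      ```

      On a typical 64-bit ABI with 8-byte pointers and 4-byte ints, each int
      in the interleaved layout is followed by 4 bytes of padding, so the
      grouped layout is 8 bytes smaller; on a 32-bit ABI the two layouts are
      usually the same size.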
    • do_wait: fix the theoretical race with stop/trace/cont · f95d39d1
      Oleg Nesterov authored
      
      
      do_wait:
      
      	current->state = TASK_INTERRUPTIBLE;
      
      	read_lock(&tasklist_lock);
      	... search for the task to reap ...
      
      In theory, the ->state changing can leak into the critical section.  Since
      the child can change its status under read_lock(tasklist) in parallel
      (finish_stop/ptrace_stop), we can miss the wakeup if __wake_up_parent()
      sees us in TASK_RUNNING state.  Add the barrier.
      
      Also, use __set_current_state() to set TASK_RUNNING.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
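
      The missed-wakeup pattern behind this fix can be sketched with C11
      atomics: each side stores its own flag, issues a full fence, and only
      then reads the other side's flag. This is a userspace model under stated
      assumptions, not the kernel code: `waiter_state`, `child_changed`,
      `waiter_prepare()` and `child_notify()` are hypothetical names, and the
      kernel's tasklist_lock is replaced here by explicit fences on both sides.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdbool.h>

      enum { RUNNING, SLEEPING };

      /* Models current->state; zero-initialised, i.e. starts as RUNNING. */
      static atomic_int waiter_state;
      /* Models the child's status change. */
      static atomic_bool child_changed;

      /*
       * Waiter side: publish the sleeping state, then fence, then look for
       * an event. Returns true if the waiter should really go to sleep.
       */
      static bool waiter_prepare(void)
      {
      	atomic_store_explicit(&waiter_state, SLEEPING, memory_order_relaxed);
      	atomic_thread_fence(memory_order_seq_cst);
      	if (atomic_load_explicit(&child_changed, memory_order_relaxed)) {
      		atomic_store_explicit(&waiter_state, RUNNING,
      				      memory_order_relaxed);
      		return false;
      	}
      	return true;
      }

      /*
       * Child side: publish the event, then fence, then check whether the
       * waiter is asleep. Returns true if a wakeup was issued.
       */
      static bool child_notify(void)
      {
      	atomic_store_explicit(&child_changed, true, memory_order_relaxed);
      	atomic_thread_fence(memory_order_seq_cst);
      	if (atomic_load_explicit(&waiter_state, memory_order_relaxed)
      	    == SLEEPING) {
      		atomic_store_explicit(&waiter_state, RUNNING,
      				      memory_order_relaxed);
      		return true;
      	}
      	return false;
      }
      ```

      With both fences in place, at least one side is guaranteed to observe
      the other's store; without the waiter's fence, its state store could be
      reordered after its check, and both sides could miss each other.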
    • do_wait: kill the old BUG_ON, use while_each_thread() · a3f6dfb7
      Oleg Nesterov authored
      
      
      do_wait() does BUG_ON(tsk->signal != current->signal), this looks like a
      rather obsolete check.  At least, I don't think do_wait() is the best place
      to verify that all threads have the same ->signal.  Remove it.
      
      Also, change the code to use while_each_thread().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
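
      The do/while shape of a thread-group walk can be sketched over a plain
      circular list. This is a userspace illustration only: `struct task`,
      its `next_thread` member and `count_threads()` are hypothetical, and
      the kernel's while_each_thread() macro is not reproduced here.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical thread node in a circular, singly linked group list. */
      struct task {
      	int id;
      	struct task *next_thread;
      };

      /* Visit every thread in the group exactly once, starting anywhere. */
      static int count_threads(struct task *start)
      {
      	struct task *t = start;
      	int n = 0;

      	assert(start != NULL);
      	do {
      		n++;			/* per-thread work would go here */
      		t = t->next_thread;
      	} while (t != start);
      	return n;
      }
      ```

      The loop body runs before the termination test, so a group with a
      single thread, whose node points back at itself, is still visited once.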