    [PATCH] Cpuset: might sleep checking zones allowed fix · bdd804f4
    Paul Jackson authored
    Fix a couple of infrequently encountered 'sleeping function called from
    invalid context' warnings in the cpuset hooks in __alloc_pages().  These
    hooks could sleep while interrupts were disabled.
    
    The routine cpuset_zone_allowed() is called by code in mm/page_alloc.c
    __alloc_pages() to determine if a zone is allowed in the current task's
    cpuset.  This routine can sleep, for certain GFP_KERNEL allocations, if the
    zone is on a memory node not allowed in the current cpuset, but might be
    allowed in a parent cpuset.
    
    But we can't sleep in __alloc_pages() if in interrupt, nor if called for a
    GFP_ATOMIC request (__GFP_WAIT not set in gfp_flags).
    
    The rule was intended to be:
      Don't call cpuset_zone_allowed() if you can't sleep, unless you
      pass in the __GFP_HARDWALL flag set in gfp_flags, which disables
      the code that might scan up ancestor cpusets and sleep.
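
    The rule above can be sketched as a small userspace model.  This is an
    illustration only, not the kernel code or the patch itself: the SKETCH_*
    flag values, the sketch_* function names, and the sleep_attempts counter
    are all hypothetical stand-ins, and the model ignores the other early
    exits the real routine has.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_WAIT and __GFP_HARDWALL (not real values). */
#define SKETCH_GFP_WAIT     0x10u  /* caller is allowed to sleep */
#define SKETCH_GFP_HARDWALL 0x20u  /* never scan ancestor cpusets */

/* Counts how often the model took the path that would sleep. */
static int sleep_attempts;

/*
 * Simplified model of cpuset_zone_allowed(): a node in the current
 * cpuset is always allowed; otherwise, unless the hardwall flag is set,
 * the ancestors are scanned, which is the step that may sleep.
 */
static bool sketch_zone_allowed(bool node_in_current_cpuset,
                                bool node_in_ancestor_cpuset,
                                unsigned int gfp_flags)
{
    if (node_in_current_cpuset)
        return true;
    if (gfp_flags & SKETCH_GFP_HARDWALL)
        return false;      /* hardwall: no ancestor scan, no sleep */
    sleep_attempts++;      /* stands in for taking callback_sem */
    return node_in_ancestor_cpuset;
}

/*
 * Caller-side rule: if the caller cannot sleep (in interrupt, or the
 * wait flag is not set), force the hardwall flag before the check.
 */
static unsigned int sketch_flags_for_check(unsigned int gfp_flags,
                                           bool in_interrupt)
{
    bool can_sleep = (gfp_flags & SKETCH_GFP_WAIT) && !in_interrupt;

    if (!can_sleep)
        gfp_flags |= SKETCH_GFP_HARDWALL;
    return gfp_flags;
}
```

    In this model an atomic-style request (wait flag clear) never reaches
    the ancestor scan, at the cost of being refused a node that a parent
    cpuset would have allowed.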
    
    This rule was being violated in a couple of places, due to a bogus change
    made (by myself, pj) to __alloc_pages() as part of the November 2005 effort
    to clean up its logic, and also due to a later fix to constrain which swap
    daemons were awoken.
    
    The bogus change can be seen at:
      http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
      [PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags
    
    This was first noticed on a tight memory system, in code that was disabling
    interrupts and doing allocation requests with __GFP_WAIT not set, which
    resulted in __might_sleep() writing complaints to the log "Debug: sleeping
    function called ...", when the code in cpuset_zone_allowed() tried to take
    the callback_sem cpuset semaphore.
    
    We haven't seen a system hang on this 'might_sleep' yet, but we are at
    decent risk of seeing it fairly soon, especially since the additional
    cpuset_zone_allowed() check was added, conditioning wakeup_kswapd(), in
    March 2006.
    
    Special thanks to Dave Chinner, for figuring this out, and a tip of the hat
    to Nick Piggin who warned me of this back in Nov 2005, before I was ready
    to listen.
    
    Signed-off-by: Paul Jackson <pj@sgi.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>