Skip to content
  • KAMEZAWA Hiroyuki's avatar
    memcg: synchronized LRU · 08e552c6
    KAMEZAWA Hiroyuki authored
    
    
    A big patch for changing memcg's LRU semantics.
    
    Now,
      - page_cgroup is linked to mem_cgroup's its own LRU (per zone).
    
      - LRU of page_cgroup is not synchronous with global LRU.
    
      - page and page_cgroup is one-to-one and statically allocated.
    
      - To find page_cgroup is on what LRU, you have to check pc->mem_cgroup as
        - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);
    
      - SwapCache is handled.
    
    And, when we handle LRU list of page_cgroup, we do following.
    
    	pc = lookup_page_cgroup(page);
    	lock_page_cgroup(pc); .....................(1)
    	mz = page_cgroup_zoneinfo(pc);
    	spin_lock(&mz->lru_lock);
    	.....add to LRU
    	spin_unlock(&mz->lru_lock);
    	unlock_page_cgroup(pc);
    
    But (1) is spin_lock and we have to be afraid of dead-lock with zone->lru_lock.
    So, trylock() is used at (1), now. Without (1), we can't trust "mz" is correct.
    
    This is a trial to remove this dirty nesting of locks.
    This patch changes mz->lru_lock to be zone->lru_lock.
    Then, above sequence will be written as
    
            spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
    	mem_cgroup_add/remove/etc_lru() {
    		pc = lookup_page_cgroup(page);
    		mz = page_cgroup_zoneinfo(pc);
    		if (PageCgroupUsed(pc)) {
    			....add to LRU
    		}
            spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
    
    This is much simpler.
    (*) We're safe even if we don't take lock_page_cgroup(pc). Because..
        1. When pc->mem_cgroup can be modified.
           - at charge.
           - at account_move().
        2. at charge
           the PCG_USED bit is not set before pc->mem_cgroup is fixed.
        3. at account_move()
           the page is isolated and not on LRU.
    
    Pros.
      - easy for maintenance.
      - memcg can make use of laziness of pagevec.
      - we don't have to duplicated LRU/Active/Unevictable bit in page_cgroup.
      - LRU status of memcg will be synchronized with global LRU's one.
      - # of locks are reduced.
      - account_move() is simplified very much.
    Cons.
      - may increase cost of LRU rotation.
        (no impact if memcg is not configured.)
    
    Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Cc: Pavel Emelyanov <xemul@openvz.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    08e552c6