Skip to content
  • jiangyiwen's avatar
    ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local · 35ddf78e
    jiangyiwen authored
    
    
    This patch fixes a deadlock, as follows:
    
      Node 1                Node 2                  Node 3
    1)volume a and b are    only mount vol a        only mount vol b
      mounted
    
    2)                      start to mount b        start to mount a
    
    3)                      check hb of Node 3      check hb of Node 2
                            in vol a, qs_holds++    in vol b, qs_holds++
    
    4) -------------------- all nodes' network down --------------------
    
    5)                      progress of mount b     the same situation as
                            failed, and then call   Node 2
                            ocfs2_dismount_volume.
                            but the process is hung,
                            since there is a work
                            in ocfs2_wq cannot beo
                            completed. This work is
                            about vol a, because
                            ocfs2_wq is global wq.
                            BTW, this work which is
                            scheduled in ocfs2_wq is
                            ocfs2_orphan_scan_work,
                            and the context in this work
                            needs to take inode lock
                            of orphan_dir, because
                            lockres owner are Node 1 and
                            all nodes' nework has been down
                            at the same time, so it can't
                            get the inode lock.
    
    6)                      Why can't this node be fenced
                            when network disconnected?
                            Because the process of
                            mount is hung what caused qs_holds
                            is not equal 0.
    
    Because all works in the ocfs2_wq are relative to the super block.
    
    The solution is to change the ocfs2_wq from global to local.  In other
    words, move it into struct ocfs2_super.
    
    Signed-off-by: default avatarYiwen Jiang <jiangyiwen@huawei.com>
    Reviewed-by: default avatarJoseph Qi <joseph.qi@huawei.com>
    Cc: Xue jiufei <xuejiufei@huawei.com>
    Cc: Mark Fasheh <mfasheh@suse.de>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    35ddf78e