1. 26 Jun, 2005 1 commit
    • Anton Altaparmakov's avatar
      NTFS: Fix a nasty deadlock that appeared in recent kernels. · ba6d2377
      Anton Altaparmakov authored
            The situation: VFS inode X on a mounted ntfs volume is dirty.  For
            same inode X, the ntfs_inode is dirty and thus corresponding on-disk
            inode, i.e. mft record, which is in a dirty PAGE_CACHE_PAGE belonging
            to the table of inodes, i.e. $MFT, inode 0.
            What happens:
            Process 1: sys_sync()/umount()/whatever...  calls
            __sync_single_inode() for $MFT -> do_writepages() -> write_page for
            the dirty page containing the on-disk inode X, the page is now locked
            -> ntfs_write_mst_block() which clears PageUptodate() on the page to
            prevent anyone else getting hold of it whilst it does the write out.
            This is necessary as the on-disk inode needs "fixups" applied before
            the write to disk which are removed again after the write and
            PageUptodate is then set again.  It then analyses the page looking
            for dirty on-disk inodes and when it finds one it calls
            ntfs_may_write_mft_record() to see if it is safe to write this
            on-disk inode.  This then calls ilookup5() to check if the
            corresponding VFS inode is in icache().  This in turn calls ifind()
            which waits on the inode lock via wait_on_inode whilst holding the
            global inode_lock.
            Process 2: pdflush results in a call to __sync_single_inode for the
            same VFS inode X on the ntfs volume.  This locks the inode (I_LOCK)
            then calls write-inode -> ntfs_write_inode -> map_mft_record() ->
            read_cache_page() for the page (in page cache of table of inodes
            $MFT, inode 0) containing the on-disk inode.  This page has
            PageUptodate() clear because of Process 1 (see above) so
            read_cache_page() blocks when it tries to take the page lock for the
            page so it can call ntfs_read_page().
            Thus Process 1 is holding the page lock on the page containing the
            on-disk inode X and it is waiting on the inode X to be unlocked in
            ifind() so it can write the page out and then unlock the page.
            And Process 2 is holding the inode lock on inode X and is waiting for
            the page to be unlocked so it can call ntfs_readpage() or discover
            that Process 1 set PageUptodate() again and use the page.
            Thus we have a deadlock due to ifind() waiting on the inode lock.
            The solution: The fix is to use the newly introduced
            ilookup5_nowait() which does not wait on the inode's lock and hence
            avoids the deadlock.  This is safe as we do not care about the VFS
            inode and only use the fact that it is in the VFS inode cache and the
            fact that the vfs and ntfs inodes are one struct in memory to find
            the ntfs inode in memory if present.  Also, the ntfs inode has its
            own locking so it does not matter if the vfs inode is locked.
      Signed-off-by: default avatarAnton Altaparmakov <aia21@cantab.net>
  2. 25 Jun, 2005 9 commits
  3. 24 Jun, 2005 30 commits