    fuse: hotfix truncate_pagecache() issue · 06a7c3c2
    Maxim Patlasov authored
    
    
    The way fuse calls truncate_pagecache() from fuse_change_attributes() is
    completely wrong: without i_mutex held, we can never be sure that
    'oldsize' and 'attr->size' are still valid by the time
    truncate_pagecache(inode, oldsize, attr->size) executes. In fact, as soon
    as we release fc->lock in the middle of fuse_change_attributes(), we
    completely lose control over what may happen to the given inode until we
    reach truncate_pagecache(). The potentially dangerous actions include
    mmap-ed reads and writes, ftruncate(2), and write(2) extending the file
    size.
    
    The typical outcome of calling truncate_pagecache() with outdated
    arguments is data corruption from the user's point of view. This is (in
    some sense) acceptable when the issue is triggered by a change of the file
    on the server (i.e. externally wrt fuse operation), but it is absolutely
    intolerable when a single fuse client modifies a file without any external
    intervention. A real-life case I discovered with fsx-linux looked like
    this:
    
    1. A shrinking ftruncate(2) reaches fuse_do_setattr(). The latter sends
    FUSE_SETATTR to the server synchronously, but before it takes fc->lock ...
    2. fuse_dentry_revalidate() is called asynchronously. It sends FUSE_LOOKUP
    to the server synchronously, then calls fuse_change_attributes(). The
    latter updates i_size and releases fc->lock, but before it compares
    oldsize with attr->size ...
    3. fuse_do_setattr() from the first step proceeds by acquiring fc->lock and
    updating attributes and i_size, but now oldsize is equal to
    outarg.attr.size because i_size has just been updated (step 2). Hence,
    fuse_do_setattr() returns without calling truncate_pagecache().
    4. As soon as ftruncate(2) completes, the user extends the file by
    write(2), making a hole in the middle of the file, then reads data from the
    hole either by read(2) or by an mmap-ed read. The user expects to get zeros
    from the hole, but gets stale data because truncate_pagecache() has not
    been executed yet.
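
    Steps 1-3 above can be replayed deterministically in a small userspace
    model. This is only a sketch: struct toy_inode, toy_truncate_pagecache()
    and replay_missed_truncate() are hypothetical names invented for
    illustration, the "page cache" is reduced to a single byte count, and the
    interleaving is hard-coded instead of being produced by real threads or
    fc->lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not the kernel's struct inode. */
struct toy_inode {
	uint64_t i_size;      /* file size as the client currently sees it */
	uint64_t cached_upto; /* cache holds (possibly stale) data for [0, cached_upto) */
};

/* Stand-in for truncate_pagecache(): drop cached data beyond new_size. */
static void toy_truncate_pagecache(struct toy_inode *ino, uint64_t new_size)
{
	if (ino->cached_upto > new_size)
		ino->cached_upto = new_size;
}

/*
 * Replays steps 1-3 for a shrinking truncate from old_size to new_size.
 * Returns true if stale cached data is still present beyond new_size at
 * the moment the setattr path returns to the caller.
 */
static bool replay_missed_truncate(uint64_t old_size, uint64_t new_size)
{
	struct toy_inode ino = { .i_size = old_size, .cached_upto = old_size };

	assert(new_size <= old_size); /* the scenario is a shrinking truncate */

	/* Step 1: the setattr path has told the server about new_size,
	 * but has not yet taken the lock to update the inode. */

	/* Step 2: the revalidate path samples oldsize and updates i_size
	 * under the lock, then is delayed before it compares the sizes. */
	uint64_t reval_oldsize = ino.i_size;
	ino.i_size = new_size;

	/* Step 3: the setattr path now samples i_size, which already equals
	 * new_size, so its own size check sees no change. */
	uint64_t setattr_oldsize = ino.i_size;
	ino.i_size = new_size;
	if (setattr_oldsize != new_size)
		toy_truncate_pagecache(&ino, new_size);

	/* The delayed revalidate path would eventually truncate using
	 * reval_oldsize, but it has not run yet at this point. */
	(void)reval_oldsize;

	return ino.cached_upto > new_size;
}
```

    In this model, replay_missed_truncate(8192, 4096) reports that cached data
    beyond the new size survives the truncate, which is the window in which
    step 4 reads stale data.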
    
    The scenario above illustrates one side of the problem: not truncating the
    page cache even though we should. The other side is truncating the page
    cache too late, after the state of the inode has changed significantly.
    Theoretically, the following is possible:
    
    1. As in the previous scenario, fuse_dentry_revalidate() discovers that
    i_size changed (due to our own fuse_do_setattr()) and is going to call
    truncate_pagecache() for some 'new_size' it believes is valid right now.
    But by the time that particular truncate_pagecache() is called ...
    2. fuse_do_setattr() returns (either having called truncate_pagecache() or
    not -- it doesn't matter).
    3. The file is extended either by write(2) or ftruncate(2) or fallocate(2).
    4. mmap-ed write makes a page in the extended region dirty.
    
    The result is the loss of the data the user wrote in the fourth step, once
    the delayed truncate_pagecache() finally runs with the outdated 'new_size'.
    
    The patch is a hotfix that resolves the issue in a simplistic way: skip the
    dangerous i_size update and truncate_pagecache() if an operation changing
    the file size is in progress. This simplistic approach looks correct for
    the cases without external changes. Handling external changes properly
    would require more sophisticated and intrusive techniques (e.g. an
    NFS-like one). I'd like to postpone that until the issue has been
    discussed thoroughly on the mailing list(s).
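
    The idea of the hotfix can be sketched in userspace as a "size is
    unstable" flag that the attribute-refresh path checks before touching
    i_size or the cache. This is a minimal illustration, not the patch
    itself: struct guarded_inode, the size_unstable field and all three
    helpers are hypothetical names, and the model is single-threaded, whereas
    real code would have to test and update the flag under proper locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model, not the kernel's struct fuse_inode. */
struct guarded_inode {
	uint64_t i_size;      /* file size as the client currently sees it */
	uint64_t cached_upto; /* cache holds data for [0, cached_upto) */
	bool size_unstable;   /* set while a size-changing operation runs */
};

/* Called by a size-changing operation (e.g. a truncate) before it starts. */
static void begin_size_change(struct guarded_inode *ino)
{
	assert(!ino->size_unstable); /* the toy model allows one such operation at a time */
	ino->size_unstable = true;
}

/* Called by the same operation once it has updated the inode itself. */
static void end_size_change(struct guarded_inode *ino)
{
	ino->size_unstable = false;
}

/*
 * Attribute refresh path: apply the size reported by the server unless a
 * size-changing operation is in flight. Returns true if the size was applied.
 */
static bool apply_server_size(struct guarded_inode *ino, uint64_t server_size)
{
	if (ino->size_unstable)
		return false; /* skip both the i_size update and the truncation */

	ino->i_size = server_size;
	if (ino->cached_upto > server_size)
		ino->cached_upto = server_size; /* drop cached data past the new end */
	return true;
}
```

    With the flag set, a concurrent refresh leaves i_size and the cache
    untouched, so the operation that owns the size change still sees the
    original oldsize and performs the truncation itself.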
    
    Changed in v2:
     - improved patch description to cover both sides of the issue.
    
    Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    Cc: stable@vger.kernel.org