Skip to content
  • Tahsin Erdogan's avatar
    ext4: xattr inode deduplication · dec214d0
    Tahsin Erdogan authored
    
    
    Ext4 now supports xattr values that are up to 64k in size (vfs limit).
    Large xattr values are stored in external inodes each one holding a
    single value. Once written the data blocks of these inodes are immutable.
    
    The real world use cases are expected to have a lot of value duplication
    such as inherited acls etc. To reduce data duplication on disk, this patch
    implements a deduplicator that allows sharing of xattr inodes.
    
    The deduplication is based on an in-memory hash lookup that is a best
    effort sharing scheme. When a xattr inode is read from disk (i.e.
    getxattr() call), its crc32c hash is added to a hash table. Before
    creating a new xattr inode for a value being set, the hash table is
    checked to see if an existing inode holds an identical value. If such an
    inode is found, the ref count on that inode is incremented. On value
    removal the ref count is decremented and if it reaches zero the inode is
    deleted.
    
    The quota charging for such inodes is manually managed. Every reference
    holder is charged the full size as if there was no sharing happening.
    This is consistent with how xattr blocks are also charged.
    
    [ Fixed up journal credits calculation to handle inline data and the
      rare case where an shared xattr block can get freed when two thread
      race on breaking the xattr block sharing. --tytso ]
    
    Signed-off-by: default avatarTahsin Erdogan <tahsin@google.com>
    Signed-off-by: default avatarTheodore Ts'o <tytso@mit.edu>
    dec214d0