Skip to content
  • Hin-Tak Leung's avatar
    hfsplus: correct usage of HFSPLUS_ATTR_MAX_STRLEN for non-English attributes · bf29e886
    Hin-Tak Leung authored
    
    
    HFSPLUS_ATTR_MAX_STRLEN (=127) is the limit of attribute names for the
    number of unicode character (UTF-16BE) storable in the HFS+ file system.
    Almost all the current usage of it is wrong, in relation to NLS to
    on-disk conversion.
    
    Except for one use calling hfsplus_asc2uni (which should stay the same)
    and its uses in calling hfsplus_uni2asc (which was corrected in the
    earlier patch in this series concerning usage of hfsplus_uni2asc), all
    the other uses are of the forms:
    
    - char buffer[size]
    
    - bound check: "if (namespace_adjusted_input_length > size) return failure;"
    
    Conversion between on-disk unicode representation and NLS char strings
    (in whichever direction) always needs to accommodate the worst-case NLS
    conversion, so all char buffers of that size need to have a
    NLS_MAX_CHARSET_SIZE x .
    
    The bound checks are all wrong, since they compare nls_length derived
    from strlen() to a unicode length limit.
    
    It turns out that all the bound-checks do is to protect
    hfsplus_asc2uni(), which can fail if the input is too large.
    
    There is only one usage of it as far as attributes are concerned, in
    hfsplus_attr_build_key().  It is in turn used by hfsplus_find_attr(),
    hfsplus_create_attr(), hfsplus_delete_attr().  Thus making sure that
    errors from hfsplus_asc2uni() is caught in hfsplus_attr_build_key() and
    propagated is sufficient to replace all the bound checks.
    
    Unpropagated errors from hfsplus_asc2uni() in the file catalog code was
    addressed recently in an independent patch "hfsplus: fix longname
    handling" by Sougata Santra.
    
    Before this patch, trying to set a 55 CJK character (in a UTF-8 locale,
    > 127/3=42) attribute plus user prefix fails with:
    
        $ setfattr -n user.`cat testing-string` -v `cat testing-string` \
            testing-string
        setfattr: testing-string: Operation not supported
    
    and retrieving a stored long attributes is particular ugly(!):
    
        find /mnt/* -type f -exec getfattr -d {} \;
        getfattr: /mnt/testing-string: Input/output error
    
    with console log:
        [268008.389781] hfsplus: unicode conversion failed
    
    After the patch, both of the above works.
    
    FYI, the test attribute string is prepared with:
    
    echo -e -n \
    "\xe9\x80\x99\xe6\x98\xaf\xe4\xb8\x80\xe5\x80\x8b\xe9\x9d\x9e\xe5" \
    "\xb8\xb8\xe6\xbc\xab\xe9\x95\xb7\xe8\x80\x8c\xe6\xa5\xb5\xe5\x85" \
    "\xb6\xe4\xb9\x8f\xe5\x91\xb3\xe5\x92\x8c\xe7\x9b\xb8\xe7\x95\xb6" \
    "\xe7\x84\xa1\xe8\xb6\xa3\xe3\x80\x81\xe4\xbb\xa5\xe5\x8f\x8a\xe7" \
    "\x84\xa1\xe7\x94\xa8\xe7\x9a\x84\xe3\x80\x81\xe5\x86\x8d\xe5\x8a" \
    "\xa0\xe4\xb8\x8a\xe6\xaf\xab\xe7\x84\xa1\xe6\x84\x8f\xe7\xbe\xa9" \
    "\xe7\x9a\x84\xe6\x93\xb4\xe5\xb1\x95\xe5\xb1\xac\xe6\x80\xa7\xef" \
    "\xbc\x8c\xe8\x80\x8c\xe5\x85\xb6\xe5\x94\xaf\xe4\xb8\x80\xe5\x89" \
    "\xb5\xe5\xbb\xba\xe7\x9b\xae\xe7\x9a\x84\xe5\x83\x85\xe6\x98\xaf" \
    "\xe7\x82\xba\xe4\xba\x86\xe6\xb8\xac\xe8\xa9\xa6\xe4\xbd\x9c\xe7" \
    "\x94\xa8\xe3\x80\x82" | tr -d ' '
    
    (= "pointlessly long attribute for testing", elaborate Chinese in
    UTF-8 enoding).
    
    However, it is not possible to set double the size (110 + 5 is still
    under 127) in a UTF-8 locale:
    
        $setfattr -n user.`cat testing-string testing-string` -v \
            `cat testing-string testing-string` testing-string
        setfattr: testing-string: Numerical result out of range
    
    110 CJK char in UTF-8 is 330 bytes - the generic get/set attribute
    system call code in linux/fs/xattr.c imposes a 255 byte limit.  One can
    use a combination of iconv to encode content, changing terminal locale
    for viewing, and an nls=cp932/cp936/cp949/cp950 mount option to fully
    use 127-unicode attribute in a double-byte locale.
    
    Also, as an additional information, it is possible to (mis-)use unicode
    half-width/full-width forms (U+FFxx) to write attributes which looks
    like english but not actually ascii.
    
    Thanks Anton Altaparmakov for reviewing the earlier ideas behind this
    change.
    
    [akpm@linux-foundation.org: fix build]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: default avatarHin-Tak Leung <htl10@users.sourceforge.net>
    Cc: Anton Altaparmakov <anton@tuxera.com>
    Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Sougata Santra <sougata@tuxera.com>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    bf29e886