UTF-8 filenames in ZIP archives
Posted: 2014-02-10, 15:18 UTC
by ccaid
TC automatically switches to the UTF-8 code page for filenames when it creates ZIP archives, but this automatic choice has shortcomings.
1) Many Unicode characters cannot be stored without the UTF-8 code page, e.g. the ellipsis U+2026. But TC still uses the OEM code page for filenames containing such characters, which leads to distorted filenames.
2) Files with Russian names zipped under the Russian locale cannot be unzipped correctly under other locales, e.g. under the English (US) locale.
3) Files with Russian names cannot be zipped correctly under non-Russian locales, e.g. under the English (US) locale.
These problems could be prevented by an option such as "Always use UTF-8 for zipping" (or something similar).
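
The three problems listed above can be reproduced outside of TC with a few lines of Python. This is only an illustration, not TC's code: it assumes cp866 as the Russian OEM code page and cp437 as the English (US) one, and the filenames are made up.

```python
# Illustration only: cp866 stands in for the Russian OEM code page and
# cp437 for the English (US) OEM code page. Both are assumptions.
RUSSIAN_OEM = "cp866"
US_OEM = "cp437"


def fits(name: str, codepage: str) -> bool:
    """Return True if every character of name can be encoded in codepage."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# Problem 1: the ellipsis U+2026 exists in neither OEM code page.
ellipsis_name = "notes\u2026.txt"
ellipsis_fits_russian = fits(ellipsis_name, RUSSIAN_OEM)
ellipsis_fits_us = fits(ellipsis_name, US_OEM)

# Problem 2: bytes written under the Russian code page are misread
# when the unpacker assumes the US code page.
russian_name = "привет.txt"
stored = russian_name.encode(RUSSIAN_OEM)
misread = stored.decode(US_OEM)

# Problem 3: Cyrillic letters cannot be encoded under the US code page at all.
russian_fits_us = fits(russian_name, US_OEM)

# UTF-8 can represent all of these names and round-trips them unchanged.
utf8_roundtrip = ellipsis_name.encode("utf-8").decode("utf-8")

print(ellipsis_fits_russian, ellipsis_fits_us, russian_fits_us)
print(repr(misread))
```

With UTF-8 the stored bytes do not depend on the locale of the packing or unpacking machine, which is the point of the proposed option.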
Posted: 2014-02-11, 08:57 UTC
by ghisler(Author)
Thanks for your suggestion. The UTF-8 names could be stored in extra fields, which both TC for Windows and WinZip can handle. I don't know about other packers, though. TC for Windows already has this option.
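
For readers unfamiliar with extra fields, here is a minimal sketch of how such a field can be built and read back. It assumes the Info-ZIP Unicode Path extra field (header ID 0x7075) from the ZIP application note; the post does not say which extra field TC actually writes, and the function names are hypothetical.

```python
import struct
import zlib

# Info-ZIP Unicode Path extra field header ID (assumed here; the thread
# does not state which extra field TC uses).
UNICODE_PATH_ID = 0x7075


def build_unicode_path_field(legacy_name: bytes, unicode_name: str) -> bytes:
    """Build the extra field: ID, size, version, CRC-32 of legacy name, UTF-8 name."""
    utf8_name = unicode_name.encode("utf-8")
    crc = zlib.crc32(legacy_name) & 0xFFFFFFFF
    payload = struct.pack("<BI", 1, crc) + utf8_name
    return struct.pack("<HH", UNICODE_PATH_ID, len(payload)) + payload


def parse_unicode_path_field(field: bytes, legacy_name: bytes):
    """Return the UTF-8 name, or None if the field does not match legacy_name."""
    if len(field) < 9:
        return None
    header_id, size = struct.unpack_from("<HH", field, 0)
    if header_id != UNICODE_PATH_ID or size < 5 or len(field) < 4 + size:
        return None
    version, name_crc = struct.unpack_from("<BI", field, 4)
    # The CRC ties the Unicode name to the legacy name in the main header.
    if version != 1 or name_crc != (zlib.crc32(legacy_name) & 0xFFFFFFFF):
        return None
    return field[9:4 + size].decode("utf-8")


# The main header keeps the code-page name; the extra field carries UTF-8.
legacy = "привет.txt".encode("cp866")
field = build_unicode_path_field(legacy, "привет.txt")
recovered = parse_unicode_path_field(field, legacy)
mismatch = parse_unicode_path_field(field, b"other.txt")
```

An unpacker that does not know the field simply skips it and falls back to the code-page name in the main header.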
Posted: 2014-03-12, 08:07 UTC
by ccaid
7-Zip does NOT handle the extra field.
I have downloaded TC for Windows to try its ZIP packer. The best result is given by "All as UTF-8 if at least one contains characters>127". ZIP archives made with this option are read successfully by TC for Windows, 7-Zip, and TC for Android under both Russian and English locales.
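
One possible reading of the "All as UTF-8 if at least one contains characters>127" option is sketched below. This is a guess based on the option's name, not TC's actual implementation; the function names and the cp866 fallback are assumptions.

```python
def pick_name_encoding(names, legacy_codepage="cp866"):
    """Pick one encoding for the whole archive (hypothetical helper).

    If any name contains a character above 127, every name is stored
    as UTF-8; otherwise the legacy code page is kept.
    """
    if any(ord(ch) > 127 for name in names for ch in name):
        return "utf-8"
    return legacy_codepage


def encode_names(names, legacy_codepage="cp866"):
    """Encode all names with the single encoding chosen for the archive."""
    encoding = pick_name_encoding(names, legacy_codepage)
    return encoding, [name.encode(encoding) for name in names]


# One non-ASCII name is enough to switch the whole archive to UTF-8.
mixed_encoding, mixed_bytes = encode_names(
    ["readme.txt", "привет.txt", "notes\u2026.txt"]
)

# Pure ASCII names stay in the legacy code page (the bytes are identical anyway).
ascii_encoding, ascii_bytes = encode_names(["readme.txt", "data.bin"])
```

In a real archive the packer would also have to mark the names as UTF-8 (the ZIP format reserves general purpose flag bit 11 for this) so that unpackers know how to decode them.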
Posted: 2014-03-12, 17:47 UTC
by ghisler(Author)
Unfortunately using UTF-8 as the main name doesn't work with Windows Explorer.

Posted: 2014-03-13, 07:47 UTC
by ccaid
Explorer cannot handle Unicode names at all, as far as I understand. If one wishes to store Unicode names in archives and use Explorer's zip folders, he or she is asking for the impossible.
And I have to repeat (just in case): some Unicode characters are transformed incorrectly now (both in TC for Android and in TC for Windows with the non-UTF options), e.g. the ellipsis (…) is transformed into a colon ( : ). Names with a colon are not permitted on FAT/NTFS, so most packers can't unpack such files, and Explorer can't even "see" such files in the ZIP.
Posted: 2014-03-13, 14:41 UTC
by ghisler(Author)
The advantage of Unicode names in extra fields is that the archive can still be handled by text-only unpackers. Of course the names will be handled correctly only when using the same encoding, but usually the PC and the phone of a specific user use the same encoding (e.g. Cyrillic for Russian users).
Posted: 2014-03-13, 15:25 UTC
by ccaid
Yes, I see. But… for text-only unpackers there is already a solution (I mean the current behaviour of TC for Android),
while the 7-Zip unpacker needs a TC option like "Add UTF-8 always".