Zip filenames encoding detection problems
Posted: 2023-04-18, 14:48 UTC
by MaxX
Just pack any files with the Cyrillic word "чек" ("receipt") in their names into a zip to see the problem.
Now, in detail:
1. Make some text files like these:
чеки TEST TEST.txt
чек на TEST TEST.txt
чек TEST TEST.txt
2. Pack them into a zip archive.
3. Open the zip and you will see:
чеки TEST TEST.txt
чек на TEST TEST.txt
祪 TEST TEST.txt
Example file (test.zip, base64-encoded):
Code:
MIME-Version: 1.0
Content-Type: application/octet-stream; name="test.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.zip"
UEsDBBQAAgAIAJSNklYAAAAAAgAAAAAAAAARAAAA56WqIFRFU1QgVEVTVC50eHQDAFBLAwQUAAIA
CACUjZJWAAAAAAIAAAAAAAAAFAAAAOelqiCtoCBURVNUIFRFU1QudHh0AwBQSwMEFAACAAgAlI2S
VgAAAAACAAAAAAAAABIAAADnpaqoIFRFU1QgVEVTVC50eHQDAFBLAQIUABQAAgAIAJSNklYAAAAA
AgAAAAAAAAARAAAAAAAAAAAAIAAAAAAAAADnpaogVEVTVCBURVNULnR4dFBLAQIUABQAAgAIAJSN
klYAAAAAAgAAAAAAAAAUAAAAAAAAAAAAIAAAADEAAADnpaograAgVEVTVCBURVNULnR4dFBLAQIU
ABQAAgAIAJSNklYAAAAAAgAAAAAAAAASAAAAAAAAAAAAIAAAAGUAAADnpaqoIFRFU1QgVEVTVC50
eHRQSwUGAAAAAAMAAwDBAAAAlwAAAAAA
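To see what the attachment actually stores, here is a minimal Python 3 sketch using only the standard library (`raw_entries` is a hypothetical helper name). It relies on the fact that Python's `zipfile` decodes names without the UTF-8 flag as cp437, which maps all 256 byte values, so re-encoding recovers the stored bytes. It shows that every name starts with the bytes E7 A5 AA, that general purpose bit 11 (the UTF-8 flag) is not set, and that the entries are marked as created on host 0 (MS-DOS/FAT). The ZIP specification nominally treats such names as IBM code page 437, but Windows packers typically write the system OEM code page, which is 866 on a Russian system.

```python
import base64
import io
import zipfile

# test.zip from the post above: the base64 lines joined, otherwise unchanged.
ATTACHMENT = (
    "UEsDBBQAAgAIAJSNklYAAAAAAgAAAAAAAAARAAAA56WqIFRFU1QgVEVTVC50eHQDAFBLAwQUAAIA"
    "CACUjZJWAAAAAAIAAAAAAAAAFAAAAOelqiCtoCBURVNUIFRFU1QudHh0AwBQSwMEFAACAAgAlI2S"
    "VgAAAAACAAAAAAAAABIAAADnpaqoIFRFU1QgVEVTVC50eHQDAFBLAQIUABQAAgAIAJSNklYAAAAA"
    "AgAAAAAAAAARAAAAAAAAAAAAIAAAAAAAAADnpaogVEVTVCBURVNULnR4dFBLAQIUABQAAgAIAJSN"
    "klYAAAAAAgAAAAAAAAAUAAAAAAAAAAAAIAAAADEAAADnpaograAgVEVTVCBURVNULnR4dFBLAQIU"
    "ABQAAgAIAJSNklYAAAAAAgAAAAAAAAASAAAAAAAAAAAAIAAAAGUAAADnpaqoIFRFU1QgVEVTVC50"
    "eHRQSwUGAAAAAAMAAwDBAAAAlwAAAAAA"
)

UTF8_FLAG = 0x800  # general purpose bit 11: the file name is UTF-8


def raw_entries(data: bytes) -> list:
    """Return (raw name bytes, flag bits, host system) for each entry."""
    entries = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                raw = info.filename.encode("utf-8")
            else:
                # Unflagged names were decoded as cp437; encoding back to
                # cp437 gives exactly the bytes stored in the archive.
                raw = info.filename.encode("cp437")
            entries.append((raw, info.flag_bits, info.create_system))
    return entries


data = base64.b64decode(ATTACHMENT)
for raw, flags, host in raw_entries(data):
    # Decode the stored bytes with the OEM Cyrillic code page (866).
    print(raw[:3].hex().upper(), "UTF-8 flag:", bool(flags & UTF8_FLAG),
          "host:", host, "->", raw.decode("cp866"))
```

Decoded with code page 866, the three entries read correctly as the original Cyrillic names; only the guess about their encoding goes wrong.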
Re: Zip filenames encoding detection problems
Posted: 2023-04-18, 14:49 UTC
by MaxX
Windows Explorer, WinRAR and 7-Zip all show the names correctly, without Chinese characters.
Any solution or fix?
Re: Zip filenames encoding detection problems
Posted: 2023-04-18, 17:42 UTC
by beb
As far as I have tested, this happens with the first two settings of the "Pack Unicode names:" dropdown at [Configuration - Options: Configuration - Packer/Zip packer].
As soon as I select the third entry there, "All as UTF-8 if at least one contains non-English characters >127", the names are displayed correctly.
I didn't check further, though.
NB: my test files were created with these scripts:
test_windows1251.cmd:
Code:
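rem Assumption: this file is saved in Windows-1251, matching the code page set below
chcp 1251
md test_windows1251
rem "@echo off" with output redirection writes nothing, so each line creates an empty file
@echo off>"test_windows1251\чеки TEST TEST.txt"
@echo off>"test_windows1251\чек на TEST TEST.txt"
@echo off>"test_windows1251\чек TEST TEST.txt"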
test_utf8_65001.cmd:
Code:
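rem Assumption: this file is saved as UTF-8 without a BOM, matching the code page set below
chcp 65001
md test_utf8_65001
rem "@echo off" with output redirection writes nothing, so each line creates an empty file
@echo off>"test_utf8_65001\чеки TEST TEST.txt"
@echo off>"test_utf8_65001\чек на TEST TEST.txt"
@echo off>"test_utf8_65001\чек TEST TEST.txt"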
test_oem866.cmd:
Code:
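rem Assumption: this file is saved in OEM code page 866 (DOS Cyrillic), matching the code page set below
chcp 866
md test_oem866
rem "@echo off" with output redirection writes nothing, so each line creates an empty file
@echo off>"test_oem866\чеки TEST TEST.txt"
@echo off>"test_oem866\чек на TEST TEST.txt"
@echo off>"test_oem866\чек TEST TEST.txt"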
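The .cmd scripts above depend on the encoding the script file is saved in and on the active console code page. As a hedged alternative, the same three empty files can be created from Python 3, which on Windows passes file names to the OS as Unicode (UTF-16) instead of through a console code page. `make_test_files` is a hypothetical helper, and the demo writes into a temporary folder rather than one of the folders above.

```python
import tempfile
from pathlib import Path

NAMES = ["чеки TEST TEST.txt", "чек на TEST TEST.txt", "чек TEST TEST.txt"]


def make_test_files(folder: str) -> list:
    """Create the three empty test files in 'folder' and return their paths."""
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in NAMES:
        path = target / name
        path.write_bytes(b"")  # empty file, like the "@echo off" redirection lines
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Demo in a throwaway folder; pass a real path to keep the files.
    with tempfile.TemporaryDirectory() as tmp:
        for p in make_test_files(str(Path(tmp) / "test_python")):
            print(p.name, p.stat().st_size)
```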
Edit:
Code:
[Packer]
ZipUnicode=0...6
Determines how to pack Unicode names to ZIP archives:
0: Ask every time a Unicode name is encountered.
1: Store Unicode names as UTF-8 (Pkzip 4.5 / Winzip 11.2 method).
2: All as UTF-8 if at least one contains Unicode.
3: All as UTF-8 if at least one contains non-English characters.
4: Store Unicode name in extra field (Info-Zip method).
5: Store all names containing non-English in extra field.
6: Store Unicode characters as '?'.
Finally, I took the liberty of testing all the options from 1 to 6:
1, 2, 4-6: affected
3: not affected, names are displayed correctly.
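A plausible reading of these results (an assumption, not confirmed in the thread): option 3 writes every name containing a character > 127 as UTF-8 and marks it with the UTF-8 flag (general purpose bit 11), so the reader never has to guess, while with options 1 and 2 Cyrillic names that fit the OEM code page presumably stay in that code page. For comparison, Python's standard `zipfile` module always uses the flagged UTF-8 scheme for non-ASCII names, as this sketch shows (`pack_empty_files` and `stored_names` are hypothetical helper names):

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the file name is UTF-8
NAMES = ["чеки TEST TEST.txt", "чек на TEST TEST.txt", "чек TEST TEST.txt"]


def pack_empty_files(names: list) -> bytes:
    """Build an in-memory ZIP containing one empty file per name."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            # zipfile stores ASCII names as-is and any other name as UTF-8
            # with bit 11 set; it never writes an OEM code page.
            zf.writestr(name, b"")
    return buf.getvalue()


def stored_names(data: bytes) -> list:
    """Return (name, UTF-8 flag set?) for every entry in the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(i.filename, bool(i.flag_bits & UTF8_FLAG)) for i in zf.infolist()]


for name, flagged in stored_names(pack_empty_files(NAMES)):
    print(name, "UTF-8 flag:", flagged)
```

Because the flag travels with each name, a reader that honours bit 11 needs no detection heuristic for such archives.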
Re: Zip filenames encoding detection problems
Posted: 2023-04-21, 12:22 UTC
by Vasilich
I use TC on Windows with two languages whose ANSI code pages are incompatible, German and Russian, so I always run into problems when Unicode is missing.
For ZIP compression/decompression I use setting 3 from above (all as UTF-8 if any character is > 127) and have never had problems with German umlauts or Cyrillic letters, even when both kinds of characters are packed in the same archive.
The only problem arises if you receive a ZIP archive created without Unicode (i.e. using the creator's code page) and your system's default code page differs from it. In that case you can unpack such archives after temporarily setting your default code page to the creator's code page, or convince the creator to use Unicode settings when creating archives.
Re: Zip filenames encoding detection problems
Posted: 2023-04-26, 10:30 UTC
by ghisler(Author)
The Russian text чек stored as OEM/DOS (codepage 866) uses the bytes E7 A5 AA.
This byte sequence is also valid UTF-8, so the name is recognized as UTF-8. This auto-detection was added because some packers store file names in ZIP as UTF-8 but do not set the UTF-8 flag in the ZIP headers.
Try this: put these 3 bytes in a text file and open it with notepad.exe on Windows 10/11. It will also display them as a Chinese character.
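A short Python 3 sketch of this byte-level reasoning (standard library only; `as_utf8` is a hypothetical helper name). It encodes the three test names as code page 866, the way a DOS/OEM packer stores them, and then tries to read them as UTF-8. It also suggests why only the third file from the first post was affected: in the other two names a later Cyrillic byte in the range 80-BF appears on its own, which UTF-8 never allows, so the whole name fails validation.

```python
from typing import Optional

# The three test names from the first post.
NAMES = ["чек TEST TEST.txt", "чек на TEST TEST.txt", "чеки TEST TEST.txt"]


def as_utf8(raw: bytes) -> Optional[str]:
    """Decode raw name bytes as UTF-8, or return None if they are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


for name in NAMES:
    raw = name.encode("cp866")  # how a DOS/OEM packer stores the name
    print(raw[:6].hex(" ").upper(), "->", as_utf8(raw))

# "чек TEST...":    E7 (lead byte 1110xxxx) + A5 AA (continuation bytes 10xxxxxx)
#                   form one 3-byte sequence, U+796A, read as "祪 TEST TEST.txt".
# "чек на TEST...": AD ("н") after the space is a lone continuation byte: invalid.
# "чеки TEST...":   A8 ("и") after E7 A5 AA is a lone continuation byte: invalid.
```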
To disable this auto-detection, open wincmd.ini, look for the section [packer] and add the line
PreferUtf8ForZip=0
Alternatively, you can manually switch the encoding to OEM/DOS (codepage 866) by right-clicking on [auto] in the name header above the file list.
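The decision order described above can be modelled roughly as follows. This is a hypothetical Python sketch, not Total Commander's code: `decode_zip_name` and its `prefer_utf8` and `oem` parameters are illustrative stand-ins for the PreferUtf8ForZip setting and the OEM code page (866 here). Passing `prefer_utf8=False` corresponds to PreferUtf8ForZip=0, or to forcing the OEM/DOS encoding by hand.

```python
UTF8_FLAG = 0x800  # ZIP general purpose bit 11 (language encoding flag)


def decode_zip_name(raw: bytes, flag_bits: int, prefer_utf8: bool = True,
                    oem: str = "cp866") -> str:
    """Hypothetical model of the name-decoding order, not TC's real logic."""
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")      # the packer declared UTF-8
    if prefer_utf8:
        try:
            return raw.decode("utf-8")  # heuristic: valid UTF-8 is assumed to be UTF-8
        except UnicodeDecodeError:
            pass                        # not valid UTF-8, fall back below
    return raw.decode(oem)              # legacy OEM/DOS name


raw = "чек TEST TEST.txt".encode("cp866")
print(decode_zip_name(raw, 0x0002))                     # heuristic on: wrong CJK character
print(decode_zip_name(raw, 0x0002, prefer_utf8=False))  # like PreferUtf8ForZip=0
```

The sketch makes the trade-off visible: the heuristic rescues archives whose packers write UTF-8 without the flag, at the cost of misreading short OEM names that happen to form valid UTF-8.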