Zip filenames encoding detection problems

Posted: 2023-04-18, 14:48 UTC
by MaxX
Just pack any files with the Cyrillic word "чек" in the name into a zip to see the problem.

Now, in detail.
1. Make some text files like these:
чеки TEST TEST.txt
чек на TEST TEST.txt
чек TEST TEST.txt

2. Pack them into a zip archive.
3. Open the zip and see:
чеки TEST TEST.txt
чек на TEST TEST.txt
祪 TEST TEST.txt


File example:

Code:

MIME-Version: 1.0
Content-Type: application/octet-stream; name="test.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.zip"

UEsDBBQAAgAIAJSNklYAAAAAAgAAAAAAAAARAAAA56WqIFRFU1QgVEVTVC50eHQDAFBLAwQUAAIA
CACUjZJWAAAAAAIAAAAAAAAAFAAAAOelqiCtoCBURVNUIFRFU1QudHh0AwBQSwMEFAACAAgAlI2S
VgAAAAACAAAAAAAAABIAAADnpaqoIFRFU1QgVEVTVC50eHQDAFBLAQIUABQAAgAIAJSNklYAAAAA
AgAAAAAAAAARAAAAAAAAAAAAIAAAAAAAAADnpaogVEVTVCBURVNULnR4dFBLAQIUABQAAgAIAJSN
klYAAAAAAgAAAAAAAAAUAAAAAAAAAAAAIAAAADEAAADnpaograAgVEVTVCBURVNULnR4dFBLAQIU
ABQAAgAIAJSNklYAAAAAAgAAAAAAAAASAAAAAAAAAAAAIAAAAGUAAADnpaqoIFRFU1QgVEVTVC50
eHRQSwUGAAAAAAMAAwDBAAAAlwAAAAAA
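
For anyone who wants to inspect the attachment without unpacking it, here is a minimal Python sketch (my addition, not part of the test case itself). It decodes the base64 above with the standard zipfile module and prints, for each entry, whether the UTF-8 flag (general purpose bit 11) is set and which bytes are stored as the name. The helper name raw_names is made up for illustration, and showing the names as codepage 866 is an assumption about how the archive was packed.

```python
import base64
import io
import zipfile

# The archive from the post above, copied verbatim (base64 of test.zip).
TEST_ZIP_B64 = (
    "UEsDBBQAAgAIAJSNklYAAAAAAgAAAAAAAAARAAAA56WqIFRFU1QgVEVTVC50eHQDAFBLAwQUAAIA"
    "CACUjZJWAAAAAAIAAAAAAAAAFAAAAOelqiCtoCBURVNUIFRFU1QudHh0AwBQSwMEFAACAAgAlI2S"
    "VgAAAAACAAAAAAAAABIAAADnpaqoIFRFU1QgVEVTVC50eHQDAFBLAQIUABQAAgAIAJSNklYAAAAA"
    "AgAAAAAAAAARAAAAAAAAAAAAIAAAAAAAAADnpaogVEVTVCBURVNULnR4dFBLAQIUABQAAgAIAJSN"
    "klYAAAAAAgAAAAAAAAAUAAAAAAAAAAAAIAAAADEAAADnpaograAgVEVTVCBURVNULnR4dFBLAQIU"
    "ABQAAgAIAJSNklYAAAAAAgAAAAAAAAASAAAAAAAAAAAAIAAAAGUAAADnpaqoIFRFU1QgVEVTVC50"
    "eHRQSwUGAAAAAAMAAwDBAAAAlwAAAAAA"
)

UTF8_FLAG = 0x800  # general purpose bit 11: "file name is stored as UTF-8"


def raw_names(zip_bytes):
    """Yield (utf8_flag_set, raw_name_bytes) for every archive entry.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            flagged = bool(info.flag_bits & UTF8_FLAG)
            # zipfile decodes unflagged names as cp437; encoding them back
            # with cp437 recovers the bytes actually stored in the archive.
            raw = info.filename.encode("utf-8" if flagged else "cp437")
            yield flagged, raw


for flagged, raw in raw_names(base64.b64decode(TEST_ZIP_B64)):
    # Assumption: the names were written in OEM codepage 866.
    print(flagged, raw.hex(" "), "|", raw.decode("cp866"))
```

If the flag column is False for every entry, the names were stored in an OEM codepage without any UTF-8 marker, so any program reading the archive has to guess the encoding.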

Re: Zip filenames encoding detection problems

Posted: 2023-04-18, 14:49 UTC
by MaxX
Windows Explorer, WinRAR and 7-Zip all show the names correctly, without Chinese characters.

Any solution or fix?

Re: Zip filenames encoding detection problems

Posted: 2023-04-18, 17:42 UTC
by beb
As far as I have tested, this affects the first two settings of the "Pack Unicode names:" dropdown at [Configuration - Options: Configuration - Packer/Zip packer].
As soon as I choose the third entry there, "All as UTF-8 if at least one contains non-English characters >127", the naming becomes normal.
I didn't check further, though.

NB testing environment:

test_windows1251.cmd:

Code:

chcp 1251
md test_windows1251
@echo off>"test_windows1251\чеки TEST TEST.txt"
@echo off>"test_windows1251\чек на TEST TEST.txt"
@echo off>"test_windows1251\чек TEST TEST.txt"

test_utf8_65001.cmd:

Code:

chcp 65001
md test_utf8_65001
@echo off>"test_utf8_65001\чеки TEST TEST.txt"
@echo off>"test_utf8_65001\чек на TEST TEST.txt"
@echo off>"test_utf8_65001\чек TEST TEST.txt"

test_oem866.cmd:

Code:

chcp 866
md test_oem866
@echo off>"test_oem866\чеки TEST TEST.txt"
@echo off>"test_oem866\чек на TEST TEST.txt"
@echo off>"test_oem866\чек TEST TEST.txt"

Edit:

Code:

[Packer]
ZipUnicode=0...6
Determines how to pack Unicode names to ZIP archives:
	0: Ask every time a Unicode name is encountered.
	1: Store Unicode names as UTF-8 (Pkzip 4.5 / Winzip 11.2 method).
	2: All as UTF-8 if at least one contains Unicode.
	3: All as UTF-8 if at least one contains non-English characters.
	4: Store Unicode name in extra field (Info-Zip method).
	5: Store all names containing non-English in extra field.
	6: Store Unicode characters as '?'.

I finally took the liberty of testing all the options from 1 to 6:
1, 2, 4-6: affected
3: not affected, gives normal naming.

Re: Zip filenames encoding detection problems

Posted: 2023-04-21, 12:22 UTC
by Vasilich
I use TC on Windows with two ANSI-incompatible languages, German and Russian, so I constantly run into problems when Unicode is missing.
For ZIP compression/decompression I use setting 3 from above (all as UTF-8 if any > 127), and I have never had problems with either German umlauts or Cyrillic letters, even when both kinds of characters are packed in the same archive.
The only problem arises if you receive a zip archive compressed without Unicode (i.e. using the sender's codepage) and your system default codepage differs from that codepage. In that case you can unpack such archives after (temporarily) setting your default codepage to the sender's. Or convince them to use the Unicode settings when creating archives.
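
To illustrate the codepage mismatch, here is a small Python sketch. It is my own illustration, not anything TC does internally: show_on is a made-up helper, and I am assuming cp850 as the German OEM codepage and cp866 as the Russian one.

```python
def show_on(name, packed_with, viewed_with):
    """Encode a name with the packer's OEM codepage and decode it
    with the viewer's codepage (hypothetical helper, illustration only)."""
    raw = name.encode(packed_with)
    return raw.decode(viewed_with, errors="replace")


# Same codepage on both sides: the name survives.
print(show_on("чек.txt", "cp866", "cp866"))

# Packed on a Russian system, viewed with the assumed German OEM codepage.
print(show_on("чек.txt", "cp866", "cp850"))

# Packed on a German system, viewed with the Russian OEM codepage.
print(show_on("Größe.txt", "cp850", "cp866"))
```

The bytes in the archive never change, only their interpretation does, which is why switching the codepage on the reading side is enough to recover the names.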

Re: Zip filenames encoding detection problems

Posted: 2023-04-26, 10:30 UTC
by ghisler(Author)
The Russian text чек stored as OEM/DOS (codepage 866) uses the codes E7 A5 AA.
This byte sequence also happens to be a valid UTF-8 sequence, so it is recognized as UTF-8. This auto-detection was added because some packers store file names in ZIP as UTF-8 but do not set the UTF-8 flag in the ZIP headers.

Try this: Put these three bytes in a text file and open it with Windows 10/11 notepad.exe. It will also display them as a Chinese character.
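
The collision can also be reproduced outside TC with a few lines of Python. This is only a sketch of a "valid UTF-8 wins" rule, not TC's actual detection code; guess_name is a hypothetical helper, and the fallback to codepage 866 is an assumption.

```python
def guess_name(raw, oem="cp866"):
    """Return (encoding_used, decoded_name) for a raw ZIP file name.

    Hypothetical heuristic: accept the bytes as UTF-8 if they form a
    valid UTF-8 sequence, otherwise fall back to the OEM codepage.
    """
    try:
        return "utf-8", raw.decode("utf-8")
    except UnicodeDecodeError:
        return oem, raw.decode(oem)


for name in ("чек TEST TEST.txt", "чек на TEST TEST.txt", "чеки TEST TEST.txt"):
    raw = name.encode("cp866")  # the bytes an OEM packer would store
    encoding, shown = guess_name(raw)
    print(raw[:7].hex(" "), "->", encoding, repr(shown))
```

Only the name that starts with "чек " passes as UTF-8. The other two contain a further byte (A8 for "и", AD for "н") that is not valid at that position in UTF-8, so they fall back to the OEM codepage, which is why just one of the three files shows up wrong.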

To disable this auto-detection, open wincmd.ini, look for the [packer] section and add the line:
PreferUtf8ForZip=0

Alternatively, you can manually switch the encoding to OEM/DOS (codepage 866) by right-clicking on [auto] in the name header above the file list.