Zip filenames encoding detection problems

MaxX
Power Member
Posts: 1024
Joined: 2012-03-23, 18:15 UTC
Location: UA


Post by MaxX

Just pack any files with the Cyrillic word "чек" in their names into a ZIP archive to see the problem.

Now, in detail:
1. Make some text files like these:
чеки TEST TEST.txt
чек на TEST TEST.txt
чек TEST TEST.txt

2. Pack them into a ZIP archive.
3. Open the archive and look at the names:
чеки TEST TEST.txt
чек на TEST TEST.txt
祪 TEST TEST.txt   ("чек" is displayed as a Chinese character)


Example file (test.zip):

Code:

MIME-Version: 1.0
Content-Type: application/octet-stream; name="test.zip"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="test.zip"

UEsDBBQAAgAIAJSNklYAAAAAAgAAAAAAAAARAAAA56WqIFRFU1QgVEVTVC50eHQDAFBLAwQUAAIA
CACUjZJWAAAAAAIAAAAAAAAAFAAAAOelqiCtoCBURVNUIFRFU1QudHh0AwBQSwMEFAACAAgAlI2S
VgAAAAACAAAAAAAAABIAAADnpaqoIFRFU1QgVEVTVC50eHQDAFBLAQIUABQAAgAIAJSNklYAAAAA
AgAAAAAAAAARAAAAAAAAAAAAIAAAAAAAAADnpaogVEVTVCBURVNULnR4dFBLAQIUABQAAgAIAJSN
klYAAAAAAgAAAAAAAAAUAAAAAAAAAAAAIAAAADEAAADnpaograAgVEVTVCBURVNULnR4dFBLAQIU
ABQAAgAIAJSNklYAAAAAAgAAAAAAAAASAAAAAAAAAAAAIAAAAGUAAADnpaqoIFRFU1QgVEVTVC50
eHRQSwUGAAAAAAMAAwDBAAAAlwAAAAAA
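What the attachment actually stores can be checked with Python's standard library. The sketch below lists the stored name bytes of each entry, whether the UTF-8 flag (general purpose bit 11, 0x800, in PKWARE's APPNOTE.TXT) is set, and the names decoded as code page 866. The helper raw_entry_names is a hypothetical name used only for illustration. For this archive none of the entries carries the flag, and all three names start with the bytes E7 A5 AA.

```python
import base64
import io
import zipfile

# Base64 payload of test.zip from the post above (MIME headers dropped).
TEST_ZIP_B64 = (
    "UEsDBBQAAgAIAJSNklYAAAAAAgAAAAAAAAARAAAA56WqIFRFU1QgVEVTVC50eHQDAFBLAwQUAAIA"
    "CACUjZJWAAAAAAIAAAAAAAAAFAAAAOelqiCtoCBURVNUIFRFU1QudHh0AwBQSwMEFAACAAgAlI2S"
    "VgAAAAACAAAAAAAAABIAAADnpaqoIFRFU1QgVEVTVC50eHQDAFBLAQIUABQAAgAIAJSNklYAAAAA"
    "AgAAAAAAAAARAAAAAAAAAAAAIAAAAAAAAADnpaogVEVTVCBURVNULnR4dFBLAQIUABQAAgAIAJSN"
    "klYAAAAAAgAAAAAAAAAUAAAAAAAAAAAAIAAAADEAAADnpaograAgVEVTVCBURVNULnR4dFBLAQIU"
    "ABQAAgAIAJSNklYAAAAAAgAAAAAAAAASAAAAAAAAAAAAIAAAAGUAAADnpaqoIFRFU1QgVEVTVC50"
    "eHRQSwUGAAAAAAMAAwDBAAAAlwAAAAAA"
)

UTF8_FLAG = 0x800  # ZIP general purpose bit 11: file name is UTF-8


def raw_entry_names(zip_bytes):
    """Return (raw name bytes, UTF-8 flag set?) for each entry, in archive order."""
    entries = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.flag_bits & UTF8_FLAG:
                entries.append((info.filename.encode("utf-8"), True))
            else:
                # zipfile decoded this name as cp437, which maps all 256 byte
                # values one-to-one, so encoding back restores the stored bytes.
                entries.append((info.filename.encode("cp437"), False))
    return entries


if __name__ == "__main__":
    for raw, flagged in raw_entry_names(base64.b64decode(TEST_ZIP_B64)):
        print(raw.hex(" ").upper(), "| UTF-8 flag:", flagged, "| cp866:", raw.decode("cp866"))
```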
Ukrainian Total Commander Translator. Feedback and discuss.
MaxX
Power Member
Posts: 1024
Joined: 2012-03-23, 18:15 UTC
Location: UA

Re: Zip filenames encoding detection problems

Post by MaxX

Windows Explorer, WinRAR and 7-Zip all show the names correctly, without Chinese characters.

Any solution or fix?
beb
Senior Member
Posts: 430
Joined: 2009-09-20, 08:03 UTC
Location: Odesa, Ukraine

Re: Zip filenames encoding detection problems

Post by beb

As far as I have tested, this happens with the first two settings of the "Pack Unicode names:" dropdown in [Configuration - Options: Configuration - Packer/Zip packer].
As soon as I choose the third entry there, "All as UTF-8 if at least one contains non-English characters >127", the names are displayed correctly.
I didn't check further, though.

NB: test environment:

test_windows1251.cmd:

Code:

chcp 1251
md test_windows1251
@echo off>"test_windows1251\чеки TEST TEST.txt"
@echo off>"test_windows1251\чек на TEST TEST.txt"
@echo off>"test_windows1251\чек TEST TEST.txt"
test_utf8_65001.cmd:

Code:

chcp 65001
md test_utf8_65001
@echo off>"test_utf8_65001\чеки TEST TEST.txt"
@echo off>"test_utf8_65001\чек на TEST TEST.txt"
@echo off>"test_utf8_65001\чек TEST TEST.txt"
test_oem866.cmd:

Code:

chcp 866
md test_oem866
@echo off>"test_oem866\чеки TEST TEST.txt"
@echo off>"test_oem866\чек на TEST TEST.txt"
@echo off>"test_oem866\чек TEST TEST.txt"
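Assuming Python 3 is available, the same three empty test files can also be created without depending on the console code page: Python reads its source as UTF-8 by default and passes file names to Windows as Unicode. make_test_files is a hypothetical helper for illustration, not part of any tool mentioned in this thread.

```python
from pathlib import Path

# Same three names as in the .cmd scripts above. Python 3 reads source files
# as UTF-8 by default, so no "chcp" or script-file encoding is involved.
TEST_NAMES = ["чеки TEST TEST.txt", "чек на TEST TEST.txt", "чек TEST TEST.txt"]


def make_test_files(base_dir):
    """Create the empty test files in base_dir and return their paths."""
    folder = Path(base_dir)
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in TEST_NAMES:
        path = folder / name
        path.touch()  # empty file, like "@echo off>file" in the scripts
        paths.append(path)
    return paths
```

For example, make_test_files("test_python") creates a test_python folder in the current directory with the three files, ready to be packed.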
Edit:

Code:

[Packer]
ZipUnicode=0...6
Determines how to pack Unicode names to ZIP archives:
	0: Ask every time a Unicode name is encountered.
	1: Store Unicode names as UTF-8 (Pkzip 4.5 / Winzip 11.2 method).
	2: All as UTF-8 if at least one contains Unicode.
	3: All as UTF-8 if at least one contains non-English characters.
	4: Store Unicode name in extra field (Info-Zip method).
	5: Store all names containing non-English in extra field.
	6: Store Unicode characters as '?'.
Finally, I took the liberty of testing all the options from 1 to 6:
1, 2, 4-6: affected
3: not affected, names are displayed correctly.
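A minimal sketch of the rule that option 3 describes, assuming UTF-8 names are marked with general purpose bit 11 (0x800) as in PKWARE's APPNOTE.TXT. This is not Total Commander's code, and the function names are hypothetical. It suggests why option 3 avoids the problem here: the Cyrillic names are stored as UTF-8 and flagged as such, so the reader has nothing to guess.

```python
UTF8_FLAG = 0x800  # ZIP general purpose bit 11 ("language encoding flag")


def encode_names_option3(names, oem="cp866"):
    """Hypothetical sketch of "All as UTF-8 if at least one contains
    non-English characters >127".

    Returns (raw_name_bytes, flag_bits) pairs for the whole archive.
    """
    if any(ord(ch) > 127 for name in names for ch in name):
        # One non-ASCII name is enough: store every name as flagged UTF-8.
        return [(name.encode("utf-8"), UTF8_FLAG) for name in names]
    # Pure ASCII archive: OEM and UTF-8 bytes are identical, no flag needed.
    return [(name.encode(oem), 0) for name in names]


def decode_name(raw, flag_bits, oem="cp866"):
    """Reader side: a set bit 11 leaves nothing to guess."""
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")
    return raw.decode(oem)


if __name__ == "__main__":
    names = ["чеки TEST TEST.txt", "чек на TEST TEST.txt", "чек TEST TEST.txt"]
    for raw, flags in encode_names_option3(names):
        print(hex(flags), decode_name(raw, flags))
```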
#278521 User License
Total Commander [always the latest version, including betas] x86/x64 on Win10 x64/Android 10
Vasilich
Junior Member
Posts: 43
Joined: 2009-08-05, 08:26 UTC
Location: Mayence, Germany

Re: Zip filenames encoding detection problems

Post by Vasilich

I use TC on Windows with two languages whose ANSI code pages are incompatible, German and Russian, so I always run into problems when Unicode is missing.
For ZIP compression/decompression I use setting 3 from above (all as UTF-8 if any character is > 127) and have never had problems with either German umlauts or Cyrillic letters, even when both kinds of characters are packed in the same archive.
The only problem arises when you receive a ZIP archive that was created without Unicode (i.e. in the creator's own code page) and your system's default code page differs from it. In that case you can unpack such archives after (temporarily) setting your default code page to the creator's, or convince the creator to use Unicode settings when creating archives.
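To illustrate the code page mismatch, the sketch below decodes the same stored bytes with several code pages. It assumes the creator's packer wrote the name in Windows-1251, the ANSI code page of Russian and Ukrainian Windows, without the UTF-8 flag; which code page a given packer actually uses (ANSI or OEM) varies. readings is a hypothetical helper.

```python
def readings(raw, codepages=("cp1251", "cp866", "cp1252")):
    """Decode the same stored name bytes with several code pages.

    errors="replace" keeps going if a byte is undefined in one of them.
    """
    return {cp: raw.decode(cp, errors="replace") for cp in codepages}


# Assumption: the creator's packer stored the name in Windows-1251 and did
# not set the UTF-8 flag, so only the raw bytes travel with the archive.
stored = "чек".encode("cp1251")

for cp, text in readings(stored).items():
    print(f"{cp:>7}: {text}")
```

Only the creator's code page gives back "чек", which is why (temporarily) switching to that code page works.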
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: Zip filenames encoding detection problems

Post by ghisler(Author)

The Russian text чек stored as OEM/DOS (codepage 866) uses the codes E7 A5 AA.
This byte sequence is also valid UTF-8, so it is recognized as UTF-8. This detection was added because some packers store file names in ZIP archives as UTF-8 but do not set the UTF-8 flag in the ZIP headers.

Try this: put these 3 bytes into a text file and open it with notepad.exe on Windows 10/11. It will also display them as a Chinese character.

To disable this auto-detection, open wincmd.ini, look for the section [Packer] and add the line
PreferUtf8ForZip=0

Alternatively, you can manually switch the encoding to OEM/DOS (codepage 866) by right-clicking on [auto] in the name header above the file list.
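A minimal Python sketch of the decision order described above, not Total Commander's actual code: names with bit 11 (0x800) set are read as UTF-8; otherwise, if the auto-detection is on (modelled by a hypothetical prefer_utf8 parameter standing in for PreferUtf8ForZip), a name that is valid UTF-8 as a whole is read as UTF-8; everything else falls back to the OEM code page, assumed to be 866. Requiring at least one byte above 127 is an assumption of this sketch. It also shows why only one of the three test names is affected: in "чеки" the byte after E7 A5 AA is A8, and in "чек на" the byte after the space is AD; both are UTF-8 continuation bytes without a lead byte, so those names fail the check. In "чек TEST TEST.txt" only ASCII follows, so the name decodes as U+796A followed by " TEST TEST.txt".

```python
UTF8_FLAG = 0x800  # ZIP general purpose bit 11


def is_plausible_utf8(raw):
    """True if raw contains a byte > 127 and is valid UTF-8 as a whole."""
    if all(b < 0x80 for b in raw):
        return False  # pure ASCII reads the same in every code page
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def display_name(raw, flag_bits, prefer_utf8=True, oem="cp866"):
    """Pick an encoding for one stored ZIP name (hypothetical decision order)."""
    if flag_bits & UTF8_FLAG:
        return raw.decode("utf-8")  # the archive declares UTF-8
    if prefer_utf8 and is_plausible_utf8(raw):
        return raw.decode("utf-8")  # auto-detection; off with prefer_utf8=False
    return raw.decode(oem)          # OEM/DOS code page, here 866


if __name__ == "__main__":
    for name in ["чеки TEST TEST.txt", "чек на TEST TEST.txt", "чек TEST TEST.txt"]:
        raw = name.encode("cp866")
        print(raw[:6].hex(" ").upper(), "->", display_name(raw, 0))
```

Passing prefer_utf8=False corresponds to the PreferUtf8ForZip=0 workaround: all three names then come out in code page 866.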
Author of Total Commander
https://www.ghisler.com