[Bug] Invalid filenames encoding in ZIP

Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by *Alextp »

This bug was reported by user Zom at forum.wincmd.ru, and I confirm it.

TC 6.54, WinXP SP2. Default locale: Russian.

1. Create a file with a Russian name, e.g. "Текстовый документ.txt".

2. In Explorer: right click, "Send To", "Compressed ZIP folder" (translated from Russian).
Explorer then creates the file "Текстовый документ.zip".

3. Open this zip in TC: name is visible in wrong encoding: "’ҐЄбв®ўл© ¤®Єг¬Ґ­в.txt"

So the bug is in either TC or Explorer. Both WinRAR and FAR show the correct name in the ZIP,
so I think it's in TC. Sample file:
http://atorg.net.ru/temp/BadEnc.zip
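
The garbled name in step 3 is consistent with the name being stored in the DOS (OEM) codepage and then read as the Windows (ANSI) codepage. A minimal Python sketch of that idea, assuming cp866 as the Russian OEM codepage and cp1251 as the Russian ANSI codepage; it only illustrates the mismatch and says nothing about TC's or Explorer's actual code:

```python
# Assumption: the name is written as cp866 (Russian OEM/DOS) bytes
# and later read back as cp1251 (Russian Windows/ANSI).
name = "Текстовый документ.txt"

raw = name.encode("cp866")        # bytes as they would sit in the ZIP header
misread = raw.decode("cp1251")    # what a reader assuming ANSI would display
restored = raw.decode("cp866")    # what a reader assuming OEM would display

print(misread)                    # garbled, like the name shown in TC
print(restored == name)           # the OEM interpretation restores the name
```

If the assumption holds, the archive itself is not damaged; only the choice of codepage at display time differs.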
Sheepdog
Power Member
Posts: 5150
Joined: 2003-12-18, 21:44 UTC
Location: Berlin, Germany

Post by *Sheepdog »

With WinXP Pro German both Explorer and TC show here

ÆѬßÔ«óÙ® ñ«¬Ò¼Ñ¡Ô.txt
which can be unpacked and opened pretty well.

I can create a file named "Текстовый документ.txt" but if I try to zip it TC says:

"Error writing, disk probably full!"

When I try to move this file into a ZIP with Explorer, I get the message:

"The compression could not be completed because the name of the file or folder contains characters that cannot be saved in ZIP-compressed folders." (translation of the German error message)

sheepdog
"A common mistake that people make when trying to design something
completely foolproof is to underestimate the ingenuity of complete fools."
Douglas Adams
Flint
Power Member
Posts: 3487
Joined: 2003-10-27, 09:25 UTC
Location: Antalya, Turkey

Post by *Flint »

This is because of the Unicode name (I suppose).
Sheepdog, I think you have a German locale, don't you? So please test with file names containing umlauts and/or "es-zet" letters and tell us the result.
Flint's Homepage: Full TC Russification Package, VirtualDisk, NTFS Links, NoClose Replacer, and other stuff!
 
Using TC 10.52 / Win10 x64
XPEHOPE3KA
Power Member
Posts: 854
Joined: 2006-03-03, 18:23 UTC
Location: Saint-Petersburg, Russia

Post by *XPEHOPE3KA »

Sheepdog wrote: I can create a file named "Текстовый документ.txt" but if I try to zip it TC says
If you created it on your German Windows, then its name is stored in Unicode, right? If we create such files on a Russian Windows, they are stored in the local codepage.

You know that cp1252 is used for both English and German, so we need someone from Belgrade (or anyone from China), who probably uses a different codepage, to confirm this bug: they would need to create a file with a name in their local codepage and make a ZIP folder from it...

Dunno whether the Spaniards use cp1252 for writing in their languages...
Last edited by XPEHOPE3KA on 2006-06-06, 11:57 UTC, edited 1 time in total.

Post by *Sheepdog »

Flint wrote: I think you have German locale, don't you? So, please, test with file names containing umlauts and/or "es-zet" letters and tell us the result.
Yep, German locale, and it works fine with äöüß.
XPEHOPE3KA wrote: You know that cp1251 is used for both English and German language, so we need that Someone from Belgrade (or anyone from China), who probably uses another codepage (so in order to confirm this bug he needs to create a file with name in his local codepage and create a zip-folder from it)...
Actually, German and English use cp1252 (Windows Latin-1), while cp1251 is Windows Cyrillic (according to my Windows help).

But nevertheless you are right. If I create a file "Текстовый документ.txt", it is saved as Unicode.


Post by *XPEHOPE3KA »

Sheepdog wrote: Actually German and English use cp1252 (Windows Latin-1) while cp1251 is Windows Cyrillic
Oh, of course... :oops: Corrected now.
F6, Enter, Tab, F6, Enter, Tab, F6, Enter, Tab... - I like to move IT, move IT!..

Post by *Flint »

Yes, I have now tested it on my Virtual PC. Unfortunately, the bug does not reproduce with a German locale. :(

I found the exact way to reproduce it: use a locale whose codepage differs from the English one. Both the English and German locales use codepage 1252, which is why there is no such problem with their ZIPs. But with Russian (1251) and Polish (1250) the bug reproduces.

If someone wishes to try the experiment, do the following:
1. Go to Control Panel -> Regional and Language Options, and set the Polish language on the Regional Options and Advanced tabs. A computer restart will be needed. (Maybe the Advanced tab alone would be enough; I don't know exactly.)
2. Create a text file with a name containing some Polish characters, e.g. ąćęłńóśźż.txt
3. Zip it in two ways: first with Send To, second with Alt+F5 from TC (let's call these two ZIP files sendto.zip and fromtc.zip respectively).
4. Open fromtc.zip in TC. You will see the file ąćęłńóśźż.txt inside it, so the name is shown correctly.
5. Open sendto.zip in TC. You will see the file Ą†©ˆä˘˜«ľ.txt. This is incorrect.
6. Open both archives in Windows Explorer: both open, and the correct name ąćęłńóśźż.txt is shown.
ghisler(Author)
Site Admin
Posts: 48093
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

The problem is the following: ZIP files do not define how non-ASCII characters need to be encoded. Some ZIP tools create them with the DOS/ASCII encoding, some with the Windows/ANSI encoding. If the operating system flag is set to Windows, TC assumes that the Windows encoding is used. If the flag is set to DOS, TC assumes the DOS encoding. Apparently Explorer mixes up the two.
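
The flag-based rule described above can be sketched in Python with the standard `zipfile` module. This is only an illustration, not TC's code. The function name `decode_entry_name`, the default codepages, and the choice of 10 as the "Windows" value are assumptions: the ZIP application note lists 0 for MS-DOS/FAT and 10 for Windows NTFS in the "version made by" field, but the thread does not say which values TC checks.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: name is stored as UTF-8


def decode_entry_name(info, dos_encoding="cp866", win_encoding="cp1251",
                      windows_systems=(10,)):
    """Re-decode a ZIP entry name according to the 'created by' system byte.

    Hypothetical helper. Assumes the archive was opened without a custom
    metadata_encoding, so zipfile decoded legacy names as cp437.
    """
    if info.flag_bits & UTF8_FLAG:
        return info.filename                  # zipfile already decoded UTF-8
    raw = info.filename.encode("cp437")       # recover the original bytes
    if info.create_system in windows_systems:
        return raw.decode(win_encoding, errors="replace")
    return raw.decode(dos_encoding, errors="replace")
```

With an archive whose names are really DOS-encoded but whose header claims Windows, this rule yields exactly the kind of garbled name reported in this thread.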
Author of Total Commander
https://www.ghisler.com

Post by *Flint »

ghisler(Author)
Maybe let TC users control how this flag is handled through a wincmd.ini option? E.g. DOSWinZIP={D|W|A}, where D means that TC always treats names as ASCII (DOS), W means always as Win (ANSI), and A means automatic (the current behaviour). I think there are so few real DOS-encoded ZIP files that such an option set to W would be convenient.
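
To make the proposal concrete, here is a rough Python sketch of how such a setting could pick a codepage. The DOSWinZIP option is only the suggestion made in this post, not an existing TC setting, and the function name and default codepages are assumptions for illustration:

```python
def pick_zip_name_encoding(mode, header_says_windows,
                           dos_encoding="cp866", win_encoding="cp1251"):
    """Map the proposed DOSWinZIP value (D, W or A) to a codepage name."""
    if mode == "D":
        return dos_encoding                   # always DOS (OEM)
    if mode == "W":
        return win_encoding                   # always Windows (ANSI)
    if mode == "A":
        # automatic: trust the operating system flag in the ZIP header
        return win_encoding if header_says_windows else dos_encoding
    raise ValueError(f"unknown DOSWinZIP value: {mode!r}")
```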

Post by *XPEHOPE3KA »

2ghisler(Author)
Should we wait for a fix from MS? They might stick to backwards compatibility and might never change the behaviour...

Post by *ghisler(Author) »

The problem is that when I change the behaviour of TC to "eat" this format, dozens of other zip files created by other popular zip tools will no longer work!

Post by *XPEHOPE3KA »

Maybe you can count the number of "strange" characters in filenames and provide a dialog to change the encoding if needed? Of course, with configurable default choice and it must be disableable :wink:

Post by *Flint »

But Windows somehow "eats" them all... Maybe, it creates some special format of ZIP, so that it would be possible to recognize it and handle specifically?
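
One way to check this would be to dump the header fields of an Explorer-made archive and a TC-made archive and compare them. A minimal Python sketch using the standard `zipfile` module; the helper name `header_fields` is made up for this example, and whether any of these fields really differs between the two tools is not known from this thread:

```python
import zipfile


def header_fields(path_or_file):
    """Collect per-entry header fields that might identify the creating tool."""
    rows = []
    with zipfile.ZipFile(path_or_file) as zf:
        for info in zf.infolist():
            utf8 = bool(info.flag_bits & 0x800)
            # zipfile decodes legacy names as cp437, so encoding back
            # with cp437 recovers the bytes stored in the archive
            raw = info.filename.encode("utf-8" if utf8 else "cp437")
            rows.append({
                "raw_name": raw,
                "create_system": info.create_system,
                "create_version": info.create_version,
                "extract_version": info.extract_version,
                "flag_bits": info.flag_bits,
            })
    return rows
```

Calling `header_fields("sendto.zip")` and `header_fields("fromtc.zip")` on the two test archives and comparing the results would show whether there is anything to key on.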

Post by *XPEHOPE3KA »

This bug is still present in TC6.55pb1. Possible ways to fix this are described above.

Post by *ghisler(Author) »

Unfortunately I haven't found a solution to this problem yet. With codepage 1252 (Latin 1) it's quite easy: If the file is reported as Windows, but contains characters with codes between 128 and 159, then it's using the OEM/DOS encoding.
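
The codepage 1252 rule can be written down in a few lines. A Python sketch, with a hypothetical function name; it works because cp1252 keeps only rarely used punctuation in the range 128 to 159, while the DOS codepages put accented letters there:

```python
def looks_like_oem(raw_name: bytes) -> bool:
    """Rule for cp1252 locales: bytes 128..159 in a name suggest OEM/DOS."""
    return any(128 <= b <= 159 for b in raw_name)


dos_bytes = "äöü.txt".encode("cp850")    # DOS encoding of the umlauts
win_bytes = "äöü.txt".encode("cp1252")   # Windows encoding of the same name

print(looks_like_oem(dos_bytes))
print(looks_like_oem(win_bytes))
```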

The Russian codepages 1251 and OEM 866 do not allow such a simple detection. Perhaps you have an idea how to distinguish the two?
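
One possible heuristic, offered only as a sketch: decode the name both ways and keep the codepage that yields more letters of the Russian alphabet. The function names are made up for this example. The approach is weak by design, because the cp1251 bytes for lowercase а-п fall in the range where cp866 has р-я, so a wrong decoding still produces many Cyrillic letters and short names can tie or be misjudged.

```python
RUSSIAN = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")


def russian_letters(text):
    """Count characters that belong to the Russian alphabet (any case)."""
    return sum(ch.lower() in RUSSIAN for ch in text)


def guess_encoding(raw_name, candidates=("cp866", "cp1251")):
    """Pick the candidate codepage whose decoding has the most Russian letters.

    On a tie, the first candidate wins.
    """
    def score(encoding):
        return russian_letters(raw_name.decode(encoding, errors="replace"))
    return max(candidates, key=score)


name = "Текстовый документ.txt"
print(guess_encoding(name.encode("cp866")))
print(guess_encoding(name.encode("cp1251")))
```

A more robust version would probably need letter frequencies or a dictionary, and would still need a manual override for the cases it gets wrong.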