File names inside .zip in Chinese locale, Windows 7 RC

The behaviour described in the bug report is either by design, or would be far too complex/time-consuming to be changed

skuzi
Junior Member
Posts: 16
Joined: 2009-05-24, 04:27 UTC

Post by *skuzi »

Windows 7 RC, English version.
TC 7.50 Beta 3, English language only.
Right after installation everything is OK.
I changed the locale for non-Unicode programs to Chinese (Control Panel - Region & Language - Administrative - Change system locale).
Restarted the computer as requested.

Problem: non-English file names inside some .zip files are shown with Chinese characters. Also, TC and WinRAR show different names for the files inside the archive.
Even worse: some file names are merged with their extensions!
I tried changing the font and the font script (TC - Configuration - Options - Font), with no success :(

Here is a link (300 KB) showing how it should look (I used the Cyrillic font script):
http://img32.imageshack.us/img32/1936/totalcmdrussianlocalezi.png

Here is a link (320 KB) showing the problem:
http://img32.imageshack.us/img32/6571/totalcmdchineselocalezi.png

One more strange thing: when I open this .zip file with Windows Explorer's built-in zip support, I see 33 files; TC shows 37 and WinRAR shows 37. Maybe it's a bug in Windows 7 itself?

PS. I don't want to blame the author of the plugin shown in the screenshot. I just wanted to give an example that is easy for everybody to check.
ghisler(Author)
Site Admin
Posts: 48107
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

This isn't a bug - the ZIP format simply doesn't store the information in which locale the names were created! The Explorer will have similar problems.

If you want to send a zip file with (e.g. European) accents to someone with Chinese Windows, you should check the following option when packing:
Configuration - Options - ZIP - Store all names with non-English characters in extra field.

Total Commander 7.5 and Winzip can handle such extra fields.
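
To illustrate the point above, here is a minimal Python sketch showing that the same stored bytes yield different names depending on the code page the reader assumes. The sample name and the choice of code page 866 as the packer's encoding are assumptions for illustration only; they are not taken from the archive discussed here.

```python
# Assumption: the packer stored the name in OEM code page 866 (Russian DOS).
original = "Русский.lng"
stored = original.encode("cp866")  # only these raw bytes end up in the ZIP header


def decode_as(raw, codepage):
    """Decode stored name bytes the way a reader assuming `codepage` would."""
    return raw.decode(codepage, errors="replace")


if __name__ == "__main__":
    # The archive carries no hint about which of these readings is the right one.
    for cp in ("cp866", "cp437", "gbk"):
        print(cp, "->", decode_as(stored, cp))
```

Only the first code page gives back the original name; the other two produce unrelated characters from exactly the same bytes.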
Author of Total Commander
https://www.ghisler.com

Post by *skuzi »

Thank you for the information. The warning about file names is noted.
But the problem remains: extensions!
As you can see, for files with Latin characters the extension was not changed.
For files with non-Latin characters, some (?) extensions got merged into the file names.
The reason, for sure, is that the "dot" before the extension was changed to something else. But only for non-Latin file names!
And WinRAR changed it to "?", while TC changed it to "_" :shock:
PS. In that case, shouldn't "Configuration - Options - ZIP - Store all names with non-English characters in extra field" be the default option?

Post by *ghisler(Author) »

That's normal: TC stores characters that cannot be represented in the local encoding as '?'. TC shows and unpacks such characters as '_' because file names cannot contain '?'.
skuzi wrote: PS. In that case, shouldn't "Configuration - Options - ZIP - Store all names with non-English characters in extra field" be the default option?
No, because it takes more space to store the names twice, and the problem doesn't affect people who do not send zips to people in countries with a different encoding.
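
The '?' and '_' substitution described above can be sketched in a few lines of Python. This only mirrors the behaviour as described in this post; the function names are hypothetical and this is not Total Commander's actual code.

```python
def store_name(name, codepage):
    """Encode a name for the archive; characters the code page lacks become '?'."""
    return name.encode(codepage, errors="replace")


def display_name(raw, codepage):
    """Decode a stored name and show '?' as '_', since '?' is not a legal
    character in a Windows file name."""
    return raw.decode(codepage, errors="replace").replace("?", "_")


stored = store_name("日本.txt", "cp1252")  # cp1252 has no CJK characters
shown = display_name(stored, "cp1252")
```

Here `stored` ends up as `b"??.txt"` and `shown` as `"__.txt"`: the original characters are lost at packing time, and the underscore is only a display and unpacking convention.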

Post by *skuzi »

I understand now why the characters appear differently in TC and WinRAR.
But, again:
As you can see, for files with Latin characters the extension was not changed.
For files with non-Latin characters, some (?) extensions got merged into the file names.
The reason, for sure, is that the "dot" before the extension was changed to something else. But only for non-Latin file names!
In some cases TC changed the "dot" separating name and extension to "_", in others it did not. This is incorrect, I think.
1. The "dot" character exists in every font/encoding table (at least it should :) ).
2. Extensions contain only English letters.
3. They are shown correctly in other file names.
4. Last argument: if the extension is missing, how will Windows determine the file type?
So, here is my opinion: unreadable characters may be replaced with something readable in the current locale, but "dots" should not be changed.

Post by *ghisler(Author) »

skuzi wrote: In some cases TC changed the "dot" separating name and extension to "_", in others it did not. This is incorrect, I think.
I haven't seen such a problem yet. Can you post the names of these files here (as text, not image) so I can create them via Shift+F4 (copy+paste the name) and try to reproduce the problem? Thanks!

Post by *skuzi »

Sure!
Here they are, all from the screenshot, sorted by extension:

仴嚆後_lng
侁犩岐_lng
愩後_lng
拋槈_lng
pluginst.inf
Brazilian Portuguese.lng
Cesky.lng
Chinese.lng
Croatian.lng
Dansk.lng
Deutsch.lng
English.lng
Espa醥l.lng
Francais.lng
Hellenic.lng
Hrvatski.lng
Italiano.lng
Korean.lng
Magyar.lng
Nederlands.lng
Norsk.lng
Polski.lng
Portuguese (Portugal).lng
Romanian.lng
Serbian.lng
Slovenscina.lng
Slovensky.lng
Svenska.lng
Taiwanese.lng
Turkish.lng
悌狅.lng
摢酄醐犰獱.lng
Changes.txt
Languages.txt
license.txt
readme.txt
CADView.wlx

Post by *ghisler(Author) »

Are these the correct names, with an underscore _ instead of the dot before the extension in some Chinese names? Or is this the list as Total Commander shows it?

Post by *skuzi »

This is the list as Total Commander shows it. I got it from TC with:

ctrl-A -> alt-M -> Copy selected names to clipboard -> pasted into the forum

The correct list can be seen in the link from my first post. That is how it should look:
the two file names with Chinese characters that are missing from the font in use are shown with replacement characters readable in the current locale, but the extensions are left unchanged.
P.S. Or you can download the file (cadview.zip, 1.16 MB, a plugin for TC :) ) and check for yourself to be 100% sure of the names.

Post by *skuzi »

ghisler(Author) wrote:...
PS. In this case "Configuration - Options - ZIP - Store all names with non-English characters in extra field" shouldn't be the default option, should it?
No, because it takes more space to store the names twice, and the problem doesn't affect people who do not send zips to people in countries with different encoding.
Let me disagree with you. Here are my arguments:
1. Space for storing names, are you kidding? 50 more bytes? Less? Who cares about bytes nowadays? Megabytes (well, I can agree on hundreds of kilobytes) do matter, but bytes? :shock:
2. Now seriously. Since TC is declared as having full Unicode support, all Unicode options should be on by default. Anyone who disagrees can choose not to use Unicode.
3. The option is called "Store all names with non-English characters in extra field", which means there will be no extra field for English names. As far as I understand, this option applies exactly to those who need it, those who use non-English names. There are a lot of them all over the world! And they'll get full Unicode support out of the box, which will surely make them happy :D
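
For a rough sense of the size involved, the sketch below builds an Info-ZIP style "Unicode Path" extra field (header ID 0x7075) for one name and measures it. Whether Total Commander's option writes exactly this field is an assumption; the thread only says the names are stored as UTF-8 in an extra field. The sample name and code page 866 are illustrative as well.

```python
import struct
import zlib


def unicode_path_extra(stored_name, unicode_name):
    """Build a 0x7075 extra field: version byte, CRC-32 of the legacy name
    bytes, then the UTF-8 form of the name."""
    utf8 = unicode_name.encode("utf-8")
    body = struct.pack("<BI", 1, zlib.crc32(stored_name) & 0xFFFFFFFF) + utf8
    # Every extra field starts with a 2-byte header ID and a 2-byte data size.
    return struct.pack("<HH", 0x7075, len(body)) + body


name = "Русский.lng"
field = unicode_path_extra(name.encode("cp866"), name)
overhead_per_copy = len(field)  # 4 header bytes + 5 fixed bytes + UTF-8 name
```

For this 11-character name the field comes to 27 bytes. An archiver would typically write it once in the local header and once in the central directory, so the cost is a few dozen bytes per non-English name.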

Post by *ghisler(Author) »

Where can I download this cadview.zip file?

Post by *skuzi »

Hm-m-m...
You'll be surprised! :)
http://www.ghisler.com/plugins.htm
Lister plugins

Post by *ghisler(Author) »

I see - changing the TC pack options wouldn't have any influence on third party archives anyway, who knows what they used to create the archive. The only solution would be to let the user choose the encoding of an archive. It's on my to do list, but I don't currently know where to put the necessary button. The user interface is already more than full...
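
As an illustration of what "letting the user choose the encoding" could look like outside of TC, here is a sketch using Python's standard zipfile module. Python decodes legacy (non-UTF-8) names as cp437, so the raw bytes can be recovered and decoded again with a code page picked by the user. The helper names are hypothetical, and the demo archive is built by patching an ASCII placeholder name, purely to get a legacy-encoded entry for testing.

```python
import io
import zipfile


def build_legacy_zip(raw_name):
    """Build a tiny in-memory ZIP whose entry name consists of raw legacy bytes."""
    placeholder = "A" * len(raw_name)  # ASCII name with the same byte length
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(placeholder, b"demo")
    # Patch the name in both the local header and the central directory.
    return buf.getvalue().replace(placeholder.encode("ascii"), raw_name)


def names_with_encoding(data, encoding):
    """List entry names, decoding legacy names with the given code page."""
    names = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.flag_bits & 0x800:  # name is flagged as UTF-8, keep it
                names.append(info.filename)
            else:
                # zipfile decoded the bytes as cp437; undo that, then re-decode
                raw = info.filename.encode("cp437")
                names.append(raw.decode(encoding, errors="replace"))
    return names


archive = build_legacy_zip("Русский.lng".encode("cp866"))
recovered = names_with_encoding(archive, "cp866")
```

With the right code page the original name comes back; with any other one the result is mojibake, which is why the choice has to come from the user. As far as I know, Python 3.11 and later also accept a `metadata_encoding` argument on `zipfile.ZipFile` for the same purpose.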

Post by *ghisler(Author) »

I have checked it now - TC handles it correctly. In the Chinese locale, the character before the dot is the first part of a 2 byte character, so the dot is seen as the second part. The only solution here is that the plugin author changes the encoding of the file names (UTF-8 in extra field) or that the user can somehow choose the encoding himself.
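
The effect described above can be reproduced with a small simulation. It scans the name bytes the way a double-byte code page reader might: any byte in the lead-byte range takes the following byte with it, whatever that byte is. The lead-byte range 0x81..0xFE for the Chinese code page, the '_' placeholder, and the sample names in code page 866 are assumptions for illustration; this is not Total Commander's or Windows' actual code.

```python
def is_lead_byte(b):
    # Assumption: lead bytes of the Chinese double-byte code page are 0x81..0xFE.
    return 0x81 <= b <= 0xFE


def decode_naive_dbcs(raw):
    """Pair every lead byte with the byte after it, even if that byte is a dot."""
    out = []
    i = 0
    while i < len(raw):
        b = raw[i]
        if is_lead_byte(b) and i + 1 < len(raw):
            try:
                out.append(raw[i:i + 2].decode("gbk"))
            except UnicodeDecodeError:
                out.append("_")  # pair is not a valid character
            i += 2  # the second byte is consumed either way
        else:
            out.append(chr(b) if b < 0x80 else "_")
            i += 1
    return "".join(out)


odd = decode_naive_dbcs("Русский.lng".encode("cp866"))      # 7 name bytes
even = decode_naive_dbcs("Українська.lng".encode("cp866"))  # 10 name bytes
```

A name with an odd number of high bytes leaves one lead byte to pair with the dot, so the dot disappears and the result ends in `_lng`. A name with an even number pairs up cleanly and keeps `.lng`, which matches the "sometimes yes, sometimes no" pattern in the list posted earlier.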

Post by *skuzi »

skuzi wrote: ...
3. They are shown correctly in other files' names.
Sometimes TC treats "dots" correctly, sometimes not...