[TC 11.0b5] [Title altered] Korean [ 카 타 화 회 ] folders and files are renamed to [ ī Ÿ ȭ ȸ ] in zip


cpp64
Junior Member
Posts: 83
Joined: 2023-05-12, 16:03 UTC

[TC 11.0b5] [Title altered] Korean [ 카 타 화 회 ] folders and files are renamed to [ ī Ÿ ȭ ȸ ] in zip

Post by *cpp64 »

2ghisler


============================== TC 11.00 beta5 ==============================

Appending '_' at the end is fixed.

The remaining issue is shown below.


[ before packing ]

The "가 카 타 화 회" entries in each folder are all files without extensions.

├─카 // folder
│ 가 // file without extension
│ 카
│ 타
│ 화
│ 회

├─타
│ 가
│ 카
│ 타
│ 화
│ 회

├─화
│ 가
│ 카
│ 타
│ 화
│ 회

└─회







[ after unpacking ]

All [ 카 타 화 회 ] files and folders are renamed (moved?) to [ ī Ÿ ȭ ȸ ].
Only the "[ 카 타 화 회 ]\가" files are correct.

I think the "[ 카 타 화 회 ]\가" entries are enough to detect the character encoding, and the rest are not.
The rest are just repetitions of [ 카 타 화 회 ].

├─ī // folder
│ ī // file without extension
│ Ÿ
│ ȭ
│ ȸ

├─Ÿ
│ ī
│ Ÿ
│ ȭ
│ ȸ

├─ȭ
│ ī
│ Ÿ
│ ȭ
│ ȸ

├─ȸ
│ ī
│ Ÿ
│ ȭ
│ ȸ

├─카
│ 가

├─타
│ 가

├─화
│ 가

└─회
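The suggestion above (one name such as 카\가 that is not valid UTF-8 should settle the encoding for the whole archive) could be sketched roughly as follows. This is only an illustration of the idea, not how Total Commander actually detects encodings; the function name `decode_archive_names` and the CP949 fallback are assumptions made for the example.

```python
def decode_archive_names(raw_names, legacy_codec="cp949"):
    """Decode all raw names of one archive with a single codec.

    If every name is valid UTF-8, UTF-8 is used. As soon as one name
    is not valid UTF-8, the legacy codec is used for ALL names.
    """
    try:
        return [raw.decode("utf-8") for raw in raw_names]
    except UnicodeDecodeError:
        return [raw.decode(legacy_codec) for raw in raw_names]


# Raw CP949 bytes as a packer on a Korean system might store them.
raw = [name.encode("cp949") for name in ["카/", "카/가", "카/카"]]

print(decode_archive_names(raw))      # whole archive: 가 forces CP949
print(decode_archive_names(raw[:1]))  # 카/ alone is ambiguous
```

With only the ambiguous names available there is nothing to base the decision on, which matches the observation that short names made of 카, 타, 화 or 회 alone are misjudged.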






============================== TC 11.00 beta4 ==============================

Only "[6] Store all names containing non-English in extra field" is fixed.


It seems that the zip packer treats a few Korean characters as CP949 (ANSI?) while the zip unpacker treats them as Unicode.

That would explain why [ 카 타 화 회 ] are converted to [ ī Ÿ ȭ ȸ ].

The new version of Notepad in Windows 11 x64 also reads a CP949-encoded [ 화 ] as [ ȭ ] in UTF-8 (no BOM).

The character encoding detection algorithm seems to misjudge when there are not enough characters.
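The misreading described above can be reproduced outside Total Commander. The following is a minimal sketch using Python's built-in `cp949` and `utf-8` codecs; the helper name `misread_as_utf8` is made up for this example.

```python
def misread_as_utf8(text):
    """Encode text as CP949, then (wrongly) decode the bytes as UTF-8."""
    return text.encode("cp949").decode("utf-8")


for ch in "카타화회":
    raw = ch.encode("cp949")
    # Each of these two-byte CP949 sequences is also a valid
    # two-byte UTF-8 sequence, so decoding does not fail.
    print(ch, raw.hex(), "->", misread_as_utf8(ch))
```

For example, 카 is stored as the bytes C4 AB in CP949, and C4 AB is also the UTF-8 encoding of ī.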

And '_' is ... ???


[ folder names ]
----------




레몬


바나나

사과

애플








----------


[1]
Ask every time a Unicode name is encountered
--> does not ask whether Korean is Unicode or not (it seems the zip packer does not consider Korean to be Unicode)
--> reads cp949 as utf-8
----------
Ÿ
ī
ȭ
ȸ
----------
However, the other characters have not changed.
I'm wondering why that is.
When I read CP949 as UTF-8 in vim, the other characters are represented by '?'
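A likely reason why only some names change: most CP949-encoded Hangul syllables are simply not valid UTF-8, so a reader that tries UTF-8 first can reject them, while a few happen to be valid. The sketch below checks this for a sample list; `SAMPLE_NAMES` is an assumed list built from names mentioned in this report, and the check says nothing about Total Commander's internals.

```python
SAMPLE_NAMES = ["가", "나", "다", "라", "마", "바", "바나나", "사", "아",
                "자", "차", "카", "타", "파", "하", "화", "회"]


def valid_utf8_when_cp949(names):
    """Return the names whose CP949 bytes also decode as UTF-8."""
    ambiguous = []
    for name in names:
        try:
            name.encode("cp949").decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, so it cannot be misread
        ambiguous.append(name)
    return ambiguous


print(valid_utf8_when_cp949(SAMPLE_NAMES))
```

vim shows '?' for the other characters because it was forced to read everything as UTF-8 and has to substitute the invalid sequences.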


[2]
Store Unicode names as UTF-8 (Pkzip 4.5/Winzip 11.2 method)
--> reads cp949 as utf-8
----------
Ÿ
ī
ȭ
ȸ
----------


[3]
All as UTF-8 if at least one contains Unicode
--> adds '_' at the end
----------
나_
다_
라_
마_
바_
바나나_
사_
아_
자_
차_
카_
파_
하_
화_
회_
----------


[4]
All as UTF-8 if at least one contains non-English characters > 127
--> adds '_' at the end
----------
나_
다_
라_
마_
바_
바나나_
사_
아_
자_
차_
카_
파_
하_
화_
회_
----------


[5]
Store Unicode name in extra field (Info-Zip method)
--> reads cp949 as utf-8
----------
Ÿ
ī
ȭ
ȸ
----------


[6]
Store all names containing non-English in extra field
--> OK (fixed)


[7] Store Unicode characters as "?"
--> reads cp949 as utf-8
----------
Ÿ
ī
ȭ
ȸ
----------



============================== TC 11.00 beta3 ==============================


[ packing ] (before)
------------(folder name)
바나나

------------

[ unpacking ] (after)
------------(folder name)
바나나
바나나_
화_
------------

os : windows 11 x64 22h2
tc : 11.0b3 and previous release version


------------------------------
ps.

This issue occurs when the zip packer stores folder names as Unicode
(with every Unicode option in 'Packer Unicode names').

and

----------
타
카
화
----------

is packed to

----------
Ÿ
ī
ȭ
----------

This problem can also be reproduced in vim by reading CP949 as UTF-8.

// save '화' as cp949 in vim
:set fenc=cp949
:w
-----
화
-----

// and read cp949 as UTF-8 using ':e ++enc=utf-8'
:e ++enc=utf-8
-----
ȭ
-----
Last edited by cpp64 on 2023-06-01, 15:07 UTC, edited 45 times in total.
white
Power Member
Posts: 4630
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *white »

I cannot reproduce. Is that 1 folder name or 2? How do you pack? How do you store the filenames during packing, UTF-8 Unicode? Does it show the wrong names in the zip (without unpacking)? How do you unpack? Does it also happen with empty folders only?
zhugecaomao
Junior Member
Posts: 10
Joined: 2022-08-23, 05:08 UTC

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *zhugecaomao »

Because of the [Pack Unicode names] setting, I guess?

Image: https://ibb.co/FbxpXch
Total Commander Version 11.00b10 64 bit
pulbitz
Junior Member
Posts: 52
Joined: 2009-06-05, 12:19 UTC

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *pulbitz »

It looks like a Total Commander bug.
In the zip file below, there are two empty folders '카', '타'.
https://www.mediafire.com/file/uw66v1tehec3foa/korean.zip/file
I tested them in Total Commander 10.52 and 11.00 beta and they look like 'ī', 'Ÿ'.
In 7-Zip, WinRAR, and Windows 11 Explorer, they look fine as '카', '타'.
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *AntonyD »

2pulbitz
Just right-click the NAME column header to open its context menu (to the left of this word you will also see another word: [AUTO]).
In the context menu that opens, choose the UTF-8 encoding and you will get what you want.
Last edited by AntonyD on 2023-05-19, 16:05 UTC, edited 1 time in total.
#146217 personal license
pulbitz
Junior Member
Posts: 52
Joined: 2009-06-05, 12:19 UTC

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *pulbitz »

I'm sorry if I didn't understand you because I don't speak English very well.

The problem is that when I unpack the korean.zip file in Total Commander, it doesn't create a 카, 타 folder, but a ī, Ÿ folder, which means the characters are broken.
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *AntonyD »

OK, I do not speak English either, but Google Translate works real magic)))
So let's repeat it with pictures, like in a comic)))
1) Download your archive and open it inside Total. The initial state is:
https://ibb.co/Cvsfp8z
2) Now move the cursor over the column header, as in the picture:
https://ibb.co/s5wNfB5
3) Now right-click the 'Name' header to open the regular context menu, as usual in Explorer (to the left of this word you will also
see another word: [Auto]).
https://ibb.co/yR0586D
4) In the context menu that opens, choose the UTF-8 encoding in the second column of choices,
and you will get what you want.
https://ibb.co/QdwL1Bs

Now you can unpack the selected folders as you want.

Also, please check this option in the INI file:
PreferUtf8ForZip=1
in section [Packer]

see https://www.ghisler.ch/board/viewtopic.php?p=429585&hilit=PreferUtf8ForZip#p429585
and https://www.ghisler.ch/board/viewtopic.php?p=428903&hilit=PreferUtf8ForZip#p428903
cpp64
Junior Member
Posts: 83
Joined: 2023-05-12, 16:03 UTC

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *cpp64 »

AntonyD wrote: 2023-05-19, 17:05 UTC PreferUtf8ForZip=1
The Zip unpacker still creates empty folders that end with '_' with 'PreferUtf8ForZip=1'.

'PreferUtf8ForZip=1' doesn't work for me.
pulbitz
Junior Member
Posts: 52
Joined: 2009-06-05, 12:19 UTC

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *pulbitz »

Thanks for the tip.
However, after further testing, it appears that there is a fundamental problem with Total Commander's handling of UTF-8 filenames in the Zip format.
The korean.zip file I initially posted was created in WinRAR and has a UTF-8 filename stored in the extra field of the Zip format.
Leaving aside the issue that the characters look broken when I enter the korean.zip file with the Enter key in Total Commander, when I unzip it with Alt+F9, it should unzip properly to 카, 타, but it unzips to ī, Ÿ.

I created some more files to test.
https://www.mediafire.com/file/4233rcjp70ep3g1/korean_filename_test.zip/file
All as UTF-8 if at least one contains non-English characters.zip
All as UTF-8 if at least one contains Unicode.zip
Store all names containing non-English in extra field.zip
Store Unicode characters as question mark.zip
Store Unicode name in extra field (Info-Zip method).zip
Store Unicode names as UTF-8 (Pkzip 4.5_Winzip 11.2 method).zip
WinRAR.zip

With the exception of one of the test files, I can unzip them in the inconvenient way, by changing the code page in Total Commander.
However, Zip files created with the "All as UTF-8 if at least one contains non-English characters" method can never be unzipped correctly with Total Commander.
As cpp64 explained, 카 only ever comes out as 카_.
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *AntonyD »

cpp64 wrote: 2023-05-19, 23:21 UTC
AntonyD wrote: 2023-05-19, 17:05 UTC PreferUtf8ForZip=1
The Zip unpacker still creates empty folders that end with '_' with 'PreferUtf8ForZip=1'.

'PreferUtf8ForZip=1' doesn't work for me.
But nowhere in my post did I write that the problem with the additional '_' character would be solved))))
I just showed that it is POSSIBLE to see the correct directory names in Total if you use the built-in ability to change the encoding via the context menu.
*********
Although the potential for this was there :wink:
See my post below for the unpacking results! :arrow:
*********
Is the ability to display the names correctly there? Does it work? Yes/No? It was ONLY ABOUT THAT!
Last edited by AntonyD on 2023-05-20, 08:39 UTC, edited 2 times in total.
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *AntonyD »

Leaving aside the issue that the characters look broken when I enter the korean.zip file with the Enter key in Total Commander, when I unzip it with Alt+F9, it should unzip properly to 카, 타, but it unzips to ī, Ÿ.
I described to you HOW to enter ZIP archives correctly if you had problems when you first entered with Enter! Change the encoding and everything becomes correct!
The same goes for unpacking. After you have entered the archive, changed the encoding, and made sure visually that everything is displayed correctly, select everything (press NUMPAD *) and copy it (which in fact unzips it) to the desired directory in the opposite file panel.
No Alt+F9 needed! Besides, nothing is assigned to this combination by default, i.e. there is no point in using these keys ;)
https://ibb.co/kctJYZ2
As you see, no problems at all! Even the undesirable '_' character does not appear!
And that is even with MY mother language and code page selected everywhere (OS, Total...).
Last edited by AntonyD on 2023-05-20, 08:43 UTC, edited 2 times in total.
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *AntonyD »

From this test bunch, only these archives were correct:
"Store all names containing non-English in extra field.zip" and WinRAR.zip
That's strange, because the archive "All as UTF-8 if at least one contains non-English characters.zip" definitely MUST
show the correct content, based on its logic....
See my example: https://www.mediafire.com/file/yvf0gpvs7sjqrwa/_.zip/file
ghisler(Author)
Site Admin
Posts: 48107
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *ghisler(Author) »

I found the problem: TC should use the Unicode name from the extra field, but it used the name from the normal name field when it was recognized as Unicode. I will prefer the extra field in the next beta.
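For readers unfamiliar with the extra field mentioned here: the Info-ZIP Unicode Path extra field has header ID 0x7075 and holds a version byte, the CRC-32 of the normal name field, and the UTF-8 name. The sketch below shows how a reader could prefer that name; it is an illustration based on the ZIP format description, not Total Commander's code, and the function name `unicode_path_from_extra` is an assumption.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path extra field


def unicode_path_from_extra(extra, raw_name):
    """Return the UTF-8 name from the extra field, or None.

    extra    -- raw extra-field bytes of one zip entry
    raw_name -- raw bytes of the entry's normal name field
    """
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4 : pos + 4 + size]
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) < 5:
            continue
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # The field is only trusted if its CRC matches the normal name.
        if version == 1 and name_crc == zlib.crc32(raw_name):
            return data[5:].decode("utf-8")
    return None


# Hand-built example entry: normal name in CP949, UTF-8 copy in extra.
raw_name = "카/".encode("cp949")
utf8_name = "카/".encode("utf-8")
extra = struct.pack("<HHBI", UNICODE_PATH_ID, 5 + len(utf8_name),
                    1, zlib.crc32(raw_name)) + utf8_name

print(unicode_path_from_extra(extra, raw_name))
```

When the field is present and its CRC matches, no guessing about the encoding of the normal name field is needed, which is why preferring it avoids the ī/Ÿ misreading.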
Author of Total Commander
https://www.ghisler.com
pulbitz
Junior Member
Posts: 52
Joined: 2009-06-05, 12:19 UTC

Re: [TC 11.0b3] zip packer creates empty folders that end with '_' in the specific korean folder name

Post by *pulbitz »

I tested the filename encoding with various Zip files created by Total Commander and by WinRAR.
https://www.mediafire.com/file/4233rcjp70ep3g1/korean_filename_test.zip/file
The sample Zip file above was created in Beta 3, but I didn't see any difference when I created it in Beta 4, so I used it for testing.

The filename encoding feature is also available in WinRAR, and I included it for comparison with Total Commander.

The two Zip files below were fine in both Total Commander and WinRAR with every encoding setting.
1. Store all names containing non-English in extra field.zip
2. WinRAR.zip
ANSI: 카, 타
UTF-8: 카, 타
Auto-detect: 카, 타
WinRAR ANSI: 카, 타
WinRAR UTF-8: 카, 타
WinRAR Auto-detect: 카, 타

The four Zip files below displayed correctly only when ANSI was used for the name encoding.
However, when unzipping with Unpack (Alt+F9) in Total Commander or the Extract menu in WinRAR, only WinRAR unzips correctly to 카, 타, while Total Commander unzips to ī, Ÿ because Auto-detect is the default.
1. All as UTF-8 if at least one contains Unicode.zip
2. Store Unicode characters as question mark.zip
3. Store Unicode name in extra field (Info-Zip method).zip
4. Store Unicode names as UTF-8 (Pkzip 4.5_Winzip 11.2 method).zip
ANSI: 카, 타
UTF-8: ī, Ÿ
Auto-detect: ī, Ÿ
WinRAR ANSI: 카, 타
WinRAR UTF-8: ī, Ÿ
WinRAR Auto-detect: 카, 타

The Zip file below shows 카_ in Total Commander and cannot be unzipped with the correct file name. WinRAR shows 카 and extracts it correctly.
1. All as UTF-8 if at least one contains non-English characters.zip
ANSI: 카_, 타
UTF-8: 카_, 타
Auto-detect: 카_, 타
WinRAR ANSI: 카, 타
WinRAR UTF-8: 카, 타
WinRAR Auto-detect: 카, 타

Based on my test results, I suspect that Korean file names are not properly encoded in UTF-8 except for the 'Store all names containing non-English in extra field' method when creating Zip files in Total Commander.
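One way to check how a given test archive really stores its names is to look at general purpose flag bit 11 (0x800), which the ZIP format uses to mark a name as UTF-8, and at whether an entry carries any extra field. The following is a rough sketch using Python's standard `zipfile` module; the helper name `name_storage_report` is made up here, and the demo archive is built by Python, not by Total Commander or WinRAR.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: name is UTF-8


def name_storage_report(zip_bytes):
    """List (name, utf8_flag_set, extra_field_length) per entry."""
    report = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            report.append((info.filename,
                           bool(info.flag_bits & UTF8_NAME_FLAG),
                           len(info.extra)))
    return report


# Demo archive created in memory; Python sets the UTF-8 flag itself
# for names that are not pure ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("카/가", b"")
    zf.writestr("plain.txt", b"")

for row in name_storage_report(buf.getvalue()):
    print(row)
```

Note that Python decodes names without the UTF-8 flag as CP437 by default, so for such entries the printed name may itself look garbled; the flag and the extra field length are the useful parts of the report.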
ghisler(Author)
Site Admin
Posts: 48107
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: [TC 11.0b4] [not fixed] [updated] zip packer creates empty folders that end with '_' in the specific korean folder n

Post by *ghisler(Author) »

The problem with the extra "_" at the end (which is actually an extra slash / in the file) should be fixed in beta 5, please try it!
29.05.23 Fixed: On dual byte character systems (e.g. with Korean locale), when packing to zip with names stored as UTF-8, sometimes an extra slash was appended to directory names (displayed as an underscore) (32/64)