TC 7.04a: Error unpacking ZIP created by TC („test”.zip)

Bug reports will be moved here when the described bug has been fixed

Moderators: white, Hacker, petermad, Stefan2

karlchen
Power Member
Posts: 4603
Joined: 2003-02-06, 22:23 UTC
Location: Germany

Post by *karlchen »

Hello, DrShark.
DrShark wrote:ZipUnicode is for packing. How does it help to display names in archives, created in previous versions of Total Commander?
Hm, right. Once the damage has been done, T.C. 7.50 cannot undo it. :cry:

If you ZIP the file „test”.txt or “test”.txt using T.C. 7.04a (ZipAnsiNames=0), T.C. 7.04 will store both filenames as "test".txt inside the ZIP archive. This filename is illegal.
T.C. 7.04a itself cannot even extract the file "test".txt from the archive.
T.C. 7.50PB1 solves the plight by replacing the illegal double quotes by single quotes. This file can be extracted.
7Zip 4.65 solves the plight by replacing the illegal double quotes by underscores. This file can be extracted.
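The two workarounds just described can be sketched in a few lines of Python. This is only an illustration, not the code of either program; the function name `make_extractable` is hypothetical, and the sketch assumes the double quote is the only offending character.

```python
def make_extractable(stored_name: str, replacement: str) -> str:
    """Replace double quotes, which Windows does not allow in filenames."""
    return stored_name.replace('"', replacement)


stored = '"test".txt'  # the name as T.C. 7.04a wrote it into the archive

print(make_extractable(stored, "'"))  # single-quote approach (as described for T.C. 7.50)
print(make_extractable(stored, "_"))  # underscore approach (as described for 7-Zip)
```

Either way the result is only a legal stand-in; neither approach can tell whether the original name used „ ” or “ ”.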

What else can be done? The extracting programme cannot know whether the original filename was „test”.txt or “test”.txt or even something else.

And if you read the whole thread from the start, you will see that some archiving programmes can store and restore the original filenames (because internally they use Unicode for the names). T.C. 7.50 can do this as well. So recommending ZipUnicode=3 is meant to prevent the character conversion from happening in the first place.

Once T.C. 7.04a has performed an (illegal) conversion, it cannot be undone.

Karl
Last edited by karlchen on 2009-04-18, 22:38 UTC, edited 1 time in total.
DrShark
Power Member
Posts: 1872
Joined: 2006-11-03, 22:26 UTC
Location: Kyiv, 68/262
Contact:

Post by *DrShark »

karlchen wrote:What else can be done? The extracting programme cannot know whether the original filename was „test”.txt or “test”.txt or even something else.
But somehow TC 7.0x knows the real filename (well, "test".txt is not the same as “test”.txt, but it looks like it). Maybe it's possible to add to TC 7.5 the method of displaying names used in 7.0x?
Donate for Ukraine to help stop Russian invasion!
Ukraine's National Bank special bank account:
UA843000010000000047330992708

Post by *karlchen »

No no no :cry: , T.C. 7.50 sees "test".txt as the filename inside the archive (a hex viewer will confirm this). It knows this filename is illegal. Therefore a double quote in an illegal position is replaced by a single quote when T.C. 7.50 displays the filename. (The archive remains untouched.)

T.C. 7.50 does not know the original filename which was “test”.txt or „test”.txt. The unmodified filename is stored nowhere inside a ZIP archive created by T.C. 7.04a.

The only reliable way of avoiding conflicts and illegal filenames caused by codepage-specific characters is to use Unicode from the start. This is what T.C. 7.04a could not do and what T.C. 7.50 can do. But T.C. 7.50 cannot undo any stupid character conversion which has already been applied by T.C. 7.04a. :cry:

About displaying the filenames as T.C. 7.04 has put them into the archive file:
7ZIP displays "test".txt, but extracts it as _test_.txt.
TC7.50 displays 'test'.txt and extracts it as 'test'.txt.
Which approach is better? Which approach looks more straightforward?

Karl
Last edited by karlchen on 2009-04-18, 22:55 UTC, edited 1 time in total.

Post by *DrShark »

Ok, the last question here is why TC 7.5 replaces illegal characters in a different way for archives created with ZipAnsiNames = 0 and 1, while 7.04a replaces them with " for both.

Post by *karlchen »

DrShark wrote:Ok, the last question here is why TC 7.5 replaces illegal characters in a different way for archives created with ZipAnsiNames = 0 and 1, while 7.04a replaces them with " for both.
The changes in T.C. 7.50 were introduced by Christian Ghisler to make sure that an archive which does not use Unicode names contains no illegal filenames which cannot be extracted.
T.C. 7.04a could create archives holding filenames which could not be extracted directly because they were illegal like "test".txt. You cannot create a file named "test".txt on Windows.

It is not correct that T.C. 7.04a stored and displayed the same names when configured to ZIPAnsiNames=0 or ZIPAnsiNames=1. Using ZIPAnsiNames=1 made sure that characters which did not exist in the ASCII character set were not replaced by a similar ASCII character inside the archive. You will see the difference both in T.C. 7.04a and T.C. 7.50.
The problem really is that ASCII and Ansi both only hold 256 different characters. Only characters 0..127 are identical for all European languages. Characters 128..255 are partially different depending on your language setting and hence your codepage.
Moreover the ZIP standard allows ASCII characters only. ZIPAnsiNames allowed using Ansi characters nonetheless.
All this led to problems when characters 128..255 occurred inside filenames. There simply is no 1:1 conversion which will work for all languages and all codepages. (Only Unicode solves this issue currently.)
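The codepage dependence of characters 128..255 is easy to see with Python's built-in codecs. The sketch below is only an illustration: the selection of four code pages and the helper name `decode_everywhere` are arbitrary choices for the example.

```python
# The same byte value decoded under different 8-bit code pages.
# cp1252 = Windows Western, cp1251 = Windows Cyrillic,
# cp437 = DOS US/Western, cp866 = DOS Cyrillic.
CODEPAGES = ["cp1252", "cp1251", "cp437", "cp866"]


def decode_everywhere(byte_value: int) -> dict:
    """Return the character each code page assigns to one byte value."""
    raw = bytes([byte_value])
    return {cp: raw.decode(cp) for cp in CODEPAGES}


print(decode_everywhere(0x41))  # below 128: the same letter in every code page
print(decode_everywhere(0xE4))  # above 127: a different character per code page
```

A byte below 128 means the same thing everywhere, while a byte above 127 only has a meaning once you know which code page wrote it, and a ZIP entry without Unicode names does not record that.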

Karl
Last edited by karlchen on 2009-04-18, 23:18 UTC, edited 1 time in total.
User avatar
Clo
Moderator
Moderator
Posts: 5731
Joined: 2003-12-02, 19:01 UTC
Location: Bordeaux, France
Contact:

Fits in a few words

Post by *Clo »

2karlchen

:) Hello Karl !

• The reason for such an annoyance is simple, and fits in a few words, I guess :
- Open any dialogue in the WinRar.EXE with a resource editor, and watch which font is stated…
- Same with the TC-EXE - EVEN THE ONE OF 7.5 Pß-1 - and you find… ???
Font.Name = 'MS Sans Serif' :twisted:
… which does NOT contain these characters, hence direct internal handlings of them are impossible…
• Supposing I name a file “anything”.txt, that is an ANSI name, nothing to do with Unicode,
and these #127-128 characters are not illegal.
And I do NOT want to retrieve this with apostrophes or whatever characters instead…
- The new options mentioning “Non-English characters…” come from the use of that antiquated font, nothing more.
- I hope that in TC 8, we could get rid of this, simply using a complete programming font (like Dina),
the programme could get a good mark, and simpler options…
“Quod erat demonstrandum”

:mrgreen: VG
Claude
Clo
#31505 Traducteur Français de TC French translator Aide en Français Tutoriels Français English Tutorials

Post by *DrShark »

I meant "why TC 7.5 displays ' instead of " for zipansinames=0-archive and У & Ф for zipansinames=1-archive" - this is the short version of my question.

Post by *karlchen »

DrShark wrote:I meant "why TC 7.5 displays ' instead of " for zipansinames=0-archive and У & Ф for zipansinames=1-archive" - this is the short version of my question.
ZIPAnsiNames=0:
T.C. 7.04 will store and display the filenames “test”.txt and „test”.txt both as "test".txt. (This is an illegal filename on Windows)
T.C. 7.50 amends the displayed name to read 'test'.txt. (This is a legal filename on Windows.)

ZIPAnsiNames=1:
T.C. 7.04 will store and display the filenames “test”.txt and „test”.txt both as ätestö.txt. (That is what it looks like using the Western codepage. Using the Eastern codepage it may look like УtestФ.txt. This is a legal filename on Windows)
T.C. 7.50 displays the filename unmodified, because it is a legal Windows filename, though it may look ugly.
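One plausible way to reproduce the names shown for ZIPAnsiNames=1 is to store the ANSI bytes unchanged and then read them back with a DOS code page. The Python sketch below does exactly that; the specific code page pairs (cp1252/cp437 for Western, cp1251/cp866 for Cyrillic) and the function name `ansi_bytes_seen_as_oem` are assumptions made for the example, not something taken from T.C. itself.

```python
def ansi_bytes_seen_as_oem(name: str, ansi_cp: str, oem_cp: str) -> str:
    """Encode with an ANSI code page, then misread the bytes as a DOS code page."""
    return name.encode(ansi_cp).decode(oem_cp)


# Western: Windows-1252 bytes read as DOS code page 437
print(ansi_bytes_seen_as_oem("„test”.txt", "cp1252", "cp437"))  # ätestö.txt

# Cyrillic: Windows-1251 bytes read as DOS code page 866
print(ansi_bytes_seen_as_oem("“test”.txt", "cp1251", "cp866"))  # УtestФ.txt
```

The bytes inside the archive are the same in both cases; only the code page used for reading them differs, which is why the name looks different from one system to the next.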

Karl

Post by *karlchen »

Hi, Clo.

I cannot really agree with you here. The root of the problem is not the display font, but the fact that the Ansi character sets all have only 256 characters and that 128 of them differ depending on your language and your codepage. This leads to conflicts.

Only character sets which hold the same set of characters regardless of my own language will help avoid those conflicts. Such character sets can never be the old ANSI character sets. Currently only Unicode can do this.

To add an extra layer of trouble to this, the ZIP standard does not use Ansi, but ASCII characters. This means you are forced to convert any character in a filename on which Ansi and ASCII disagree to the corresponding ASCII character. The trouble is that not every Ansi character has an ASCII character which corresponds to it.

Hm, thinking about it, the ZIP standard allowing ASCII characters only is the real root of the problem, because there are e.g. no corresponding ASCII characters for the ANSI characters „ ” “. This is where the vicious circle of converting into illegal characters starts. :evil:

Kind regards,
Karl
Last edited by karlchen on 2009-04-18, 23:55 UTC, edited 1 time in total.

Strange improvement---

Post by *Clo »

2karlchen

:) Again…
…T.C. 7.50 amends the displayed name to read 'test'.txt. (This is a legal filename on Windows.)
- To me, it damages the original file name, and the result is not allowed in the French typography…
- Seems fancy to you, perhaps, but then a lot of i****s use such a style everywhere in French texts, that's a disaster…
- And finally, I can't believe that you wish to keep the current outdated internal font. :shock:

:mrgreen: VG
Claude

Post by *karlchen »

Hi, Claude.

I have not said anything to the effect that I am particularly fond of MS Sans Serif, nor that I want to keep it.
Yet, it is not MS Sans Serif that forces ZIP archivers to store filenames using the ancient ASCII character set.
E.g. the developers of Winzip and 7Zip have started extending the ZIP standard (or call it breaking it) by storing any filenames which hold non-English characters (anything above the hexvalue of 0x7F) as Unicode names. This seems to be the right approach.
T.C. 7.50 can do so, too.
Hence, I do not quite understand why we spend so much time on discussing a way of fixing a problem for which there is no real solution, but only partially successful workarounds, no matter which Windows font you use inside T.C.
No matter which Unicode font you use inside T.C., this will not work around any problem brought about by any obsolete ANSI/ASCII character conversion applied to filenames by whichever ZIP archiver.
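For comparison, here is what the Unicode approach looks like with an archiver that stores such names as UTF-8. The sketch uses Python's standard `zipfile` module purely as a stand-in (it is not T.C., WinZip or 7-Zip); the constant name `UTF8_NAME_FLAG` is made up for the example, while bit 11 (0x800) of the general purpose flags is the ZIP format's marker for UTF-8 filenames.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8

original = "„test”.txt"
buffer = io.BytesIO()

# Pack one small file under the problematic name, entirely in memory.
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(original, "demo")

# Re-open the archive and look at the stored entry.
with zipfile.ZipFile(buffer) as archive:
    entry = archive.infolist()[0]

print(entry.filename == original)              # the name survives the round trip
print(bool(entry.flag_bits & UTF8_NAME_FLAG))  # the UTF-8 flag is set
```

No character conversion takes place, so there is nothing to undo on extraction. An archive that was already written with converted names gains nothing from this, of course.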

Kind regards,
Karl

Post by *DrShark »

Well, while " is an illegal char for Windows (neither Explorer nor CMD allows creating a filename with such chars), it is still legal for NTFS, and Windows will work with it.
Allowed characters in filenames, http://en.wikipedia.org/wiki/Ntfs wrote:In Posix namespace, any UTF-16 code unit (case sensitive) except U+0000 (NUL) and / (slash). In Win32 namespace, any UTF-16 code unit (case insensitive) except U+0000 (NUL) / (slash) \ (backslash) : (colon) * (asterisk) ? (Question mark) " (quote) < (less than) > (greater than) and | (pipe)
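The Win32 rule from the quoted text can be turned into a small check. This is a minimal Python sketch; the names `WIN32_FORBIDDEN` and `forbidden_in_win32` are hypothetical, and the check covers only the characters listed in the quote, not other Windows naming rules such as reserved device names.

```python
# Characters the quoted text lists as excluded from the Win32 namespace.
WIN32_FORBIDDEN = set('\x00/\\:*?"<>|')


def forbidden_in_win32(name: str) -> list:
    """Return the forbidden characters found in name, in order of appearance."""
    return [ch for ch in name if ch in WIN32_FORBIDDEN]


print(forbidden_in_win32('"test".txt'))  # the two double quotes
print(forbidden_in_win32("'test'.txt"))  # nothing forbidden
```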

Re: TC 7.04a: Error unpacking ZIP created by TC („test”.zip)

Post by *karlchen »

Hi, folks.

The original problem which Gyla reported was this:
Gyla wrote:OS: WinXP SP3 HUN, NTFS, Hungarian regional settings
1. Create a text file with this name: „test”.txt (including the quotation marks)
2. Pack it into a ZIP using TC
3. Now try to unpack the ZIP
It won't succeed because the „” (84,94) characters are converted into normal quotation marks "" (22,22)
The same will happen for a filename like “test”.txt e.g.

The reason why the internal ZIP archiver of T.C. 7.04a was likely to create invalid filenames which prevented the files from being extracted by T.C. 7.04a has been explained by Christian Ghisler himself here.
This has nothing to do with Unicode, but with the fact that ZIP files use OEM/DOS characters by default. Apparently the Windows function CharToOem converts the quotes (which are not present in the DOS charset) to standard quotes "".
Currently the only workaround is to force TC to use the Windows charset, but this will cause problems to various unpackers.
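The byte values from the report and the loss of information can be checked in Python. This sketch assumes that the Hungarian system uses Windows-1250 as its ANSI code page and DOS code page 852 as its OEM code page. The `BEST_FIT` table is a hypothetical stand-in for the conversion attributed to CharToOem above; it is not the real Windows mapping table.

```python
def representable(text: str, codepage: str) -> bool:
    """True if every character of text exists in the given code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical best-fit table: typographic quotes become the plain double quote.
BEST_FIT = {"„": '"', "“": '"', "”": '"'}


def to_oem_best_fit(name: str) -> str:
    """Replace characters missing from the DOS charset by their best-fit stand-in."""
    return "".join(BEST_FIT.get(ch, ch) for ch in name)


print("„”".encode("cp1250").hex())    # byte values of the typographic quotes
print('"'.encode("cp1250").hex())     # byte value of the plain double quote
print(representable("„”", "cp852"))   # are they present in the DOS charset?
print(to_oem_best_fit("„test”.txt"))  # what ends up in the archive
print(to_oem_best_fit("„test”.txt") == to_oem_best_fit("“test”.txt"))
```

Two different original names end up as the same stored name, so the conversion cannot be reversed afterwards.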
Total Commander v7.50 Public Beta 1 can extract files from the old ZIP archives even if they hold illegal filenames like "test".txt.

Therefore, from my perspective the reported issue has been fixed in Total Commander v7.50 Public Beta 1.
At least, this has been the result of the thread TC 7.5b4: Error unpacking ZIP created by TC („test”.zip) inside the Beta Testers' Forum (URL of the Beta forum plus t=3843).

Kind regards,
Karl