TC unicode conversion - status update

CoolWater
Power Member
Posts: 738
Joined: 2003-03-27, 16:33 UTC

Post by *CoolWater »

Lefteous wrote:2CoolWater
Hm, I have an FTP server where I created a directory "Österreich". In TC it's displayed as

4F CC 88 73 74 65 72 72 65 69 63 68

UTF-8 would be

C3 96 73 74 65 72 72 65 69 63 68

Any ideas?

This is the FEAT command result:

211-Features supported
 MDTM
 MLST Type*;Size*;Modify*;Perm*;Unique*;
 REST STREAM
 SIZE
 TVFS
211 En
As far as I know, TVFS always stores the filenames in UTF-8 (see http://www.faqs.org/rfcs/rfc3659.html, section 6.1)
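
For reference, both byte sequences quoted above are well-formed UTF-8; they differ only in Unicode normalization. 4F CC 88 is "O" followed by the combining diaeresis U+0308 (decomposed form, NFD), while C3 96 is the precomposed "Ö" (NFC). A minimal Python sketch using only the standard library shows the relationship (the variable names are just for illustration):

```python
import unicodedata

# Bytes as shown by TC for the directory name (decomposed form).
shown_by_tc = bytes.fromhex("4F CC 88 73 74 65 72 72 65 69 63 68")
# Bytes that were expected (precomposed form).
expected = bytes.fromhex("C3 96 73 74 65 72 72 65 69 63 68")

# Both sequences decode without error as UTF-8.
decomposed = shown_by_tc.decode("utf-8")
precomposed = expected.decode("utf-8")

# The strings differ code point by code point ...
same_code_points = decomposed == precomposed

# ... but become identical after normalizing to NFC.
normalized = unicodedata.normalize("NFC", decomposed)
same_after_nfc = normalized == precomposed
```

Whether a client should normalize names it receives from a server is a separate design question that this sketch does not settle.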

Regards,
CoolWater
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

2CoolWater
Thanks for closing my knowledge gap.
CoolWater
Power Member
Posts: 738
Joined: 2003-03-27, 16:33 UTC

Post by *CoolWater »

Lefteous wrote:2CoolWater
Thanks for closing my knowledge gap.
You're welcome :D
ghisler(Author)
Site Admin
Posts: 48199
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

I plan to send something like this to auto-detect UTF-8:
CLNT totalcmd (required by some servers to accept OPTS)
OPTS UTF8 ON

If the server does not return an error to the OPTS command, then UTF8 can be considered as active. This works e.g. with FileZilla or RaidenFTPd, but doesn't work with servers where UTF-8 is enabled by default and cannot be turned off. Any ideas how to auto-detect these?
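
The detection sequence described above could be sketched roughly like this in Python. This is not TC's implementation: `send` is a hypothetical callable that sends one command and returns the server's reply text, and the extra check for a `UTF8` line in the FEAT reply is an added assumption (servers following RFC 2640 advertise the feature that way), which may help with servers where UTF-8 is always on.

```python
from typing import Callable


def reply_ok(reply: str) -> bool:
    """Treat any 2xx reply as success."""
    return reply[:1] == "2"


def detect_utf8(send: Callable[[str], str]) -> bool:
    """Guess whether the server uses UTF-8 for file names.

    `send` is a hypothetical helper: it sends one FTP command and
    returns the full reply text (including the numeric code).
    """
    # Assumption: a server that lists UTF8 in FEAT speaks UTF-8,
    # even if it rejects or ignores OPTS UTF8 ON.
    feat = send("FEAT")
    advertised = reply_ok(feat) and any(
        line.strip().upper() == "UTF8" for line in feat.splitlines()
    )

    # Some servers require CLNT before they accept OPTS.
    send("CLNT totalcmd")
    opts_accepted = reply_ok(send("OPTS UTF8 ON"))

    return advertised or opts_accepted
```

With Python's `ftplib`, `send` would have to wrap `FTP.sendcmd` and catch `ftplib.error_perm`, because that method raises on 5xx replies instead of returning them. A server that neither lists UTF8 nor accepts OPTS would still go undetected by this sketch.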
Author of Total Commander
https://www.ghisler.com
gigaman
Member
Posts: 131
Joined: 2003-02-14, 11:28 UTC

Post by *gigaman »

ghisler(Author) wrote:ZIP: there is currently no standard at all. I tried Winzip, RAR, 7zip, none of them supports Unicode in ZIP
Well, there is UTF-8 support defined in PKZIP format, at least according to PKWARE specification.
I'm not saying anybody supports it right now though :?
ghisler(Author)
Site Admin
Posts: 48199
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Hmm, which one do you mean?
- Info-ZIP Unicode Path Extra Field (0x7075)
or
- General purpose bit flag, Bit 11: Language encoding flag (EFS). If this bit is set, the filename and comment fields for this file must be encoded using UTF-8. (see APPENDIX D)

I think that for packing the second version is better, but it should be used only for file names that cannot be represented in the current encoding (for backwards compatibility). What do you think?
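
As an illustration of the second variant, here is a small Python sketch that writes two entries to an in-memory archive and reads back general purpose bit 11 for each. It uses Python's standard `zipfile` module, which happens to set the flag only for names that are not pure ASCII; that is merely one packer's behaviour, not a statement about what TC does. The helper name `utf8_flags` is made up for this example.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: language encoding flag (EFS)


def utf8_flags(names):
    """Pack empty files under the given names and report, per name,
    whether the UTF-8 flag is set in the resulting archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    # Re-open the archive and inspect the flags stored in its directory.
    with zipfile.ZipFile(buf) as zf:
        return {
            info.filename: bool(info.flag_bits & UTF8_FLAG)
            for info in zf.infolist()
        }


flags = utf8_flags(["plain.txt", "Österreich.txt"])
```

An unpacker that does not know bit 11 will still read the ASCII-only entry correctly, which is the backwards-compatibility argument made above.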
Milk
Junior Member
Posts: 8
Joined: 2007-10-19, 05:39 UTC

Post by *Milk »

Good. :) Waiting for release!
Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by *Alextp »

ghisler(Author) wrote:I think that for packing the second version is better...What do you think?
Agree. Second option, for saving the UTF-8 encoded filenames, is better IMHO.
Pls don't forget the option "Allow to Save zips in Unicode".
gigaman
Member
Posts: 131
Joined: 2003-02-14, 11:28 UTC

Post by *gigaman »

I would also agree that the second option seems better (at least easier to implement: no need to store the filename twice in the structure, just set the bit and store it in UTF-8).
However, I haven't tested application support; if InfoZip supports the first option (and the second option isn't supported at all at the moment), it might be better to choose the first one... don't know.
Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by *Alextp »

gigaman wrote:if InfoZip supports the first option (and the second option isn't supported at all at the moment), it might be better to choose the first one...
Indeed InfoZip may be "more standard" than the second option. So both methods are good,
maybe Ghisler can make an option:

(*) InfoZip (recommended)
( ) PKZip
haibinpro
Junior Member
Posts: 66
Joined: 2005-10-21, 04:55 UTC
Location: china

Post by *haibinpro »

Great news
______________________
#147708 Personal licence
BeckYang
Junior Member
Posts: 29
Joined: 2006-04-02, 10:33 UTC

Post by *BeckYang »

petermad wrote:Anyone found a feature not listed above

3. Compare

:?:
For "File" -> "Compare by Content..."
Could you consider supporting UTF-8 display?

Most of my text files are UTF-8 encoded, not Unicode/UTF-16.
It would be really useful if TC could support UTF-8 display there. :D
ghisler(Author)
Site Admin
Posts: 48199
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

2BeckYang
I have already described that I haven't found a good way to compare UTF-8 yet. I'm now considering converting it to UTF-16 in memory before comparing, so I could reuse the existing UTF-16 code.
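
The idea of converting to UTF-16 in memory and reusing a UTF-16 comparison could look roughly like the following Python sketch. `compare_utf16` is only a hypothetical stand-in for existing UTF-16 code (a naive line-by-line comparison), not TC's actual algorithm.

```python
from itertools import zip_longest


def utf8_to_utf16(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16 (little endian) in memory.
    'utf-8-sig' also drops a leading UTF-8 byte order mark, if any."""
    return data.decode("utf-8-sig").encode("utf-16-le")


def compare_utf16(a: bytes, b: bytes) -> list:
    """Hypothetical stand-in for an existing UTF-16 comparison:
    returns the 0-based indices of lines that differ."""
    lines_a = a.decode("utf-16-le").splitlines()
    lines_b = b.decode("utf-16-le").splitlines()
    return [
        i
        for i, (x, y) in enumerate(zip_longest(lines_a, lines_b))
        if x != y
    ]


def compare_utf8(a: bytes, b: bytes) -> list:
    """Compare two UTF-8 buffers by reusing the UTF-16 code path."""
    return compare_utf16(utf8_to_utf16(a), utf8_to_utf16(b))


file1 = "Österreich\nWien\n".encode("utf-8")
file2 = "Österreich\nGraz\n".encode("utf-8")
diff = compare_utf8(file1, file2)
```

The cost of this approach is one extra in-memory copy per file; the benefit is that the comparison and display code only ever sees one encoding.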


About the ZIP problem: Should TC store all names in UTF-8 if that option is checked, or only those outside of the current code page?
Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by *Alextp »

ghisler(Author) wrote:About the ZIP problem: Should TC store all names in UTF-8 if that option is checked, or only those outside of the current code page?
All names, IMHO. Otherwise, I cannot use an archive packed under a German OS on my Russian OS.
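
The German/Russian problem can be reproduced with a few lines of Python. The code pages used here (Windows-1252 for the German system, Windows-1251 for the Russian one) are an assumption for illustration; real ZIP tools often use the OEM code pages instead, but the effect is the same.

```python
name = "Österreich.txt"

# Name stored in the packer's local code page (assumed: Windows-1252).
stored_bytes = name.encode("cp1252")

# The same bytes interpreted with the unpacker's code page
# (assumed: Windows-1251) no longer give the original name.
seen_on_russian_os = stored_bytes.decode("cp1251")
name_survives_legacy = seen_on_russian_os == name

# Stored as UTF-8, the name does not depend on any local code page.
seen_with_utf8 = name.encode("utf-8").decode("utf-8")
name_survives_utf8 = seen_with_utf8 == name
```

Note that "Ö" exists in the German code page, so a rule of "UTF-8 only for names outside the current code page" would leave this name in the legacy encoding.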
BeckYang
Junior Member
Posts: 29
Joined: 2006-04-02, 10:33 UTC

Post by *BeckYang »

ghisler(Author) wrote:2BeckYang
I have already described that I haven't found a good way to compare UTF-8 yet. I'm now considering converting it to UTF-16 in memory before comparing, so I could reuse the existing UTF-16 code.
Here is a screen capture of TC 7.01 comparing file1.txt and file2.txt
(both are UTF-8 encoded text files):
Image: http://server2.uploadit.org/files/ylonggyahoo-tc_compare.jpg

The compare result is in the blue rectangle; the differing lines are correctly marked in red.
But the characters are garbled, so it is hard to see what is different.

In the red rectangle, the two files are opened in TC's internal viewer
(with "Options" set to "7 UTF-8").

I think I need an option like the one in the internal viewer, so that the compare result window
can display UTF-8 text files correctly.
Please consider supporting this feature. Thanks!