Preliminary information about Unicode support (TC7.5)

ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Yes, that's about how I have implemented it now, with two differences:
1. If the string contains accents etc. from the current code page only, it will be stored as ANSI for full backwards compatibility.
2. If the string contains characters from a different code page, it will be stored as UTF-8, preceded by the three-byte UTF-8 prefix (byte order mark). This way all strings from the current code page will still work with older TC versions too, but strings from a different code page will show up as garbage there and not work. But such paths wouldn't have worked with the old version anyway...
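
A minimal sketch of the storage rule described in the two points above, written in Python for illustration only. It is not TC's actual code: the function names are hypothetical, and cp1252 merely stands in for "the current code page".

```python
# Illustrative sketch only; names and the cp1252 default are assumptions.
UTF8_PREFIX = b"\xef\xbb\xbf"   # the three-byte UTF-8 prefix (byte order mark)
ANSI_CODEPAGE = "cp1252"        # assumed stand-in for the current code page


def encode_ini_value(text, codepage=ANSI_CODEPAGE):
    """Store as ANSI when possible, otherwise as prefixed UTF-8."""
    try:
        # Point 1: everything fits into the current code page.
        return text.encode(codepage)
    except UnicodeEncodeError:
        # Point 2: characters from another code page, so mark and use UTF-8.
        return UTF8_PREFIX + text.encode("utf-8")


def decode_ini_value(raw, codepage=ANSI_CODEPAGE):
    """Reverse of encode_ini_value: the prefix tells the reader what to do."""
    if raw.startswith(UTF8_PREFIX):
        return raw[len(UTF8_PREFIX):].decode("utf-8")
    return raw.decode(codepage)
```

Under these assumptions a path such as C:\Données round-trips as plain ANSI bytes, while C:\データ is written with the prefix. One caveat of any such scheme: an ANSI value that happens to begin with the bytes EF BB BF would be misread as UTF-8.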
Author of Total Commander
https://www.ghisler.com
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

ghisler(Author) wrote: strings from a different codepage will show up as garbage and not work
I thought you were currently using 8+3 names to save paths which contain characters from different code pages.

Post by *ghisler(Author) »

Only when you enter them - TC 7.0x cannot show Unicode paths at all yet, but TC 7.5 will.

Post by *Lefteous »

ghisler(Author) wrote: Only when you enter them
Yes, of course only when I enter them. So it's clear that opening a questionable folder in TC 7.5 and later opening it in TC < 7.5 will not work. Why not save the 8+3 filename (just like in TC 7.0) if available, and check for the full Unicode file name while reading the entry?
If there is no 8+3 filename, save it as UTF-8. The directory couldn't be opened in TC < 7.5 anyway - but only in this case.
VadiMGP
Power Member
Posts: 672
Joined: 2003-04-05, 12:11 UTC
Location: Israel

Post by *VadiMGP »

ghisler(Author) wrote: Yes, that's about how I have implemented it now, with two differences:
I've checked this method with an ini file saved as UTF-8 by Notepad.
GetPrivateProfileString cannot find the first section when the file starts with a BOM.

Post by *Lefteous »

2VadiMGP
I understood it like this: He is not writing a BOM at the beginning of the file. He writes a BOM in front of each UTF-8 encoded value, so that he knows he has to decode it when reading it later.
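
To illustrate the difference between a BOM at the start of the file and a BOM in front of a single value, here is a small sketch using Python's configparser. It is only a stand-in: it is not GetPrivateProfileString, and Windows may differ in detail, but it trips over a file-level BOM in a comparable way. The section name, key name and paths are made up for the example.

```python
import configparser

BOM = b"\xef\xbb\xbf"


def parse_ini_bytes(raw):
    """Parse INI bytes; plain UTF-8 decoding keeps any BOM as U+FEFF."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(raw.decode("utf-8"))
    return parser


# Case 1: an editor saved the whole file as UTF-8 with a leading BOM.
file_level_bom = BOM + b"[Configuration]\nStartPath=C:\\Tools\n"

# Case 2: the file itself has no BOM, only one value carries the prefix.
value_level_bom = (
    b"[Configuration]\nStartPath=" + BOM + "C:\\データ".encode("utf-8") + b"\n"
)

try:
    parse_ini_bytes(file_level_bom)
    first_section_found = True
except configparser.MissingSectionHeaderError:
    # The first line is U+FEFF followed by "[Configuration]",
    # so it is not recognised as a section header.
    first_section_found = False

# The per-value prefix leaves the section header intact.
stored_value = parse_ini_bytes(value_level_bom)["Configuration"]["StartPath"]
```

In this sketch the section is found in the second case, and the value still starts with the prefix character, which the reading side would have to strip before use.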

There is another thing that could be discussed. Many (path) settings are not available in the options dialog. It could become quite difficult to edit the ini file considering these changes.

Post by *VadiMGP »

Lefteous wrote: He is not writing a BOM at the beginning of the file. He writes a BOM in front of each UTF-8 encoded value, so that he knows he has to decode it when reading it later.
I understood it in exactly the same way. But the user can click the menu item "Change Settings Files Directly". And what happens then?

Post by *Lefteous »

2VadiMGP
VadiMGP wrote: But the user can click the menu item "Change Settings Files Directly". And what happens then?
That's what the 2nd paragraph in my post is about. I guess a hex editor could be a better tool :shock:

Post by *VadiMGP »

Lefteous wrote: I guess a hex editor could be a better tool
No, thanks. I prefer to use text editors. Anyway, I have used UTF-16 for years and I will continue to use it. I just wanted to express my opinion: the effort of supporting UTF-8 (including all the work for plugin authors, and the likely collision when a user saves the file as UTF-8 with an editor) doesn't seem attractive to me.

I haven't yet checked another issue: using UTF-8 in entry names. Some entry names are based on user input (e.g. in the [Search] section).

Post by *Lefteous »

VadiMGP wrote: No, thanks.
You didn't see the smiley, did you? I fully agree with you that the presented approach has some disadvantages and additional costs for users who need to edit the ini file manually, and also for plug-in/addon writers who need to read/write information from/to TC ini files.

Post by *VadiMGP »

Lefteous wrote: You didn't see the smiley, did you?
No, I didn't. But I believe you. :P
gigaman
Member
Posts: 131
Joined: 2003-02-14, 11:28 UTC

Post by *gigaman »

Lefteous wrote: Why not save the 8+3 filename (just like in TC 7.0) if available and check for the full unicode file name while reading the entry?
If there is no 8+3 filename save it as UTF-8.
Even an 8+3 name doesn't have to have an ANSI representation.
I don't know why that is, but I've seen a (Japanese XP) system with Japanese characters in the account name. Short filename generation was enabled and the account name was short (3 characters) - but it couldn't be transformed into an ordinary 8+3 filename; the GetShortPathName() call succeeded, but it just kept the user's folder (C:\Documents and Settings\XXX) in its original form - i.e. containing those Japanese characters.
So, for an ANSI program running under another (non-Japanese) language, this user's folder was inaccessible (which unfortunately includes the TEMP folder, etc.)

What I'm trying to say is just that even 8+3 names (returned from GetShortPathName(), for example) should be checked to see whether they can be stored correctly without Unicode/UTF.
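
The check suggested here, combined with Lefteous's proposal, could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, cp1252 stands in for the current code page, and the short name is passed in as a parameter (on Windows it would come from GetShortPathName()) so that the logic stays platform-independent.

```python
# Illustrative sketch only; not TC's code. All names here are hypothetical.
UTF8_PREFIX = b"\xef\xbb\xbf"


def fits_codepage(path, codepage="cp1252"):
    """True if every character of path exists in the given ANSI code page."""
    try:
        path.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def choose_stored_path(long_path, short_path, codepage="cp1252"):
    """Pick the bytes to write to the ini file.

    short_path is the 8+3 form, or None if no short name is available.
    """
    if fits_codepage(long_path, codepage):
        return long_path.encode(codepage)
    # Do not trust the short name blindly: it may still contain
    # characters outside the current code page.
    if short_path is not None and fits_codepage(short_path, codepage):
        return short_path.encode(codepage)
    return UTF8_PREFIX + long_path.encode("utf-8")
```

For the Japanese account folder described above, the "short" name equals the long name, so the sketch falls through to the prefixed UTF-8 form instead of writing an unusable ANSI string.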

Post by *Lefteous »

2gigaman
gigaman wrote: Even an 8+3 name doesn't have to have an ANSI representation.
That's indeed interesting.

I guess when it comes to beta testing, users who deal with mixed writing systems day and night should be invited.
tbeu
Power Member
Posts: 1336
Joined: 2003-07-04, 07:52 UTC
Location: Germany

Post by *tbeu »

Hello Lefteous,

Since you are the pioneer in developing Unicode-supporting (content) plugins for the upcoming TC 7.5, maybe you can share some programming hints besides the one in http://ghisler.ch/board/viewtopic.php?t=17135. Your help would be very much appreciated.

Regards
tbeu

[mod]The next two posts moved here from DirSizeCalc 2.10 (content plugin).

Hacker (Moderator)[/mod]
TC plugins: Autodesk 3ds Max / Inventor / Revit Preview, FileInDir, ImageMetaData (JPG Comment/EXIF/IPTC/XMP), MATLAB MAT-file Viewer, Mover, SetFolderDate, Solid Edge Preview, Zip2Zero and more
fenix_productions
Power Member
Posts: 1979
Joined: 2005-08-07, 13:23 UTC
Location: Poland

Post by *fenix_productions »

I support this request :)

Honestly: I've read the whole linked thread and the hlp file for new content plugins and... I don't get it :( The most important thing for me is to know how to deal with ft_string or ft_stringw, but there are no examples. Furthermore, no info is provided about dealing with ft_fulltext in the Unicode case.

Well, I will also post in the proper thread :)
"When we created the poke, we thought it would be cool to have a feature without any specific purpose." Facebook...
