1. If the string contains accents etc. from the current code page only, it will be stored as ANSI for full backwards compatibility.
2. If the string contains characters from a different codepage, it will be stored as UTF-8, but with the UTF-8 three byte prefix. This way all strings from the current codepage will still work with older TC versions too, but strings from a different codepage will show up as garbage and not work. But such paths wouldn't have worked with the old version anyway...
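To make the two rules above concrete, here is a minimal sketch in Python of how such a value could be encoded. It is not TC's actual code: the function name `encode_ini_value` and the choice of `cp1252` as the "current code page" are assumptions made only for illustration.

```python
# Hypothetical sketch of the storage rule described above (not TC's real code).
UTF8_PREFIX = b"\xef\xbb\xbf"  # the three-byte UTF-8 prefix (BOM)


def encode_ini_value(text: str, codepage: str = "cp1252") -> bytes:
    """Store as ANSI when the current code page can represent the string,
    otherwise as UTF-8 with the three-byte prefix in front of the value."""
    try:
        # Rule 1: every character exists in the current code page.
        return text.encode(codepage)
    except UnicodeEncodeError:
        # Rule 2: at least one character is outside the code page.
        return UTF8_PREFIX + text.encode("utf-8")
```

With `cp1252` assumed, a path such as `C:\Café` stays plain ANSI and remains readable by older versions, while a path containing Japanese characters gets the prefixed UTF-8 form.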
Quote: Only when you enter them
Yes, of course, only when I enter them. So it's clear that opening a questionable folder in TC 7.5 and later opening it in TC < 7.5 will not work. Why not save the 8+3 filename (just like in TC 7.0) if available and check for the full Unicode file name while reading the entry?
If there is no 8+3 filename save it as UTF-8. The directory couldn't be opened in < TC 7.5 anyway - but only in this case.
I understood it like this: he is not writing a BOM at the beginning of the file. He writes a BOM in front of each UTF-8 encoded value so that he knows he has to decode it when reading it later.
There is another thing that could be discussed. Many (path) settings are not available in the options dialog. It could become quite difficult to edit the ini file considering these changes.
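As a rough illustration of the per-value BOM reading, here is a small Python sketch. The function name `decode_ini_value` and the `cp1252` default are assumptions for the example, not anything taken from TC itself.

```python
# Hypothetical sketch: the BOM is checked per value, not once per file.
UTF8_PREFIX = b"\xef\xbb\xbf"


def decode_ini_value(raw: bytes, codepage: str = "cp1252") -> str:
    """Decode one ini value: UTF-8 if it carries the prefix, ANSI otherwise."""
    if raw.startswith(UTF8_PREFIX):
        # Strip the marker, then decode the remainder as UTF-8.
        return raw[len(UTF8_PREFIX):].decode("utf-8")
    # No marker: the value is in the current code page.
    return raw.decode(codepage)


def old_reader_view(raw: bytes, codepage: str = "cp1252") -> str:
    """What a reader unaware of the prefix would see (assumed behaviour)."""
    return raw.decode(codepage, errors="replace")
```

A reader that does not know about the prefix would show the marker itself as the three characters `ï»¿` followed by garbage, which is the effect described for older versions.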
Lefteous wrote: He is not writing a BOM at the beginning of the file. He writes a BOM in front of each UTF-8 encoded value to know that he has to decode it when reading it later.
I understood this in exactly the same way. But the user can click on the menu item "Change Settings Files Directly". And what will happen after that?
Lefteous wrote: I guess a hex editor could be a better tool
No, thanks. I prefer to use text editors. Anyway, I have used UTF-16 for years and I will continue to use it. I just wanted to express my opinion: the effort of supporting UTF-8 (including all the effort of plugin authors and the probable collision when a user saves the file as UTF-8 from an editor) doesn't seem attractive to me.
I haven't yet checked another issue: using UTF-8 in entry names. Some entry names are based on user input (e.g. the [Search] section).
Lefteous wrote: Why not save the 8+3 filename (just like in TC 7.0) if available and check for the full unicode file name while reading the entry?
If there is no 8+3 filename save it as UTF-8.
Even an 8+3 name doesn't have to have an Ansi representation.
I don't know why that is, but I've seen a (Japanese XP) system with Japanese characters in the account name. Short filename generation was enabled and the account name was short (3 characters), but it couldn't be transformed into an ordinary 8+3 filename; the GetShortPathName() call succeeded, but it just kept the user's folder (C:\Documents and Settings\XXX) in its original form, i.e. containing those Japanese characters.
So, for an Ansi program running in another (non-Japanese) language, this user's folder was inaccessible (which unfortunately includes the TEMP folder, etc.)
What I'm trying to say is just that even 8+3 names (returned from GetShortPathName() for example) should be checked - if they can be stored correctly without Unicode/UTF.
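A check of that kind could look roughly like the Python sketch below. It does not call GetShortPathName(); it only tests whether a name that has already been obtained fits a given code page. The helper name `fits_codepage` and the sample folder name are made up for the example.

```python
# Hypothetical helper: can this (short or long) name be stored as ANSI?
def fits_codepage(name: str, codepage: str) -> bool:
    """Return True only if every character of name exists in codepage."""
    try:
        name.encode(codepage)  # strict mode raises on unmappable characters
    except UnicodeEncodeError:
        return False
    return True


# Made-up example of a user folder whose "short" name kept Japanese characters.
folder = "C:\\Documents and Settings\\\u5c71\u7530"
fits_japanese = fits_codepage(folder, "cp932")   # Japanese code page
fits_western = fits_codepage(folder, "cp1252")   # Western European code page
```

Under these assumptions the same name is storable on a system using the Japanese code page but not on one using a Western code page, so the fallback to UTF-8 would still be needed there.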
since you are the pioneer in developing Unicode-supporting (content) plugins for the upcoming TC 7.5, maybe you can share some programming hints besides the ones in http://ghisler.ch/board/viewtopic.php?t=17135. Your help would be much appreciated.
Message from moderator
Honestly: I've read the whole linked thread and the hlp file for new content plugins and... I don't get it. The most important thing for me is to know how to deal with ft_string or ft_stringw, but there are no examples. Furthermore, no info is provided about dealing with ft_fulltext in the Unicode case.
Well, I will also post in the proper thread.