9.0b9 x64 - wincmd.ini encoding issues

The behaviour described in the bug report is either by design, or would be far too complex/time-consuming to be changed

Moderators: Hacker, petermad, Stefan2, white

milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

MVV wrote: and these BOMs may tell editors that file has UTF-8
Not just the BOMs, but also the UTF-8 byte sequence following the BOM.
MVV wrote: but you will not see these BOMs because BOM has no visible representation
Not necessarily. Scintilla-based editors somehow alter the font rendering when a BOM is placed in front of non-ASCII characters:
https://abload.de/img/bomtestdjuo0.png
https://abload.de/img/bomtest310kwj.png
I also noticed this in some browsers: the character following a BOM is slightly "set off" (by one or two pixels).
MVV wrote: I think that it is a bad idea nowadays to open files in Windows Notepad by default because of mentioned reasons
That's why TC, IMO, should clearly state that you need to honor the ANSI file encoding when manually editing the ini file, at least by putting it in the help section "ini file Settings" (4.b).
MVV wrote: It is correct that TC uses Windows API that only support ANSI and UTF-16, UTF-8 is not supported, but TC may store some strings in UTF-8 with personal BOMs
I think we already clarified that.
TC plugins: PCREsearch and RegXtract
ghisler(Author)
Site Admin
Posts: 50550
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Maybe your editor sees the UTF-8 BOM in the middle of the file, or the UTF-8 encoded characters, and assumes that the entire file is UTF-8 (which it is not). As others have written, the Windows INI file functions do not support UTF-8.
Author of Total Commander
https://www.ghisler.com
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

2ghisler
The scenario is actually not far-fetched: with a fresh ini, search for a string containing characters outside the system code page. Open the ini in an editor and it will detect it as UTF-8. Basically any editor that has encoding detection would see it as UTF-8.
Now users with no knowledge of the intended encoding might edit the ini and save it as UTF-8 with a prefixed BOM, or try to recode it to ANSI.
Therefore, as I said before: wouldn't it be better to at least clearly state in the help file what encoding the ini file needs to have?
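
To make the scenario above easier to check, here is a minimal Python sketch (not anything TC itself does) that tells the three cases apart by looking at the raw bytes: no UTF-8 BOM at all, BOMs only inside the file (the per-string BOMs discussed in this thread), or a BOM at the very start (what an editor adds when it saves the whole file as UTF-8). The function name and the section name in the example are hypothetical.

```python
import codecs

def classify_utf8_bom(data: bytes) -> str:
    """Classify where the first UTF-8 BOM (EF BB BF) occurs in raw ini bytes."""
    pos = data.find(codecs.BOM_UTF8)
    if pos == 0:
        return "leading"   # whole file was saved as UTF-8 with BOM
    if pos > 0:
        return "embedded"  # BOM only in front of individual strings
    return "none"

# Hypothetical ini content; the section name is made up for illustration.
mixed = b"[SearchText]\r\n0=" + codecs.BOM_UTF8 + "日本語".encode("utf-8") + b"\r\n"
print(classify_utf8_bom(mixed))
print(classify_utf8_bom(codecs.BOM_UTF8 + mixed))
```

A "leading" result would suggest the file has already been resaved by an editor; this check only looks at the BOM and does not repair anything.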
mag
Junior Member
Posts: 35
Joined: 2008-10-06, 08:35 UTC

Post by *mag »

There are actually 2 issues here as I see them

1. Mixing
a) strings with national characters that still fit into the system code page
and
b) strings with characters that don't
results in an INI file where the first group of strings is encoded in ANSI and the second group in UTF-8 (locally: each such string is prefixed with a BOM). That will confuse a lot of text editors. For example, I often use PSPad and it can't handle that (it processes the whole file as UTF-8, and the first group of strings comes out malformed). You will need to find an editor that can.

2. The default editor for INI files is Windows Notepad. Unless that is changed, the tcmd option "Configuration / Change Settings Files Directly" opens the INI file in Notepad (tcmd doesn't respect its configured Editor here). If Notepad then considers the file to be UTF-8 encoded (which is easy to achieve), it adds a UTF-8 BOM to the beginning of the file upon saving, and the resulting INI file will cause trouble.


Note that both issues would be solved if the INI file were encoded in UTF-16 LE (ideally from the very beginning).
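
To illustrate issue 1, here is a small Python sketch of how a tool could read such a mixed file line by line. It assumes, as an illustration only, that the BOM sits directly after the `=` of each affected value and that the ANSI code page is cp1252; both the function names and the sample keys are hypothetical, and this is not TC's actual code.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf'

def decode_ini_line(raw: bytes, ansi_codepage: str = "cp1252") -> str:
    """Decode one raw ini line: UTF-8 if the value starts with a BOM, else ANSI."""
    key, sep, value = raw.partition(b"=")
    if sep and value.startswith(UTF8_BOM):
        # Assumption: the BOM marks just this one value as UTF-8.
        return key.decode(ansi_codepage) + "=" + value[len(UTF8_BOM):].decode("utf-8")
    return raw.decode(ansi_codepage)

def decode_mixed_ini(data: bytes, ansi_codepage: str = "cp1252") -> list:
    """Decode every line of a mixed ANSI/UTF-8 ini file separately."""
    return [decode_ini_line(line, ansi_codepage) for line in data.splitlines()]

# Line 0 is ANSI (cp1252 bytes for "äö"), line 1 is BOM + UTF-8.
sample = b"0=\xe4\xf6\r\n1=" + UTF8_BOM + "日本語".encode("utf-8") + b"\r\n"
print(decode_mixed_ini(sample))

# Treating the whole file as UTF-8, as many editors do, fails on the ANSI line.
try:
    sample.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8 as a whole:", err.reason)
```

An editor that decodes the whole file with one encoding cannot get both lines right, which is the PSPad behaviour described above.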
ghisler(Author)
Site Admin
Posts: 50550
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Unfortunately, Windows does not create UTF-16 ini files by itself when just writing strings with the INI functions.
mag
Junior Member
Posts: 35
Joined: 2008-10-06, 08:35 UTC

Post by *mag »

Could you work around that by adding a conversion step (to UTF-16 LE) to the installer, for example? It could be optional, as changing the ini file location already is.
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

ghisler(Author),
I think you can simply create wincmd.ini yourself in UTF-16 encoding with any contents, e.g.:


[Configuration]
test=0
So further API calls will work with this Unicode file.
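
As a rough sketch of this idea (in Python, purely for illustration; TC is not written in Python), pre-creating such a file could look as follows. It writes the UTF-16 LE byte order mark followed by the UTF-16 LE encoded text with CRLF line endings. The function name is hypothetical, and the example writes to a temporary directory rather than to a real wincmd.ini.

```python
import codecs
import os
import tempfile

def create_utf16_ini(path: str, text: str = "[Configuration]\r\ntest=0\r\n") -> None:
    """Create a new ini file as UTF-16 LE with a leading BOM (FF FE)."""
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))

# Usage: write to a temporary location and inspect the first bytes.
tmp_dir = tempfile.mkdtemp()
ini_path = os.path.join(tmp_dir, "wincmd.ini")
create_utf16_ini(ini_path)
with open(ini_path, "rb") as f:
    raw = f.read()
print(raw[:2] == codecs.BOM_UTF16_LE)
```

Whether the Windows INI functions then keep the file in Unicode is the behaviour described in the post above; the sketch only covers creating the initial file.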
ghisler(Author)
Site Admin
Posts: 50550
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

I found a better workaround which also works with already existing ini files (as long as the byte order mark hasn't been added yet): TC now adds the following line:
SetEncoding=äö.do.not.remove

I use äö because the byte sequence also gives a valid double-byte character in double-byte languages like Chinese. In Cyrillic it would be дц, in Chinese 漩, etc. Notepad sees this and does NOT switch to UTF-8 mode because it's not a valid UTF-8 sequence. That's how it was originally supposed to work: users mainly search and save using their own language, and strings from other codepages should be a small minority.
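
The byte-level reasoning can be checked with a short Python sketch. It assumes äö is stored in the Western code page cp1252 (bytes E4 F6) and uses Python's `cp1251` and `gbk` codecs as stand-ins for the Cyrillic and Chinese system code pages. The helper name is hypothetical, and a strict UTF-8 decode is only an approximation of whatever detection Notepad really performs.

```python
MARKER = b"\xe4\xf6"  # "äö" as stored in cp1252

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# E4 starts a three-byte UTF-8 sequence, but F6 is not a continuation byte.
print(is_valid_utf8(MARKER))

# The same two bytes read in other ANSI code pages.
print(MARKER.decode("cp1252"))
print(MARKER.decode("cp1251"))
print(MARKER.decode("gbk"))

# The complete line is therefore not valid UTF-8 either.
line = "SetEncoding=äö.do.not.remove".encode("cp1252")
print(is_valid_utf8(line))
```

Because the two bytes can never form a valid UTF-8 sequence, any detector that requires the whole file to be well-formed UTF-8 should fall back to ANSI.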