Multi-rename tool cannot save settings with Unicode
- Junior Member
- Posts: 5
- Joined: 2015-10-21, 03:14 UTC
I use Unicode characters very often, so I saved my wincmd.ini file as Unicode (opened it in Notepad and saved it as "Unicode", not UTF-8).
However, when I save my multi-rename settings (pressing F2 in the Multi-rename tool), any Unicode content ends up as garbled text.
My TC version: 8.52a
How to reproduce the problem:
1. Use Notepad to open wincmd.ini and save it as Unicode (not UTF-8).
2. Open the Multi-rename tool.
3. In the file name field, type any Unicode characters (e.g. テスト).
4. Press F2 and save the settings under an English name (e.g. "test").
5. Load the saved settings.
6. The file name field shows garbled text.
The same thing happens when using a Unicode settings name.
Has anyone faced the same problem? Does anyone have a solution?
Thank you very much.
I don't understand why you willingly re-saved the TC ini file as UTF-16 ("Unicode").
Where did you read that only this would help to save Unicode chars?
I know that Christian sometimes states that it helps in certain situations,
but it is neither documented in the help file nor is it generally advisable.
(e.g. try using TC portably: the ini probably won't load on Win9x systems)
TC creates an ini file from scratch in ANSI, and still expects this by default.
Unicode chars are saved in an ANSI ini file as UTF-8 prefixed with a UTF-8 BOM.
If you'd just kept the ANSI file from the beginning you wouldn't have any problems in the first place.
But sure, you could see this behavior as a bug, even though a UTF-16 file is not the standard ini format TC expects.
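A minimal Python sketch of the value workaround described above, assuming a Western system whose ANSI code page is cp1252. The helper names are hypothetical, and the rule "plain ANSI when the text fits, UTF-8 with BOM otherwise" is inferred from this thread rather than taken from TC's source.

```python
import codecs

# Assumption: Western Windows whose ANSI code page is cp1252.
ANSI_CODEPAGE = "cp1252"


def encode_ini_value(text: str) -> bytes:
    """Encode a value for an ANSI ini file (hypothetical helper).

    Text that fits the ANSI code page is stored as-is, without a BOM;
    anything else is stored as UTF-8 prefixed with a UTF-8 BOM.
    """
    try:
        return text.encode(ANSI_CODEPAGE)
    except UnicodeEncodeError:
        return codecs.BOM_UTF8 + text.encode("utf-8")


def decode_ini_value(raw: bytes) -> str:
    """Reverse of encode_ini_value: a leading UTF-8 BOM marks a UTF-8 value."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode(ANSI_CODEPAGE)


if __name__ == "__main__":
    for value in ["[N4-]", "café", "テスト"]:
        raw = encode_ini_value(value)
        print(repr(value), raw, decode_ini_value(raw) == value)
```

The BOM only acts as a marker inside the value; the file itself stays plain ANSI, which is why this works with the ANSI profile functions.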
TC plugins: PCREsearch and RegXtract
As I understand this bug report, TC can't properly restore saved Unicode strings when wincmd.ini is Unicode. That is a bug that could potentially be fixed.
Strings that are written and then read back should match regardless of wincmd.ini's encoding. That encoding only matters to the Windows API: TC sends and receives strings in the same encoding (determined by which functions it calls) regardless of the file encoding.
Unicode string -> UTF-8 string with BOM -> write API (converts to Unicode if the file is Unicode) -> read API (converts back to ANSI if the file is Unicode) -> UTF-8 string with BOM -> Unicode string (must be the same).
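The conversion chain above can be modelled in a few lines of Python to show that, in principle, it is lossless. This is a sketch under assumptions: the ANSI code page is cp1252, the Unicode file is UTF-16 LE, and the helper functions are hypothetical stand-ins for the Windows profile API conversions, not real API calls.

```python
import codecs

# Assumptions: ANSI code page cp1252, Unicode ini file stored as UTF-16 LE.
ANSI_CODEPAGE = "cp1252"


def tc_encode(text: str) -> bytes:
    # Unicode string -> UTF-8 string with BOM (what TC hands to the API)
    return codecs.BOM_UTF8 + text.encode("utf-8")


def api_write_to_unicode_file(ansi_bytes: bytes) -> bytes:
    # Write API: ANSI bytes are converted to UTF-16 for a Unicode file
    return ansi_bytes.decode(ANSI_CODEPAGE).encode("utf-16-le")


def api_read_from_unicode_file(stored: bytes) -> bytes:
    # Read API: UTF-16 text is converted back to ANSI bytes
    return stored.decode("utf-16-le").encode(ANSI_CODEPAGE)


def tc_decode(ansi_bytes: bytes) -> str:
    # UTF-8 string with BOM -> Unicode string
    if ansi_bytes.startswith(codecs.BOM_UTF8):
        return ansi_bytes[len(codecs.BOM_UTF8):].decode("utf-8")
    return ansi_bytes.decode(ANSI_CODEPAGE)


def round_trip(text: str) -> str:
    stored = api_write_to_unicode_file(tc_encode(text))
    return tc_decode(api_read_from_unicode_file(stored))


if __name__ == "__main__":
    print(round_trip("テスト") == "テスト")
```

One caveat of the model: Python's cp1252 codec rejects the five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D), whereas Windows maps them to control characters, so strings whose UTF-8 form contains those bytes would raise here even though the real conversion may still round-trip.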
As we've had other Unicode ini problems with TC recently,
I think Christian should finally state whether a Unicode ini is officially supported or recommended.
And for that, every ini section should be tested for such recoding problems.
(there are a lot of sections with potential recoding issues)
The question remains what makes users think that they should recode their ini file.
I can't say if that problem is caused by Unicode because no one can reproduce the problem. But I think there should be no problems re-coding the INI file as long as the re-coding is done between Unicode (UTF-16) and the default Windows ANSI encoding, i.e. the two encodings the INI API converts strings between. Of course I'll lose my UTF-8 characters if my system ANSI encoding is e.g. Win-1251 and I convert the file to Unicode from Win-1252.
MVV wrote: I can't say if that problem is caused by Unicode because no one can reproduce the problem.
I meant the issue of recoding to UTF-8 with BOM, which is easily reproducible (see here).
That's why I think there should be a statement in the help file if recoding is supported or recommended.
MVV wrote: Of course I'll lose my UTF-8 characters if my system ANSI encoding is e.g. Win-1251 and I convert the file to Unicode from Win-1252.
Exactly!
These are the potential problems if you use TC portably.
Characters from the local code page are saved without a BOM.
Moving to a system with a different ANSI code page will not restore the correct characters, and things will be displayed garbled.
Recoding between Unicode and non-Unicode could make things even worse, because you normally have to specify which code page you convert to or from.
Maybe some future TC version should switch to (or enforce) a Unicode INI, or use its own functions for ini handling.
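A small Python illustration of the portability problem described above. The code pages are assumptions chosen for the example: a value is saved without a BOM on a Cyrillic system (cp1251) and later read on a Western one (cp1252). The helper names are hypothetical.

```python
# Hypothetical illustration; code pages are assumptions: the value was saved
# on a Cyrillic system (cp1251) and is read on a Western one (cp1252).
def save_without_bom(text: str, codepage: str) -> bytes:
    # Local code page characters are stored as plain ANSI bytes, no BOM.
    return text.encode(codepage)


def load_without_bom(raw: bytes, codepage: str) -> str:
    # Without a BOM, the reader can only assume its own ANSI code page.
    return raw.decode(codepage)


if __name__ == "__main__":
    raw = save_without_bom("Тест", "cp1251")
    print(load_without_bom(raw, "cp1251"))  # Тест (same system)
    print(load_without_bom(raw, "cp1252"))  # Òåñò (different system)
```

The bytes themselves are unchanged; only the code page used to interpret them differs, which is why nothing in a BOM-less ANSI file can tell the second system how to read them.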
BTW, I tried saving an MRT preset with a Unicode string and then converting the INI to Unicode, and the string was lost (MRT displayed its UTF-8-with-BOM form instead of the actual Unicode text).
Then I saved another MRT preset with the same Unicode name and saw the same UTF-8-with-BOM string in the INI.
After converting the INI back (Unicode to ANSI), both presets magically became correct.
So TC saves strings the same way regardless of file encoding, but for some reason it can't read saved Unicode strings back when the INI is Unicode.
If it matters: my MRT templates are saved to a separate user-specific INI. The first preset was saved while the two INI files were in different encodings (main one Unicode, user-specific one ANSI); the second was saved while both INI files were Unicode (after restarting TC).
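To show roughly what "its UTF-8-with-BOM form" looks like on screen, here is a hypothetical Python sketch: the stored UTF-8 bytes (with BOM) are interpreted as ANSI text, assumed to be cp1252, instead of being decoded as UTF-8. It only illustrates the symptom; it makes no claim about where in TC the read path actually goes wrong.

```python
import codecs


# Hypothetical illustration of the symptom: a UTF-8-with-BOM value whose
# bytes are shown as ANSI text (cp1252 assumed) instead of decoded as UTF-8.
def show_as_ansi(raw: bytes, codepage: str = "cp1252") -> str:
    return raw.decode(codepage)


if __name__ == "__main__":
    stored = codecs.BOM_UTF8 + "テスト".encode("utf-8")
    print(show_as_ansi(stored))  # the BOM appears as "ï»¿", followed by mojibake
```

Each of the 12 stored bytes becomes one visible character, so a 3-character Japanese name turns into a 12-character garbled string.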
- ghisler(Author)
- Site Admin
- Posts: 50923
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
Total Commander does support INI files with UTF-16 Unicode (with byte order marker), but not UTF-8 INI files. Why? The Windows "Profile string" functions only support ANSI and UTF-16.
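Since support depends on how the file is encoded, a quick way to tell the cases apart is by the leading bytes. This is a hypothetical Python helper, not part of TC; it relies on the fact that Notepad's "Unicode" option writes UTF-16 LE with a BOM, and treats a file without any BOM as ANSI.

```python
import codecs


def detect_ini_encoding(data: bytes) -> str:
    """Classify an ini file by its leading bytes (hypothetical helper).

    Per the post above, the Windows profile functions handle ANSI and
    UTF-16 with a BOM; a UTF-8 file is not supported.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"  # what Notepad's "Unicode" option writes
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # not supported as an ini file encoding
    return "ansi"  # no BOM: assumed to be the system ANSI code page


if __name__ == "__main__":
    samples = {
        "ANSI": b"[rename]\r\n",
        "Notepad 'Unicode'": codecs.BOM_UTF16_LE + "[rename]\r\n".encode("utf-16-le"),
        "UTF-8 with BOM": codecs.BOM_UTF8 + b"[rename]\r\n",
    }
    for label, data in samples.items():
        print(label, "->", detect_ini_encoding(data))
```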
However, some functions don't currently support Unicode characters. Saving settings in the Multi-rename tool is one of these functions. The problem is how these are stored:
Profile_name="[N4-]"
Profile_ext="[E]"
Profile_params=0|1|1|1
The text "Profile" here cannot currently be Unicode. While it could work with Unicode ini files, it wouldn't work with ANSI, because TC cannot use the workaround it uses (with a UTF-8 byte order marker), which works when the Unicode text comes after the "=".
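A hypothetical Python sketch of why the key name has no such escape, assuming cp1252 as the ANSI code page: the value can carry the UTF-8 BOM workaround, but a non-ANSI preset name in the key degrades to "?" characters. Encoding with errors="replace" stands in for the Windows ANSI conversion; that substitution behaviour is an assumption for illustration, not something stated in this thread.

```python
import codecs

ANSI_CODEPAGE = "cp1252"  # assumed Western system


def ini_line(preset: str, field: str, value: str) -> bytes:
    """Build one '<preset>_<field>=<value>' line for an ANSI ini (hypothetical).

    The value may use the UTF-8 BOM workaround, but the key has no such
    escape, so characters outside the ANSI code page become '?'.
    """
    key = f"{preset}_{field}".encode(ANSI_CODEPAGE, errors="replace")
    try:
        raw_value = value.encode(ANSI_CODEPAGE)
    except UnicodeEncodeError:
        raw_value = codecs.BOM_UTF8 + value.encode("utf-8")
    return key + b"=" + raw_value


if __name__ == "__main__":
    print(ini_line("test", "name", "テスト"))  # Unicode value survives via the BOM
    print(ini_line("テスト", "name", "[N4-]"))  # Unicode key collapses to "???"
```

Two different 3-character Japanese preset names would both collapse to the same "???" key, so they could not be told apart in an ANSI file.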
Author of Total Commander
https://www.ghisler.com
2ghisler
So TC does support a UTF-16 INI in principle, but on the other hand has certain problems restoring (some) settings.
I'd say this still means that one shouldn't use anything other than the default ANSI file in general, right?
It would be nice to have some notes in the help file about it, for the next TC version.
ghisler(Author) wrote: The text "Profile" here cannot currently be Unicode. While it could work with Unicode ini files, it wouldn't work with ANSI, because TC cannot use the workaround it uses (with a UTF-8 byte order marker), which works when the Unicode text comes after the "=".
Well, we're discussing corruption of INI values, not key names. My test confirms that TC can't properly restore values saved using the mentioned workaround when the INI file is UTF-16.
- Junior Member
Thank you for all the answers above. I am not a native English speaker, so I need time to understand your comments.
I use a Unicode ini because I sometimes edit the ini file directly, so Unicode support is very important to me.
However, as Mr. ghisler says, MRT doesn't support Unicode in this function. I also tested UTF-8 encoding: MRT works well, but other problems occur when editing the ini file (e.g. garbled text in customized menus).
So currently a Unicode file is not a good option for wincmd.ini, and editing it directly becomes less convenient.
I hope that one day TC will fully support Unicode.
- ghisler(Author)
It is just not supported in the saved NAMES, but it works in the actual search+replace and name/ext placeholders.
- ghisler(Author)
2MVV
Sorry, I don't understand your comment. I can use e.g. Cyrillic just fine in that field on my Western Windows, and it is being stored and restored.
I tested it with the string "テスト" mentioned in the first post, and TC only restored it correctly when wincmd.ini was ANSI-encoded.
These two templates were saved by TC while wincmd.ini was in different encodings (ANSI and UTF-16 LE), and they are identical, so TC saves templates correctly. However, TC can't restore the source string when the INI is UTF-16 LE. It is 100% reproducible.
MIME-Version: 1.0
Content-Type: application/octet-stream; name="wincmd_rename.ini"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="wincmd_rename.ini"
W3JlbmFtZV0NCjExMV9uYW1lPe+7v+ODhuOCueODiA0KMTExX2V4dD1bRV0NCjExMV9wYXJhbXM9
MHwxfDF8MQ0KMjIyX25hbWU977u/44OG44K544OIDQoyMjJfZXh0PVtFXQ0KMjIyX3BhcmFtcz0w
fDF8MXwxDQo=
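The attachment above can be checked with a short Python sketch: it decodes the base64 payload (copied verbatim), splits the key=value lines and shows that both name values are the same UTF-8-with-BOM encoding of テスト. The parsing helpers are hypothetical and only handle this simple file; non-BOM values are assumed to be cp1252.

```python
import base64
import codecs

# Base64 payload of wincmd_rename.ini from the post above, copied verbatim.
PAYLOAD = (
    "W3JlbmFtZV0NCjExMV9uYW1lPe+7v+ODhuOCueODiA0KMTExX2V4dD1bRV0NCjExMV9wYXJhbXM9"
    "MHwxfDF8MQ0KMjIyX25hbWU977u/44OG44K544OIDQoyMjJfZXh0PVtFXQ0KMjIyX3BhcmFtcz0w"
    "fDF8MXwxDQo="
)


def parse_values(data: bytes) -> dict[str, bytes]:
    """Map each 'key=value' line to its raw value bytes; skip section headers."""
    values = {}
    for line in data.split(b"\r\n"):
        if b"=" in line:
            key, _, value = line.partition(b"=")
            values[key.decode("ascii")] = value
    return values


def decode_value(raw: bytes) -> str:
    # A leading UTF-8 BOM marks the TC workaround; otherwise assume cp1252.
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode("cp1252")


if __name__ == "__main__":
    values = parse_values(base64.b64decode(PAYLOAD))
    for key in ("111_name", "222_name"):
        print(key, values[key].startswith(codecs.BOM_UTF8), decode_value(values[key]))
```

Both entries decode to the same bytes, which matches the observation that TC writes the template identically whether wincmd.ini is ANSI or UTF-16 LE.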