[Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the name
This issue isn't actually new to b8; it was there before.
I found this
viewtopic.php?p=410581&hilit=UTF16#p410581
and this
viewtopic.php?p=406060&hilit=UTF16#p406060
but the problem I'm having is a bit different, so I'm making a new bug report.
What happens: if you use the Multi-Rename Tool and its "Edit names" option to open/export the file list, Notepad is opened with the UTF-16 LE setting. I don't know if it's just my machine, but it happens on both of my computers, one with Windows 11 and one with Windows 10 (x64, latest updates).
This isn't normally a problem, but I've found that if the file is saved as UTF-16 LE (the default when it is opened this way), longer filenames listed in it get cut off when the file is loaded.
I guess it's not so much a character count as a byte count, because it happens sooner when there are Japanese characters in the filename. I also say "when the file is loaded" because the file contents themselves are just fine: if you re-save the same file as UTF-8 and load it into the Multi-Rename Tool, you get the cut-off parts of the filename back!
If I manually select UTF-8 and save the file that way, it loads properly without any issues. But I always have to select it manually.
When I open Notepad on its own, it has UTF-8 selected by default. Heck, even if I open Notepad by typing "notepad" in Total Commander's command line, it opens with UTF-8 encoding, which leads me to believe that something in how the Multi-Rename Tool creates the text file or calls Notepad causes it to open with UTF-16 LE.
Can we have it open in UTF-8 by default?
You can test it easily by making an example file with this name:
"普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですからこのネームには特に意味がないです。ただ普通に長い名前を作りたくてなにかこうかなぁって思ってこれが出てきたごめんなさいね、つまらなくて.txt"
When you use "Edit names" and then edit and load that file, the name will be truncated to
"普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですからこのネームには特に意味がないです。ただ普通に長い名前を作りたくて", i.e. the rest of the name, "なにかこうかなぁって思ってこれが出てきたごめんなさいね、つまらなくて", and the extension ".txt" are missing.
Last edited by Vulpix on 2023-06-24, 16:57 UTC, edited 3 times in total.
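The reporter's observation that the list contents themselves are fine can be checked outside TC. Below is a minimal Python sketch, assuming a list file with one name per CRLF-terminated line (the exact layout TC writes is an assumption here). It encodes the example name as UTF-16 LE with a BOM and as UTF-8, decodes both back, and compares. If both round trips succeed, the truncation would have to happen when the Multi-Rename Tool reads the file, not in the encoding itself.

```python
import codecs

# The example filename from the report, copied as posted.
NAME = ("普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですから"
        "このネームには特に意味がないです。ただ普通に長い名前を作りたくてなにかこうかなぁって思ってこれが出てきたごめんなさいね、つまらなくて.txt")


def encode_list(names: list[str], encoding: str) -> bytes:
    """Encode a rename list, one name per CRLF-terminated line (assumed layout)."""
    text = "".join(name + "\r\n" for name in names)
    if encoding == "utf-16-le":
        # Prepend the UTF-16 LE byte order mark (FF FE).
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return text.encode(encoding)


def decode_list(data: bytes) -> list[str]:
    """Decode a rename list: UTF-16 LE if it starts with that BOM, otherwise UTF-8."""
    if data.startswith(codecs.BOM_UTF16_LE):
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    else:
        # "utf-8-sig" accepts UTF-8 both with and without the EF BB BF signature.
        text = data.decode("utf-8-sig")
    return text.splitlines()


for enc in ("utf-16-le", "utf-8"):
    data = encode_list([NAME], enc)
    print(enc, len(data), "bytes, round trip OK:", decode_list(data) == [NAME])
```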
Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE
Please try the following: Select the files in TC, then save the list via menu item Mark > Save Selection to File and open that file in Notepad. Are the names shown correctly then? If not, this might be an issue of Notepad.
Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE
Dalai wrote: 2023-06-23, 17:13 UTC
Please try the following: Select the files in TC, then save the list via menu item Mark > Save Selection to File and open that file in Notepad. Are the names shown correctly then? If not, this might be an issue of Notepad.
I tried: that one is saved as UTF-16 LE as well. So, of course, even though the name inside the file is correct, it does not work properly when I load that file into Total Commander.
If I open that same file, re-save it as UTF-8 with exactly the same content and load it in Total Commander, it works fine (i.e. the same scenario as in my original issue).
Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE
Does it work properly in previous TC versions, 10.52 for example?
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE
In the same menu there is a command to select the editor used for this purpose.
What happens if you choose a MORE professional editor than this "notepad"?
One that shows you at once which encoding the program is actually using,
and that supports all Unicode characters and any text direction.
#146217 personal license
Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE
It's funny, but I found my own post from several years ago, from when I came across this issue in a different context.
https://answers.microsoft.com/en-us/windows/forum/all/notepad-bug-with-encoding-auto-detection/9fb5571c-cb90-4499-8d22-b715de844c79
It is in fact a bug of notepad, or rather of its "detection" algorithm, which for whatever reason selects UTF-16 LE in this case.
That in turn looks like the roughly 20-year-old problem with IsTextUnicode (https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-istextunicode).
Pity I cannot force this algorithm to pick UTF-8 by default when it isn't sure...
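Notepad's actual heuristic (like IsTextUnicode's) is statistical and is not reproduced here. The behaviour the poster wishes for, trusting a BOM first and otherwise preferring UTF-8 whenever the bytes are valid UTF-8, could look like the Python sketch below. `guess_encoding` and the `cp1252` fallback are hypothetical choices, not Notepad's logic.

```python
import codecs

# Checked in this order so the 4-byte UTF-32 LE BOM (FF FE 00 00)
# is not mistaken for the 2-byte UTF-16 LE BOM (FF FE).
BOMS: list[tuple[bytes, str]] = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical detector: BOM first, then UTF-8 if the bytes are valid, else a legacy code page.

    For the UTF-16/32 names the caller must skip the BOM before decoding;
    "utf-8-sig" strips its signature automatically.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback
```

Because valid UTF-8 is accepted before any fallback, a BOM-less UTF-8 file is never reinterpreted as UTF-16 by this sketch.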
Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE
As far as I know TC saves the list file as UTF-16 LE (with a BOM) if necessary. Necessary means that the filename characters don't fit in the current ANSI codepage (which is very likely for Asian characters). If Notepad detects the file as UTF-16 LE it would be correct, and since the file has a BOM, it would be very wrong if it detected anything else.
Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
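Dalai's description above ("as far as I know") of how TC picks the list-file encoding can be modelled in a few lines of Python. This is a sketch of that description, not TC's code: `build_list_file` is a hypothetical name, CRLF line endings are assumed, and the ANSI code page defaults to `cp1252` (Western European) purely as an example.

```python
import codecs


def build_list_file(names: list[str], ansi_codepage: str = "cp1252") -> bytes:
    """Return list-file bytes: ANSI if every name fits the code page, else UTF-16 LE with a BOM."""
    text = "".join(name + "\r\n" for name in names)
    try:
        # Strict encoding raises UnicodeEncodeError if any character
        # has no mapping in the ANSI code page.
        return text.encode(ansi_codepage)
    except UnicodeEncodeError:
        return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

Passing `"cp932"` (the Japanese ANSI code page) makes a Japanese name like `普通.txt` fit, so under this model no BOM would be written in that case.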
Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE
2Vulpix
Vulpix wrote:
It is in fact a bug of notepad
So, did you try a more professional editor, as I asked you above?
I can assure you that any real editor, unlike this "notepad", can encode a text file with the proper encoding much more reliably, even on the fly with a one-click action...
Honestly, I do not understand people who, probably on principle ("it opens quickly, it's always there in the OS"), choose this fake, "unfinished" `editor`...
#146217 personal license
Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE
Dalai wrote: 2023-06-24, 00:11 UTC
As far as I know TC saves the list file as UTF-16 LE (with a BOM) if necessary. Necessary means that the filename characters don't fit in the current ANSI codepage (which is very likely for Asian characters). If Notepad detects the file as UTF-16 LE it would be correct, and since the file has a BOM, it would be very wrong if it detected anything else.
If that is the case, why does TC then fail to read the contents? Or rather, why does it read them just fine if I re-save the file as UTF-8? Technically, Notepad also reads the contents just fine when they're in UTF-16 LE; it's only the Multi-Rename Tool that then cuts off part of the name, and that's the problem. So while there is certainly something weird about how Notepad treats Unicode, it is also true that TC handles the resulting file strangely. Anyway, I'll just live with it. I doubt anyone will fix it, and I don't need to be attacked by other people here for my choice of the most basic of all text editors.
Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE
If you save the file in a different encoding, Notepad will change the BOM (and the contents). TC will probably detect that change and read the contents accordingly. Please answer my question above whether or not it works in previous TC versions. If yes, it's a TC bug introduced recently. If not, we could investigate further.
Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
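What "changing the BOM (and the contents)" means at the byte level can be seen by encoding the same short name three ways. A minimal Python sketch; `first_bytes` and the shortened sample name are illustrative choices.

```python
import codecs

NAME = "普通のファイル名.txt"


def first_bytes(data: bytes, count: int = 8) -> str:
    """Format the first `count` bytes as space-separated uppercase hex."""
    return data[:count].hex(" ").upper()


variants = {
    "UTF-16 LE with BOM": codecs.BOM_UTF16_LE + NAME.encode("utf-16-le"),
    "UTF-8 with BOM": NAME.encode("utf-8-sig"),
    "UTF-8 without BOM": NAME.encode("utf-8"),
}

for label, data in variants.items():
    print(f"{label:20} {len(data):3} bytes  {first_bytes(data)}")
```

The UTF-16 LE variant starts with `FF FE` followed by two bytes per character, while the UTF-8 variants use three bytes for each of these Japanese characters, optionally preceded by `EF BB BF`.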
Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE
Dalai wrote: 2023-06-24, 15:10 UTC
If you save the file in a different encoding, Notepad will change the BOM (and the contents). TC will probably detect that change and read the contents accordingly. Please answer my question above whether or not it works in previous TC versions. If yes, it's a TC bug introduced recently. If not, we could investigate further.
Ah, I had typed that up, but when I found it was probably related to Notepad more than anything else, I erased it.
No, it didn't work in the latest stable version either (I just tried).
I also tried it with Notepad++, and it doesn't work either. Notepad++ also detects the file as UTF-16 LE with BOM, and the contents are correct when I view the file myself, but if I load the same file into Total Commander (be it the latest stable or the beta), it doesn't work.
It works fine if I re-save it as UTF-8, though, on both the stable version and the latest beta.
So, in short, the problem isn't that the file is saved as UTF-16 LE; that's okay.
The problem is that when a UTF-16 LE file is loaded into the Multi-Rename Tool, part of the filename is cut off if the filename takes up too many bytes. Again, it seems to be related to the byte size (?) rather than the number of characters.
Re: [Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the
So I just tried it with the filename from the OP in Notepad (Win7) and Notepad++. The editor used doesn't really matter, because no change to the names in the editor is required to trigger this bug (though I'm not sure whether it is one).
I can confirm your observation.
Here are the steps to reproduce:
- Create a file with the following name (the file contents are irrelevant):
  普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですからこのネームには特に意味がないです。ただ普通に長い名前を作りたくてなにかこうかなぁって思ってこれが出てきたごめんなさいね、つまらなくて.txt
- In TC set the cursor on that newly created file and press Ctrl+M to open the MRT.
- Maximize the MRT window and make sure that the "New name" column is wide enough to show the full new name.
- Select "Edit names" from the small menu opened by the button with the folder icon. This will open the editor that's set as MultiRenameEdit.
- Close the editor without changing anything.
- Confirm the dialog in MRT to load the new filenames and, while doing so, closely watch the new name. It will be changed and shortened, although nothing was changed in the editor!
The name is cut off after the 86th character (172 bytes). I'm not sure whether or not this has something to do with this particular filename or some characters in it, but it's possible that there's something about the 87th character tripping TC somehow. Unfortunately I'm not versed enough in encodings and such to understand all of this.
Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
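Dalai's figures can be double-checked with a short Python sketch that measures the kept and lost parts of the example name in characters, UTF-16 LE bytes and UTF-8 bytes. It only measures and does not explain why TC stops where it does. The strings are copied as posted; Windows does not allow an ASCII `?` in filenames, so the real file probably used a different question-mark character, which would change the UTF-8 count but not the UTF-16 one.

```python
FULL = ("普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですから"
        "このネームには特に意味がないです。ただ普通に長い名前を作りたくてなにかこうかなぁって思ってこれが出てきたごめんなさいね、つまらなくて.txt")
# The part of the name that survived loading the UTF-16 LE list.
KEPT = ("普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですから"
        "このネームには特に意味がないです。ただ普通に長い名前を作りたくて")


def sizes(text: str) -> tuple[int, int, int]:
    """Return (characters, UTF-16 LE bytes, UTF-8 bytes) for `text`."""
    return len(text), len(text.encode("utf-16-le")), len(text.encode("utf-8"))


assert FULL.startswith(KEPT)
for label, text in (("kept", KEPT), ("lost", FULL[len(KEPT):]), ("full", FULL)):
    chars, utf16, utf8 = sizes(text)
    print(f"{label:5} {chars:4} chars {utf16:4} UTF-16 bytes {utf8:4} UTF-8 bytes")
```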
Re: [Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the
Dalai wrote: 2023-06-24, 18:41 UTC
The name is cut off after the 86th character (172 bytes). I'm not sure whether or not this has something to do with this particular filename or some characters in it, but it's possible that there's something about the 87th character tripping TC somehow. Unfortunately I'm not versed enough in encodings and such to understand all of this.
This is indeed the observation.
There is also nothing special about this specific character: I had a folder full of Japanese names, and they were all cut off at seemingly random points, regardless of the character.
In fact, you can take just one character and repeat it many times (for example あ), and if you repeat it often enough, the name will eventually be cut off, too.
For example, a file named like this gets cut off as well:
ああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああ.txt
If you take the UTF-16 LE encoded list file itself, re-save it as UTF-8 and then load it into the Multi-Rename Tool, you can restore your file's name to its original length.
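The workaround described above, re-saving the list as UTF-8 before loading it, could also be scripted instead of done by hand in the editor. Below is a minimal Python sketch with hypothetical helper names (`reencode`, `convert_file`). The thread does not say whether the Multi-Rename Tool needs a UTF-8 BOM, so the BOM is an option (`add_bom`) rather than a fixed choice.

```python
import codecs
import sys
from pathlib import Path
from typing import Optional


def reencode(data: bytes, add_bom: bool = False) -> Optional[bytes]:
    """Convert UTF-16 LE (with BOM) list bytes to UTF-8; return None if there is no UTF-16 LE BOM."""
    if not data.startswith(codecs.BOM_UTF16_LE):
        return None
    text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # "utf-8-sig" writes the EF BB BF signature; plain "utf-8" writes none.
    return text.encode("utf-8-sig" if add_bom else "utf-8")


def convert_file(path: Path, add_bom: bool = False) -> bool:
    """Rewrite the file in place as UTF-8; return True if it was converted."""
    converted = reencode(path.read_bytes(), add_bom)
    if converted is None:
        return False
    path.write_bytes(converted)
    return True


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        status = "converted" if convert_file(Path(arg)) else "left unchanged (no UTF-16 LE BOM)"
        print(arg, status)
```

For example, `python convert_list.py names.txt` would rewrite `names.txt` in place (both file names are placeholders).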
Re: [Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the
2mod.
IMHO it's now time to move this discussion to the appropriate "bug reports" section.
#146217 personal license
Re: [Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the
Moderator message from: white » 2023-06-25, 09:47 UTC
Moved to Bugs forum.