[Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the name


Vulpix
Junior Member
Posts: 28
Joined: 2013-05-15, 18:06 UTC

Post by *Vulpix »

This issue isn't actually new to b8, it was there before.

I found this
viewtopic.php?p=410581&hilit=UTF16#p410581
and this
viewtopic.php?p=406060&hilit=UTF16#p406060
but the problem I'm having is a bit different, so I'm making a new bug report.

What happens: if you use the Multi-Rename Tool and choose the "Edit names" option to open/export the file list, Notepad opens the file with the UTF-16 LE encoding selected. I don't know whether it's just my machines, but it happens on both of my computers, one with Windows 11 and one with Windows 10 (x64, latest updates).

This isn't normally a problem, but I've found that if the file is saved as UTF-16 LE (which is the default when it is opened this way), longer filenames listed inside get cut off when the file is loaded back.
This seems to depend on bytes rather than on the character count, I guess, because it happens sooner when the filename contains Japanese characters. I say "when the file is loaded" because the file contents themselves are just fine: if you re-save the same file as UTF-8 and load it with the Multi-Rename Tool, you get the cut-off parts of the filename back!
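
To make the "bytes rather than characters" guess easier to check, here is a minimal Python sketch that compares the size of a name in both encodings. The sample names and the helper `encoded_sizes` are made up for illustration; nothing here reflects how TC works internally.

```python
# Compare how many bytes the same name occupies in UTF-16 LE and in UTF-8.
# The sample names are hypothetical and only serve as an illustration.

def encoded_sizes(name: str) -> dict:
    """Return the character count and the encoded byte counts of a name."""
    return {
        "characters": len(name),
        "utf-16-le bytes": len(name.encode("utf-16-le")),
        "utf-8 bytes": len(name.encode("utf-8")),
    }

ascii_name = "a" * 20 + ".txt"
japanese_name = "あ" * 20 + ".txt"

for name in (ascii_name, japanese_name):
    print(encoded_sizes(name))
```

Both names have the same number of characters and take the same number of bytes in UTF-16 LE (two per character here); only the UTF-8 size is larger for the Japanese name (three bytes per kana). Which of these sizes, if any, matters to TC is not known from this thread.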

If I manually select UTF-8 and save the file as such, it gets loaded properly without any issues. But I always have to manually select it.

When I open Notepad on its own, it has UTF-8 selected by default. Heck, even if I open Notepad by typing "notepad" in Total Commander's command line, it opens with UTF-8 encoding, which leads me to believe that something in how the Multi-Rename Tool creates the text file or calls Notepad causes it to open with UTF-16 LE.

Can we have it open in UTF-8 by default?

You can test it easily by making an example file with this name:
"普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですからこのネームには特に意味がないです。ただ普通に長い名前を作りたくてなにかこうかなぁって思ってこれが出てきたごめんなさいね、つまらなくて.txt"

When you use "Edit names" and then load that file back, the name is truncated to
"普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですからこのネームには特に意味がないです。ただ普通に長い名前を作りたくて", i.e. the rest of the name "なにかこうかなぁって思ってこれが出てきたごめんなさいね、つまらなくて" and the extension ".txt" are missing.
Last edited by Vulpix on 2023-06-24, 16:57 UTC, edited 3 times in total.
Dalai
Power Member
Posts: 9398
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE

Post by *Dalai »

Please try the following: Select the files in TC, then save the list via menu item Mark > Save Selection to File and open that file in Notepad. Are the names shown correctly then? If not, this might be an issue of Notepad.

Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror

Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE

Post by *Vulpix »

Dalai wrote: 2023-06-23, 17:13 UTC Please try the following: Select the files in TC, then save the list via menu item Mark > Save Selection to File and open that file in Notepad. Are the names shown correctly then? If not, this might be an issue of Notepad.

Regards
Dalai
I tried: that one is saved as UTF-16 LE as well. And therefore, of course, even though the name _inside_ the file is correct, it does not work properly when I load that file into Total Commander.
If I open that same file and re-save it with the exact same content as UTF-8, then load it in Total Commander, it works fine (i.e. the same scenario as in my original issue).

Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE

Post by *Dalai »

Does it work properly in previous TC versions, 10.52 for example?
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE

Post by *AntonyD »

In the same menu there is a command to select the editor that is used for this purpose.
What happens if you choose a more professional editor instead of this "notepad"?
One in which you can see at once which encoding the program is currently using,
and which supports all Unicode characters and any text direction.
#146217 personal license

Re: [TC 11.00b8 Multi-rename tool] Edit names opens notepad with UTF16-LE

Post by *Vulpix »

It's funny, but I found my own post from several years ago, when I came across this issue in a different context.

https://answers.microsoft.com/en-us/windows/forum/all/notepad-bug-with-encoding-auto-detection/9fb5571c-cb90-4499-8d22-b715de844c79

It is in fact a bug in Notepad, or rather in its "detection" algorithm, which for whatever reason selects UTF-16 LE in this case.

That in turn is a roughly 20-year-old problem with https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-istextunicode .

Pity I cannot somehow force this algorithm to pick UTF-8 by default when it isn't sure...

Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE

Post by *Dalai »

As far as I know TC saves the list file as UTF-16 LE (with a BOM) if necessary. Necessary means that the filename characters don't fit in the current ANSI codepage (which is very likely for Asian characters). If Notepad detects the file as UTF-16 LE it would be correct, and since the file has a BOM, it would be very wrong if it detected anything else.
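
The rule described above can be sketched in Python as follows. This only illustrates the "ANSI if it fits, otherwise UTF-16 LE with a BOM" idea and is not TC's actual code; the function names and the choice of `cp1252` as the ANSI code page are assumptions.

```python
import codecs

# Illustration only: not TC's implementation. "cp1252" stands in for
# whatever ANSI code page is active on the machine (an assumption).

def choose_list_encoding(names, ansi_codepage="cp1252"):
    """Return 'ansi' if every name fits the code page, else 'utf-16-le'."""
    try:
        for name in names:
            name.encode(ansi_codepage)
    except UnicodeEncodeError:
        return "utf-16-le"
    return "ansi"

def build_list_file(names, ansi_codepage="cp1252"):
    """Build the bytes of a list file with one name per CRLF-terminated line."""
    text = "\r\n".join(names) + "\r\n"
    if choose_list_encoding(names, ansi_codepage) == "ansi":
        return text.encode(ansi_codepage)
    # Names outside the code page: write UTF-16 LE preceded by its BOM (FF FE).
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(build_list_file(["readme.txt"]))
print(build_list_file(["あ.txt"])[:2])
```

With a BOM at the start of the file, an editor that reports UTF-16 LE is reading the file as it was written.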

Regards
Dalai

Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE

Post by *AntonyD »

2Vulpix
"It is in fact a bug of notepad"
So did you try a more professional editor, as I asked above?
I can assure you that any real editor, unlike this "notepad", handles this much more correctly and can even re-encode a text file with the proper encoding on the fly, with a single click...
Honestly, I do not understand people who, presumably on the principle of "well, it opens quickly and is always there in the OS", choose this fake, "unfinished" `editor`...

Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE

Post by *Vulpix »

Dalai wrote: 2023-06-24, 00:11 UTC As far as I know TC saves the list file as UTF-16 LE (with a BOM) if necessary. Necessary means that the filename characters don't fit in the current ANSI codepage (which is very likely for Asian characters). If Notepad detects the file as UTF-16 LE it would be correct, and since the file has a BOM, it would be very wrong if it detected anything else.

Regards
Dalai
If that is the case, why does TC then fail to read the contents? Or rather, why does it read them just fine if I re-save the file as UTF-8? Technically Notepad also reads the contents just fine when they're in UTF-16 LE; it's only the Multi-Rename Tool that then cuts off part of the name, and that's the problem. So while there is certainly something weird about how Notepad treats Unicode, it is also true that TC handles the resulting file strangely, hmm. Anyway, I'll just live with it. I doubt anyone will fix it, and I don't need to be attacked by other people on here for my choice of the most basic of all text editors :)

Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE

Post by *Dalai »

If you save the file in a different encoding, Notepad will change the BOM (and the contents). TC will probably detect that change and read the contents accordingly. Please answer my question above whether or not it works in previous TC versions. If yes, it's a TC bug introduced recently. If not, we could investigate further.
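
For illustration, BOM-based detection of the kind assumed here can be sketched in Python like this. It is only a guess at the general technique, not TC's actual reading code; `detect_bom` and `read_names` are hypothetical helpers.

```python
import codecs

# Sketch of BOM-based encoding detection; not TC's actual logic.
# UTF-32 BOMs are deliberately ignored to keep the example short.

def detect_bom(data: bytes) -> str:
    """Return a codec name based on the byte order mark, or 'unknown'."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown"

def read_names(data: bytes) -> list:
    """Decode a list file according to its BOM and return one name per line."""
    encoding = detect_bom(data)
    if encoding == "unknown":
        raise ValueError("no BOM found; the encoding would have to be guessed")
    if encoding == "utf-8-sig":
        text = data.decode("utf-8-sig")   # this codec strips the BOM itself
    else:
        text = data[2:].decode(encoding)  # skip the 2-byte UTF-16 BOM
    return text.splitlines()

sample = codecs.BOM_UTF16_LE + "あ.txt\r\n".encode("utf-16-le")
print(detect_bom(sample), read_names(sample))
```

A reader built this way returns the full name for either encoding, so a name that survives as UTF-8 but is shortened as UTF-16 LE points at the UTF-16 reading path rather than at the file.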

Regards
Dalai

Re: [Turned out to not be a TC bug] Edit names opens notepad with UTF16-LE

Post by *Vulpix »

Dalai wrote: 2023-06-24, 15:10 UTC If you save the file in a different encoding, Notepad will change the BOM (and the contents). TC will probably detect that change and read the contents accordingly. Please answer my question above whether or not it works in previous TC versions. If yes, it's a TC bug introduced recently. If not, we could investigate further.

Regards
Dalai
Ah, I had typed up an answer, but when I found out that the problem was probably related to Notepad more than anything else, I erased it.
No, it didn't work in the latest stable version either (I just tried).

I also tried it with Notepad++; it doesn't work either. Notepad++ also detects the file as UTF-16 LE with BOM, and the contents are correct when I view the file myself, but if I load the same file _into_ Total Commander (be it the latest stable or the beta), it doesn't work.
It works fine if I re-save it as UTF-8, though, on both the stable and the latest beta.

So in short, the problem isn't that the file is saved as UTF-16 LE; that's okay.
The problem is that when a UTF-16 LE file is loaded into the Multi-Rename Tool, part of the filename is cut off if the filename takes up too many bytes. Again, it does not seem to be related to the character count, more to the size in bytes (?).

Re: [Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the

Post by *Dalai »

So I just tried it with the filename in the OP in Notepad (Win7) and Notepad++. The editor used doesn't really matter because a change to the names in the editor isn't required to trigger this bug (though I'm not sure if it is one).

I can confirm your observation.

Here are the steps to reproduce:
  1. Create a file with the following name:

    普通のファイル名だと思っていましたよね?でもこれは実はトータルコマンダーというソフトのためのファイルですからこのネームには特に意味がないです。ただ普通に長い名前を作りたくてなにかこうかなぁって思ってこれが出てきたごめんなさいね、つまらなくて.txt
    The file contents are irrelevant.
  2. In TC set the cursor on that newly created file and press Ctrl+M to open the MRT.
  3. Maximize the MRT window and make sure that the "New name" column is large enough to fully see the new name.
  4. Select "Edit names" from the little menu behind the button with the folder icon on it. This will open the editor that's set as MultiRenameEdit.
  5. Close the editor without changing anything.
  6. Confirm the dialog in MRT to load the new filenames and while doing so, closely watch the new name. It will be changed and shortened, although there were no changes to the filename in the editor!
This bug can be reproduced in TC 8.52a and 10.52, 32-bit and 64-bit; other versions might be affected.

The name is cut off after the 86th character (172 bytes). I'm not sure whether or not this has something to do with this particular filename or some characters in it, but it's possible that there's something about the 87th character tripping TC somehow. Unfortunately I'm not versed enough in encodings and such to understand all of this.
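
The numbers above can be checked with a small Python helper that compares the original name with the name shown after loading. `describe_truncation` is a hypothetical helper written for this thread, and the repeated-character name is only a stand-in for a real long filename.

```python
# Report where a loaded name stops compared with the original name.
# len() counts Unicode code points; for characters in the Basic Multilingual
# Plane (such as kana and common kanji) this equals the UTF-16 code unit count.

def describe_truncation(original: str, loaded: str):
    """Return None if nothing was lost, else a summary of what was kept."""
    if loaded == original:
        return None
    if not original.startswith(loaded):
        raise ValueError("loaded name is not a prefix of the original")
    kept = len(loaded)
    return {
        "characters kept": kept,
        "utf-16-le bytes kept": len(loaded.encode("utf-16-le")),
        "characters lost": len(original) - kept,
    }

original = "あ" * 100 + ".txt"
loaded = original[:86]  # simulates the cut after the 86th character
print(describe_truncation(original, loaded))
```

For a cut after 86 characters this gives 172 bytes in UTF-16 LE, matching the observation; it says nothing about why TC stops there.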

Regards
Dalai

Re: [Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the

Post by *Vulpix »

Dalai wrote: 2023-06-24, 18:41 UTC The name is cut off after the 86th character (172 bytes). I'm not sure whether or not this has something to do with this particular filename or some characters in it, but it's possible that there's something about the 87th character tripping TC somehow. Unfortunately I'm not versed enough in encodings and such to understand all of this.
This is indeed the observation.
Also, there is nothing wrong with this specific character: I had a folder full of Japanese names, and they were all cut off at seemingly random characters.
You can in fact just take one character and repeat it X times (for example あ), and you will find that with enough repetitions the name will eventually be cut off, too.

I.e. if you make a file called:

ああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああああ.txt
and repeat your steps, again the filename will be cut off after the 86th character, even though they're all the same so it can't be a problem of the character itself.

If you take the UTF16-LE encoded file itself, re-save it as UTF-8 and then use the multi rename tool to load it, you can restore the name of your file back to its original length.
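
For anyone who prefers to script that re-save step instead of doing it by hand in an editor, a minimal Python sketch follows. The function name and the `with_bom` switch are assumptions; whether TC needs a BOM to recognize a UTF-8 list file has not been tested in this thread.

```python
from pathlib import Path

# Re-save a UTF-16 list file as UTF-8 without touching its line endings.
# Whether TC expects a UTF-8 BOM is an assumption, hence the switch.

def resave_as_utf8(src: Path, dst: Path, with_bom: bool = False) -> None:
    """Convert the file at src from UTF-16 (with BOM) to UTF-8 at dst."""
    # The "utf-16" codec reads the BOM, picks the byte order and drops it.
    text = src.read_bytes().decode("utf-16")
    encoding = "utf-8-sig" if with_bom else "utf-8"
    # Working on bytes keeps the CRLF line endings exactly as they were.
    dst.write_bytes(text.encode(encoding))
```

A call such as `resave_as_utf8(Path("names_utf16.txt"), Path("names_utf8.txt"))` (both file names are placeholders) would produce the file to load back into the Multi-Rename Tool.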

Re: [Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the

Post by *AntonyD »

2mod.
IMHO the time has come to move this discussion to the appropriate "bug reports" section.
white
Power Member
Posts: 4624
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: [Might be a TC (seems any version) bug - Multi-rename tool ] Loading long unicode names from UTF16-LE truncates the

Post by *white »

Moderator message from: white » 2023-06-25, 09:47 UTC

Moved to Bugs forum.