Download lists in UTF-8: two problems

Flint
Power Member
Posts: 3487
Joined: 2003-10-27, 09:25 UTC
Location: Antalya, Turkey

Post by *Flint »

1. Save the following text as a TXT file in UTF-8 encoding with BOM:

http://www.columbia.edu/cu/cuo/F2002-2.mp3 -> Debussy - Prélude à l'après-midi d'un faun.mp3
2. Call cm_FtpDownloadList, select the newly created file, press OK.
3. The BTM window opens, downloading starts, the file name is shown in this dialog correctly. However:
4.a. The file is actually saved under the name Debussy - Pr_lude _ l'apr_s-midi d'un faun.mp3, that is with all diacritics replaced with underscores;
4.b. After downloading is finished, TC writes the "OK-" marker into the download list file, but it does not take the BOM into account. As a result, the BOM signature is overwritten, and next time the file will not be recognized as UTF-8.
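
To make 4.b concrete, here is a minimal Python sketch of the difference between writing the marker at byte offset 0 and writing it after the BOM. It is only an illustration: the function names are hypothetical, and placing the marker at the very start of the first line is an assumption, not a description of how TC is implemented.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def mark_ok_naive(data: bytes, marker: bytes = b"OK-") -> bytes:
    """Hypothetical buggy variant: overwrite the first bytes in place.

    The marker and the BOM are both three bytes long, so an in-place
    write at offset 0 replaces exactly the BOM.
    """
    return marker + data[len(marker):]


def mark_ok_bom_aware(data: bytes, marker: bytes = b"OK-") -> bytes:
    """Insert the marker after a leading UTF-8 BOM, if there is one."""
    if data.startswith(BOM):
        return BOM + marker + data[len(BOM):]
    return marker + data


if __name__ == "__main__":
    line = "http://example.invalid/a.mp3 -> Prélude.mp3\r\n".encode("utf-8")
    original = BOM + line
    print(mark_ok_naive(original).startswith(BOM))      # BOM lost
    print(mark_ok_bom_aware(original).startswith(BOM))  # BOM kept
```

The BOM-aware variant leaves the file decodable as UTF-8 with signature, with the marker as the first visible characters of the line.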

OS: Windows 7 SP1 Pro 64-bit
TC 8.01 64-bit
Russian locale

P.S. If you are using a Western locale, the problem might not reproduce, since all the diacritical characters are present in that codepage. If so, please try a Cyrillic file name, for example:

http://www.columbia.edu/cu/cuo/F2002-2.mp3 -> Debussy - Послеполуденный отдых фавна.mp3
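
The codepage reasoning in the P.S. can be checked with a short Python sketch. It replaces every character that a given ANSI codepage cannot encode with an underscore. The function name is hypothetical, and the assumption that the Russian locale corresponds to cp1251 and the Western one to cp1252 is mine; this shows what a conversion through the ANSI codepage would produce, not what TC actually does internally.

```python
def replace_unencodable(name: str, codepage: str, placeholder: str = "_") -> str:
    """Return name with each character codepage cannot encode replaced."""
    out = []
    for ch in name:
        try:
            ch.encode(codepage)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(placeholder)
    return "".join(out)


if __name__ == "__main__":
    name = "Debussy - Prélude à l'après-midi d'un faun.mp3"
    print(replace_unencodable(name, "cp1251"))  # assumed Russian ANSI codepage
    print(replace_unencodable(name, "cp1252"))  # assumed Western ANSI codepage
```

With cp1251 the result matches the name reported in step 4.a, while with cp1252 the name comes through unchanged, which is consistent with a lossy ANSI conversion somewhere along the way.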
Flint's Homepage: Full TC Russification Package, VirtualDisk, NTFS Links, NoClose Replacer, and other stuff!
 
Using TC 10.52 / Win10 x64
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

This is strange, I have just tried your two examples with TC x64:

http://www.columbia.edu/cu/cuo/F2002-2.mp3 -> Debussy - Prélude à l'après-midi d'un faun.mp3
http://www.columbia.edu/cu/cuo/F2002-2.mp3 -> Debussy - Послеполуденный отдых фавна.mp3
But both names appear just fine. The only difference is that I'm using Swiss German locale, but this shouldn't make any difference since it's Unicode. I did have the BOM problem, though - maybe it happens only due to missing BOM?
Author of Total Commander
https://www.ghisler.com
umbra
Power Member
Posts: 871
Joined: 2012-01-14, 20:41 UTC

Post by *umbra »

I can't reproduce the 4a issue. But 4b really happens: TC removes the BOM from the file.

A bit off-topic:
The fact that TC's Lister cannot recognize UTF-8 without a BOM may itself be considered a bug, since UTF-8 does not require a BOM (including one is not even recommended).
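
For illustration, BOM-less UTF-8 can usually be recognized by attempting a strict decode. The Python sketch below is a common heuristic, not a description of how Lister works; the function name is hypothetical.

```python
import codecs


def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if data starts with a UTF-8 BOM or decodes as strict UTF-8.

    Pure ASCII also passes, which is harmless because ASCII is a subset
    of UTF-8. Text in a single-byte ANSI codepage that contains non-ASCII
    characters almost always fails the strict decode.
    """
    if data.startswith(codecs.BOM_UTF8):
        return True
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8("Prélude".encode("utf-8")))   # UTF-8 without BOM
    print(looks_like_utf8("Prélude".encode("cp1252")))  # ANSI bytes
```

The check is not foolproof (short ANSI strings can happen to be valid UTF-8), so a BOM, when present, remains the stronger signal.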
Windows 7 Pro x64, Windows 10 Pro x64

Post by *Flint »

ghisler(Author) wrote:I did have the BOM problem, though - maybe it happens only due to missing BOM?
I don't think so. First, the BOM is removed after the file is downloaded, not before it, so it cannot affect the target file name. Second, if there is no BOM, TC does not recognize the UTF-8 text at all and treats it as ANSI (I checked it).
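
A small Python sketch of what "treats it as ANSI" means in practice: the UTF-8 bytes are decoded with a single-byte codepage, so each multi-byte character turns into two unrelated ones. The function name is hypothetical, and cp1251 is my assumption for the Russian ANSI codepage.

```python
def read_as_ansi(data: bytes, codepage: str) -> str:
    """Decode bytes with a single-byte ANSI codepage, as a BOM-less file would be."""
    return data.decode(codepage, errors="replace")


if __name__ == "__main__":
    raw = "Prélude".encode("utf-8")   # 'é' becomes the two bytes C3 A9
    print(read_as_ansi(raw, "cp1251"))
    print(raw.decode("utf-8"))        # the intended reading
```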

Interestingly, I have now tested the 32-bit version of TC, and it worked fine: the name of the file was the same as in the list (with all the diacritics). So the problem reproduces only in the 64-bit version. Just in case, I also tested both with a clean INI, and the results are the same: the 32-bit version works fine, the 64-bit version replaces diacritics with underscores.

When I have time I'll try it in a virtual machine with other locales (including Swiss). Maybe it does have something to do with it…
petermad
Power Member
Posts: 14739
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Post by *petermad »

2Flint
Just in case I also tested them both with clean INI,
Is that clean wincmd.ini or wcx_ftp.ini - or both?

Is the setting for "FTP: connection details" -> "Advanced" -> "Encoding of file names" the same in both 64 and 32 bit version?
License #524 (1994)
Danish Total Commander Translator
TC 11.03 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1371a
TC 3.50b4 on Android 6 & 13
Try: TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar

Post by *ghisler(Author) »

2petermad
This doesn't affect http downloads.

2Flint
Can you try the same with a line break at the beginning of the file (empty line) just after the BOM? This works for me even in TC 7.57.

Post by *Flint »

ghisler(Author) wrote:Can you try the same with a line break at the beginning of the file (empty line) just after the BOM?
With an additional newline the "OK" is added correctly, but the file name still has its diacritics replaced. (Sorry, I haven't tested in a virtual machine yet…)

petermad wrote:Is that clean wincmd.ini or wcx_ftp.ini - or both?
I had tested only with empty wincmd.ini, but now I re-checked with both empty INI files, and the results didn't change.
petermad wrote:Is the setting for "FTP: connection details" -> "Advanced" -> "Encoding of file names" the same in both 64 and 32 bit version?
I'm using 32- and 64-bit versions of TC with the same INI files, so they are the same. Though it shouldn't matter much, because (aside from what Christian said) this setting is different for each FTP connection and therefore cannot affect download list functionality which is connection-independent.

Post by *ghisler(Author) »

I'm sorry, I cannot reproduce the diacritics problem. I will have to test it on Russian locale.

Post by *Flint »

Sorry for the delay, I've finally tested in a virtual machine with WinXP x64, and the problem is reproduced there too (with TC 64-bit).

When I switched the "Language for non-Unicode programs" option to "German (Switzerland)", the problem disappeared, and all the filename characters I tried were saved correctly (I tested Cyrillic, Western diacritics and some Polish characters not present in the Western codepage). So this problem is indeed reproduced only with the Russian locale.

Post by *ghisler(Author) »

Thanks for the additional tests!

Post by *Flint »

Tested in 8.50β2a: I confirm that both problems are fixed.

Post by *ghisler(Author) »

Thanks!