ghisler(Author) wrote:How should TC know that a file is in UTF8 format? If it contains only English text, it looks the same as ANSI text...
Hi Christian,
If the file has no UTF-8 signature, TC can't know that it should read it as UTF-8.
But I would like a special option to open text files (files that TC opens in "text only" mode by default) as UTF-8 by default.
I am French, and I use UTF-8 by default for all my text files (but without the UTF-8 signature) with the free PSPad text editor (http://www.pspad.com/).
It makes no difference for English text, but it does for French text...
What about also adding 8 as a shortcut for UTF-8 mode in Lister?
It is possible that I have not understood your reply correctly.
Is my request for a new option to read text files as UTF-8 by default a stupid one?
Would it cause a problem in some other view?
ghisler(Author) wrote:How should TC know that a file is in UTF8 format? If it contains only English text, it looks the same as ANSI text...
For English text it looks the same, so it also doesn't matter whether Lister is switched to ANSI or UTF-8.
For non-English text, however, it should be possible to "guess" the format even when there's no signature in the file (for example, by verifying that every byte >= 0x80 belongs to a valid UTF-8 sequence; maybe that would be good enough?).
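A minimal Python sketch of that check, assuming the file's bytes are already in memory. `classify_bytes` is a hypothetical helper name, not anything from TC; instead of hand-coding the RFC 3629 byte ranges it relies on Python's strict UTF-8 decoder, which rejects any byte >= 0x80 that is not part of a well-formed sequence.

```python
def classify_bytes(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8' or 'not utf-8' (hypothetical helper)."""
    # Pure ASCII reads the same in ANSI and UTF-8, so it gets its own answer.
    if data.isascii():
        return "ascii"
    try:
        # The strict decoder raises on stray continuation bytes, truncated
        # sequences, overlong forms and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "not utf-8"
    return "utf-8"


print(classify_bytes(b"plain English text"))          # ascii
print(classify_bytes("déjà vu".encode("utf-8")))       # utf-8
print(classify_bytes("déjà vu".encode("cp1252")))      # not utf-8
```

In the ANSI (cp1252) case, `é` becomes the single byte 0xE9, which announces a 3-byte UTF-8 sequence but is followed by the plain ASCII `j`, so the check fails as expected.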
Currently Lister doesn't scan the entire file when loading it, so such a check would take a long time with big files. On the other hand, scanning only a small part of the file could lead to incorrect results.
Hi Christian, you haven't replied to my request for a special option to use UTF-8 as the default for text files, without scanning the file at all?
ghisler(Author) wrote:Currently Lister doesn't scan the entire file when loading it, so such a check would take a long time with big files. On the other hand, scanning only a small part of the file could lead to incorrect results.
Right, scanning the whole file is not a good idea if the file is really big, but I think that a smaller block (32 kB?) can give quite a reliable result (provided the number of 0x80+ characters exceeds a certain limit, of course; UTF-8 sequences have a very distinctive format).
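A rough sketch of the sampled variant, assuming a 32 KiB prefix as suggested above. `SAMPLE_SIZE`, `min_multibyte` and both function names are hypothetical, and the threshold value is a placeholder rather than a tested figure. An incremental decoder is used so that a character cut in half at the end of the sample is not mistaken for an encoding error.

```python
import codecs

# Hypothetical sample size, taken from the "32kB?" suggestion; not a TC setting.
SAMPLE_SIZE = 32 * 1024


def read_sample(path: str, size: int = SAMPLE_SIZE) -> bytes:
    """Read at most `size` bytes from the start of the file."""
    with open(path, "rb") as f:
        return f.read(size)


def guess_from_sample(sample: bytes, min_multibyte: int = 4) -> str:
    """Return 'utf-8' or 'ansi' for a file prefix.

    min_multibyte is a hypothetical threshold: how many non-ASCII characters
    must decode cleanly before the sample is trusted to be UTF-8.
    """
    # With final=False the decoder buffers an incomplete sequence at the end
    # of the sample instead of raising, so the cut-off point is harmless.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        text = decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return "ansi"  # some byte >= 0x80 is not part of a valid sequence
    # Every decoded character >= U+0080 came from one multi-byte sequence.
    multibyte = sum(1 for ch in text if ord(ch) >= 0x80)
    # Pure ASCII (or too little evidence) falls back to ANSI, which is
    # harmless for ASCII since both readings agree.
    return "utf-8" if multibyte >= min_multibyte else "ansi"
```

Usage would be along the lines of `guess_from_sample(read_sample(path))`; raising `min_multibyte` trades fewer false UTF-8 guesses for more files that stay in ANSI mode.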
Maybe this "text format auto-detection" could be an optional feature (enabled/disabled in the Lister options). Detecting Unicode (UTF-16) files without a BOM signature should be possible in a similar way.
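Along the same lines, here is a hedged sketch for BOM-less UTF-16. It assumes mostly Latin-script text: characters below U+0100 carry a 0x00 high byte, so UTF-16LE shows many zero bytes at odd offsets and UTF-16BE at even offsets, while ordinary 8-bit text has almost none. The function name and both ratio thresholds are hypothetical, and the heuristic will miss e.g. CJK text, where the high bytes are rarely zero.

```python
import codecs
from typing import Optional


def guess_utf16_without_bom(sample: bytes, min_zero_ratio: float = 0.3,
                            max_other_ratio: float = 0.05) -> Optional[str]:
    """Guess 'utf-16-le' or 'utf-16-be' for BOM-less text, else None."""
    pairs = len(sample) // 2
    if pairs == 0:
        return None
    # Count zero bytes separately at even and odd offsets.
    even_zeros = sum(1 for b in sample[0:2 * pairs:2] if b == 0)
    odd_zeros = sum(1 for b in sample[1:2 * pairs:2] if b == 0)
    even_ratio, odd_ratio = even_zeros / pairs, odd_zeros / pairs

    if odd_ratio >= min_zero_ratio and even_ratio <= max_other_ratio:
        candidate = "utf-16-le"
    elif even_ratio >= min_zero_ratio and odd_ratio <= max_other_ratio:
        candidate = "utf-16-be"
    else:
        return None  # no clear pattern: plain 8-bit text or binary data

    # Confirm with a strict decode; final=False tolerates a code unit or
    # surrogate pair cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder(candidate)(errors="strict")
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return None
    return candidate
```

Requiring zeros on one side and almost none on the other is what keeps binary files, which tend to have zeros everywhere, from being reported as UTF-16.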
ghisler(Author) wrote:Currently Lister doesn't scan the entire file when loading it, so such a check would take a long time with big files. On the other hand, scanning only a small part of the file could lead to incorrect results.
Well, one could argue that not scanning at all never leads to the correct result. I think people would rather have Lister at least make an educated guess based on a small part of the file than do nothing.
I can send you a program I wrote at work to determine the encoding of files (based on an algorithm found in the Unix utility "file"). It's written in Ruby, but it should be easy enough to follow even if you're not familiar with the language.
tommy0910 wrote: 2020-06-23, 21:15 UTC
Is there any news on this?
I don't need any autodetection. I'd just like Lister to always start in mode "7" instead of mode "1".
Install the CudaLister plugin.
You can set UTF-8 as the default encoding for opening files, and it also has many advantages over the built-in Lister.
The options are accessible from the context menu of any open file: https://totalcmd.net/plugring/CudaLister.html