[SOLVED] Use UTF-8 with byte-order mark when creating a new .txt file

tuska
Power Member
Posts: 3760
Joined: 2007-05-21, 12:17 UTC

[SOLVED] Use UTF-8 with byte-order mark when creating a new .txt file

Post by *tuska »

I hope I'm not asking a stupid question here.

Here the encoding "UTF-8 with byte-order mark" is preferred.

When I create a .TXT file in Total Commander with Shift+F4, it has the encoding "ANSI".
In such a case, should the encoding "UTF-8 with byte-order mark" perhaps also be used in future?
white wrote: 2024-02-06, 15:38 UTC Split from topic UTF-8 for history.txt
Last edited by tuska on 2024-02-12, 18:32 UTC, edited 1 time in total.
Horst.Epp
Power Member
Posts: 6498
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: UTF-8 for history.txt

Post by *Horst.Epp »

tuska wrote: 2024-02-05, 17:21 UTC When I create a .TXT file in Total Commander with Shift+F4, it has the encoding "ANSI".
In such a case, should the encoding "UTF-8 with byte order mark" perhaps also be used in future?
That's a function of the called editor.
Windows 11 Home x64 Version 23H2 (OS Build 22631.3527)
TC 11.03 x64 / x86
Everything 1.5.0.1373a (x64), Everything Toolbar 1.3.3, Listary Pro 6.3.0.73
QAP 11.6.3.2 x64
tuska
Power Member
Posts: 3760
Joined: 2007-05-21, 12:17 UTC

Re: UTF-8 for history.txt

Post by *tuska »

Horst.Epp wrote: 2024-02-05, 17:24 UTC
tuska wrote: 2024-02-05, 17:21 UTC When I create a .TXT file in Total Commander with Shift+F4, it has the encoding "ANSI".
In such a case, should the encoding "UTF-8 with byte order mark" perhaps also be used in future?
That's a function of the called editor.
Thanks!
tuska
Power Member
Posts: 3760
Joined: 2007-05-21, 12:17 UTC

Re: UTF-8 for history.txt

Post by *tuska »

Horst.Epp wrote: 2024-02-05, 17:24 UTC
tuska wrote: 2024-02-05, 17:21 UTC When I create a .TXT file in Total Commander with Shift+F4, it has the encoding "ANSI".
In such a case, should the encoding "UTF-8 with byte order mark" perhaps also be used in future?
That's a function of the called editor.
In my case, however, the encoding should be determined by Total Commander.

EmEditor Professional (64-bit) Version 23.1.901
New Files...: Encoding: UTF-16LE
Horst.Epp
Power Member
Posts: 6498
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: UTF-8 for history.txt

Post by *Horst.Epp »

tuska wrote: 2024-02-05, 17:34 UTC In my case, however, the encoding should be determined by Total Commander.
I don't expect TC to know the syntax of any editor other than the default Windows Notepad.

I have redirected Notepad at the OS level to my preferred editor.
This way I can control what the default file format is.
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: UTF-8 for history.txt

Post by *AntonyD »

A file will be recognized reliably as UTF-8 or UTF-16 LE ONLY if it starts with the BOM signature.
And of course, these bytes (3 for UTF-8, 2 for UTF-16 LE) must be written to the new file by the tool that calls the API function
CreateFileEx. In our case, this is Total Commander.
ANY editor that we configure as an external editor will only OPEN this file and TRY to determine
its encoding. AND IF the file has size 0 (which is what TC creates by default), it will ONLY be treated as ANSI.

So yes, it would be a very significant step forward if Total Commander (ok, via some new option) wrote
the BOM signature as the first bytes of a new file created with Shift+F4.
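
The detection step described above can be sketched as follows. This is only an illustration of BOM sniffing, not the logic of any particular editor: the function name guess_encoding is hypothetical, cp1252 stands in for "ANSI" as an assumption, and only three signatures are checked.

```python
import codecs

# Signatures a BOM-sniffing editor might look for (UTF-8: 3 bytes, UTF-16: 2 bytes).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return an encoding name based only on a leading BOM.

    `fallback` plays the role of "ANSI"; cp1252 is an assumption here.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # No signature found (this includes a 0-byte file): nothing to go on.
    return fallback
```

With this kind of logic, a 0-byte file gives the editor no information at all, so it ends up on the fallback unless the editor applies its own "new file" default instead.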
#146217 personal license
petermad
Power Member
Posts: 14809
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Re: UTF-8 for history.txt

Post by *petermad »

tuska wrote: When I create a .TXT file in Total Commander with Shift+F4, it has the encoding "ANSI".
That depends on your Windows version. In Windows XP, 7 and 8.1 the default is ANSI; in Windows 10 the default is UTF-8 (without BOM), if there are non-English characters in the document.

I don't have Windows 11 (only about 24% do), but I think I read somewhere that it uses UTF-8 with BOM.

How to change the default encoding in Notepad: https://www.thewindowsclub.com/how-to-change-the-default-character-encoding-in-notepad-on-windows-10
License #524 (1994)
Danish Total Commander Translator
TC 11.03 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1371a
TC 3.50 on Android 6 & 13
Try: TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar
browny
Senior Member
Posts: 288
Joined: 2007-09-10, 13:19 UTC

Re: UTF-8 for history.txt

Post by *browny »

CreateFile (of any subkind: Ex, 2, A, W) is a general file opening function and has no idea about file contents.
ASCII file is a valid UTF-8 file without BOM. So it makes sense to write BOM only if text is non-empty.

There is no reason for TC to know encoding if it simply passed file name to an external program.

Should anybody prefer UTF-8? Depends on the environment.
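
The point that a pure ASCII file is already valid UTF-8 can be checked with a few lines of Python. This is a small sketch with made-up sample text; cp1252 is used as an example of an "ANSI" code page.

```python
import codecs

text = "plain ASCII text"
ascii_bytes = text.encode("ascii")

# The same bytes decode to the same text as UTF-8 or as a Windows code page,
# so without a BOM the encoding of an ASCII-only file cannot be told apart.
same = ascii_bytes.decode("utf-8") == ascii_bytes.decode("cp1252") == text

# Prepending the UTF-8 BOM is the only thing that marks the file as UTF-8.
with_bom = codecs.BOM_UTF8 + ascii_bytes
```

The "utf-8-sig" codec strips the signature again on decoding, so with_bom.decode("utf-8-sig") gives back the original text.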
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: UTF-8 for history.txt

Post by *AntonyD »

browny wrote: if it simply passed file name
That's the point! NO, it does not! It creates the file! And only after that does it call the external editor and pass the full file path as an argument to it.
So to be sure that this editor can easily treat the input file as a proper text file with the predefined encoding UTF-8 (or UTF-16 LE),
we must supply this new file template with a proper BOM signature beforehand; otherwise the file will be treated as ANSI.
And YES, IF we always want to receive ANSI files and only fine-tune the encoding of the open file while saving it
in the editor, then ok, we can definitely skip this preliminary step of adding the BOM.
BUT if it were an option, it would save my nerves and my time: I could be sure that as soon as the text file is opened
in my preferred editor, it will also end up saved in my preferred encoding.
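
What the requested option would amount to can be sketched like this. It is not how Total Commander is implemented; the function name create_empty_text_file and the small encoding table are hypothetical and only illustrate "create the file, write the BOM, then hand the path to the editor".

```python
import codecs
from pathlib import Path

# Signatures for the two encodings mentioned in this thread.
BOM_BY_ENCODING = {
    "utf-8": codecs.BOM_UTF8,         # EF BB BF
    "utf-16-le": codecs.BOM_UTF16_LE,  # FF FE
}


def create_empty_text_file(path, encoding="utf-8"):
    """Create a new file that contains nothing but the BOM for `encoding`."""
    target = Path(path)
    # Mode "xb" refuses to overwrite an existing file.
    with target.open("xb") as handle:
        handle.write(BOM_BY_ENCODING[encoding])
    return target
```

The returned path would then be passed to the configured editor, which sees a 3-byte (or 2-byte) file with a signature instead of a 0-byte file.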
tuska
Power Member
Posts: 3760
Joined: 2007-05-21, 12:17 UTC

Re: UTF-8 for history.txt

Post by *tuska »

petermad wrote: 2024-02-06, 05:28 UTC
When I create a .TXT file in Total Commander with Shift+F4, it has the encoding "ANSI".
That depends on your Windows version. In Windows XP, 7 and 8.1 the default is ANSI; in Windows 10 the default is UTF-8 (without BOM),
if there are non-English characters in the document.
I don't have Windows 11 (only about 24% do), but I think I read somewhere that it uses UTF-8 with BOM.
If a .txt file is created in Total Commander with Shift+F4,
then the editors "EmEditor" and "Notepad++", for example, use "ANSI" as the encoding.
The editors "Notepad3" and "PSPad" use the encoding set in the editor.
Windows 11 Notepad uses the encoding UTF-8.

Configuration > Edit/View > Editor for F4
  1. EmEditor Professional (64-bit) Version 23.1.901
    C:\Users\user\AppData\Local\Programs\EmEditor\EmEditor.exe /cd "%1"
    Encoding: ANSI, although the default encoding is set to UTF-16LE with Signature.
     
  2. Notepad++ v8.6.2 (64-bit) - 14.1.2024
    %COMMANDER_PATH%\Tools\Notepad++\notepad++.exe "%1"
    Encoding: ANSI, although the default encoding is set to UTF-16 Little Endian with BOM.
     
  3. Notepad3 (x64) 6.23.203.2 - 3.2.2023
    %COMMANDER_PATH%\Tools\Notepad3\Notepad3.exe "%1"
    - Notepad3: Unicode (UTF-8) (default) ... Notepad3 uses the encoding that has been set.
    - EmEditor, Notepad++ show: ANSI
     
  4. PSPad freeware editor (32b) - 5.0.7 (775) 18.3.2023
    %COMMANDER_PATH%\Tools\PSPadEditor\PSPad.exe "%1"
    Encoding: ANSI Western European (1252) ... only if the default encoding has been set to this value
    (Menu "Coding" > Ctrl+click on the desired coding to set the "default code page", marked with an asterisk).
     
  5. Windows Notepad 11.2312.18.0
    C:\Program Files\WindowsApps\Microsoft.WindowsNotepad_11.2312.18.0_x64__8wekyb3d8bbwe\Notepad\Notepad.exe
    - Windows Notepad encoding: UTF-8
    - EmEditor, Notepad++ show: ANSI

Windows 11 Pro (x64) Version 23H2 (OS Build 22631.3085) | TC 11.03RC4 x64
petermad
Power Member
Posts: 14809
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Re: UTF-8 for history.txt

Post by *petermad »

tuska wrote: If a .txt file is created in Total Commander with Shift+F4,
then the editors "EmEditor" and "Notepad++", for example, use "ANSI" as the encoding.
The editors "Notepad3" and "PSPad" use the encoding set in the editor.
Windows 11 Notepad uses the encoding UTF-8.
Sorry, I should have mentioned that what I wrote applies to Notepad (Notepad's default depends on the Windows version).
tuska
Power Member
Posts: 3760
Joined: 2007-05-21, 12:17 UTC

Re: UTF-8 for history.txt

Post by *tuska »

petermad wrote: 2024-02-06, 11:20 UTC
If a .txt file is created in Total Commander with Shift+F4,
then the editors "EmEditor" and "Notepad++", for example, use "ANSI" as the encoding.
The editors "Notepad3" and "PSPad" use the encoding set in the editor.
Windows 11 Notepad uses the encoding UTF-8.
Sorry, I should have mentioned that what I wrote applies to Notepad (Notepad's default depends on the Windows version).
No problem, I took it into account and wanted to compare a few editors anyway.
browny
Senior Member
Posts: 288
Joined: 2007-09-10, 13:19 UTC

Re: UTF-8 for history.txt

Post by *browny »

AntonyD wrote: 2024-02-06, 09:52 UTC we must supply this new file template with a proper BOM signature beforehand; otherwise the file will be treated as ANSI.
An external editor, independently of TC, might be configured to create new files in CP1250. Thanks for the extra BOM then.
Or enjoy having to configure such changes in two places.
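
The concern about an "extra BOM" can be illustrated with a short sketch: if a program reads a BOM-prefixed file with a fixed code page instead of honoring the signature, the three BOM bytes show up as stray characters at the start of the text. The sample content is made up, and whether a given editor behaves this way is an assumption.

```python
import codecs

data = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Read with a fixed code page: the 3 BOM bytes become 3 visible characters.
as_code_page = data.decode("cp1250")

# Read by something that honors the signature: the BOM is stripped.
as_utf8 = data.decode("utf-8-sig")
```

Here as_code_page is three characters longer than as_utf8, which is the kind of garbage prefix a BOM-unaware tool would display.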
Hacker
Moderator
Posts: 13067
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: UTF-8 for history.txt

Post by *Hacker »

browny,
An external editor which ignores a BOM and treats a file as ANSI despite the BOM is not an editor I would recommend.

Roman
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect you drop dead.
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: UTF-8 for history.txt

Post by *AntonyD »

browny wrote: An external editor, independently of TC, might be configured to create a new file in CP1250
Did you not get what I wrote? The INITIAL side, the one actually RESPONSIBLE for the new file appearing on disk,
IS Total Commander itself! So when the external editor OPENS this already existing file (with a size of 0 bytes),
it depends on the editor's logic HOW it will process such a file. And MOST editors do not apply their settings for
creating a new file here, simply because this "new" file WAS NOT created from inside those editors!

Therefore, an option that forces Total Commander to add a BOM to the beginning of an empty text
file would be more appropriate.

P.S. Yes, of course, when saving such a file (even an empty one!) inside any "smart" editor, we can choose
the encoding in which it should be saved. And in principle this solves our problem of HOW to save the new file
that we need now and tried to create with Shift+F4. But such a new option would make our lives easier by
eliminating this extra worry about choosing the right encoding when saving.