UTF-8 for history.txt

Here you can propose new features, make suggestions etc.

Moderators: white, Hacker, petermad, Stefan2

browny
Senior Member
Posts: 288
Joined: 2007-09-10, 13:19 UTC

UTF-8 for history.txt

Post by *browny »

All was good while history.txt was ASCII text, but now the file contains funny characters, and the correct encoding is unknown.
Therefore, it would be beneficial to switch to UTF-8 and add the corresponding BOM.
Other Unicode encodings (such as UTF-16) would considerably increase the size.
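To put a number on the size argument, here is a minimal Python sketch that encodes one mostly-ASCII line with a BOM, once as UTF-8 and once as UTF-16 LE, and compares the byte counts. The sample line is made up for illustration. Python's `utf-8-sig` codec writes the 3-byte UTF-8 BOM; for UTF-16 LE the 2-byte BOM from `codecs.BOM_UTF16_LE` is prepended by hand.

```python
import codecs

def encoded_sizes(text: str) -> dict[str, int]:
    """Return the byte size of `text` as UTF-8 + BOM and as UTF-16 LE + BOM."""
    utf8_bom = text.encode("utf-8-sig")  # EF BB BF + UTF-8 bytes
    # FF FE + 2 bytes per character (all characters here are in the BMP)
    utf16le_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
    return {"utf-8+bom": len(utf8_bom), "utf-16-le+bom": len(utf16le_bom)}

# Hypothetical history-style line: ASCII except for one umlaut
sample = '29.01.24 Fixed: Support Unicode accents in constructs like \\bändern\\b (32/64)'
sizes = encoded_sizes(sample)
print(len(sample), "characters:", sizes)
```

For text like this, UTF-16 roughly doubles the file, while UTF-8 only adds the 3-byte BOM plus one or two extra bytes per non-ASCII character.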
Horst.Epp
Power Member
Posts: 6495
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: UTF-8 for history.txt

Post by *Horst.Epp »

It's not ASCII text, it's ANSI, which displays fine in the internal viewer (Lister),
in many other popular text file viewers like CudaLister or AkelPad,
and in the very useful TC Changes Viewer.
Windows 11 Home x64 Version 23H2 (OS Build 22631.3527)
TC 11.03 x64 / x86
Everything 1.5.0.1373a (x64), Everything Toolbar 1.3.3, Listary Pro 6.3.0.73
QAP 11.6.3.2 x64
browny
Senior Member
Posts: 288
Joined: 2007-09-10, 13:19 UTC

Re: UTF-8 for history.txt

Post by *browny »

What sort of ANSI? What code page? Do you know that ANSI for Asian languages is double byte?
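A small Python sketch of the double-byte point: in the Japanese ANSI code page 932 (a Shift_JIS variant), each kanji takes two bytes, and reading those bytes with a single-byte code page such as 1252 splits every character into two unrelated symbols. The sample word is an arbitrary choice.

```python
def show_double_byte(word: str) -> tuple[bytes, str]:
    """Encode `word` in code page 932, then mis-decode the bytes as code page 1252."""
    raw = word.encode("cp932")  # Japanese ANSI code page
    # errors="replace" because cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined
    misread = raw.decode("cp1252", errors="replace")
    return raw, misread

raw, misread = show_double_byte("日本")
print(len("日本"), "characters ->", len(raw), "bytes:", raw.hex(" "))
print("read as cp1252:", misread)
```

Note that the trail byte of the second character (0x7B) is also the ASCII code for "{", so a viewer that guesses the wrong code page cannot even rely on ASCII bytes being ASCII characters.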
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: UTF-8 for history.txt

Post by *AntonyD »

To be honest, it doesn't matter at all what the encoding is or what its correct name is:
this file really should have been saved long ago in either UTF-8 + BOM or UTF-16 LE + BOM (like the INI files).
#146217 personal license
petermad
Power Member
Posts: 14808
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Re: UTF-8 for history.txt

Post by *petermad »

browny wrote:
What sort of ANSI? What code page?
Code page 1252 Western Latin1

Note that in the history.txt files from TC 11.03rc1 and TC 11.03rc3, a few characters (äö‹›«»Äß) were not saved correctly.
browny wrote:
but now the file contains funny characters
Some of these characters from the extended ANSI character set have been there since 2008, and all of them since 2016.
License #524 (1994)
Danish Total Commander Translator
TC 11.03 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1371a
TC 3.50 on Android 6 & 13
Try: TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar
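For reference, a short Python check of the characters listed above: each one is a single byte in code page 1252, but ‹ and › sit in the 0x80 to 0x9F range, which Windows-1252 uses for printable characters while ISO 8859-1 (strict Latin-1) reserves it for control codes. So "Western Latin1" here means Windows-1252, which is close to, but not identical with, ISO 8859-1.

```python
CHARS = "äö‹›«»Äß"  # characters reported as not saved correctly in 11.03rc1/rc3

def describe(chars: str) -> list[tuple[str, str, bool]]:
    """For each character: its cp1252 byte in hex, and whether ISO 8859-1 can encode it."""
    rows = []
    for ch in chars:
        cp1252_byte = ch.encode("cp1252").hex()  # one byte for each of these characters
        try:
            ch.encode("latin-1")
            in_latin1 = True
        except UnicodeEncodeError:
            in_latin1 = False
        rows.append((ch, cp1252_byte, in_latin1))
    return rows

for ch, hex_byte, in_latin1 in describe(CHARS):
    print(f"{ch}  cp1252=0x{hex_byte}  utf-8 bytes={len(ch.encode('utf-8'))}  in ISO 8859-1: {in_latin1}")
```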
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: UTF-8 for history.txt

Post by *AntonyD »

petermad wrote:
Code page 1252 Western Latin1
That depends on the regional settings. For me it's 1251 ;) in all editors...
That's why I strongly vote for saving the file in either UTF-8 + BOM or UTF-16 LE + BOM.
browny
Senior Member
Posts: 288
Joined: 2007-09-10, 13:19 UTC

Re: UTF-8 for history.txt

Post by *browny »

The line to look at was:
29.01.24 Fixed: Regular expressions in file names: Support Unicode accents in constructs like \bфndern\b which will find the whole word "фndern" (change) in a file name (32/64)
Horst.Epp
Power Member
Posts: 6495
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: UTF-8 for history.txt

Post by *Horst.Epp »

browny wrote: 2024-02-04, 10:28 UTC
The line to look at was:
29.01.24 Fixed: Regular expressions in file names: Support Unicode accents in constructs like \bфndern\b which will find the whole word "фndern" (change) in a file name (32/64)
Looks fine

Code:

29.01.24 Fixed: Regular expressions in file names: Support Unicode accents in constructs like \bändern\b which will find the whole word "ändern" (change) in a file name (32/64)
AntonyD
Power Member
Posts: 1249
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: UTF-8 for history.txt

Post by *AntonyD »

Horst.Epp wrote:
Looks fine
Again, that depends on the regional settings! With Russian settings, for example, this file is treated as Cyrillic Windows-1251 by default, both in Lister and in all the editors available here.
That leads to a view like the one shown above: "\bдndern\b". BUT if I use the Encodings menu in Lister
and choose "ASCII\DOS (local codepage) 1", it looks like this: "\bфndern\b".
AND ONLY if I choose 1252 will I see what you see. BUT for that I MUST KNOW ABOUT this fact BEFOREHAND!
Therefore, I repeat: to prevent misunderstandings about which line, word, or character in this file
we are talking about, this file really should have been saved long ago in either UTF-8 + BOM or UTF-16 LE + BOM (like the INI files).
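The effect described here can be reproduced in a few lines of Python: history.txt stores "ä" as the single byte 0xE4 (code page 1252), and the very same bytes decoded with code page 1251 (Windows Cyrillic) or 866 (the DOS code page on Russian systems) give different words.

```python
# Bytes as they would be stored in a cp1252-encoded history.txt
raw = "\\bändern\\b".encode("cp1252")

views = {
    "cp1252 (Western)": raw.decode("cp1252"),
    "cp1251 (Windows Cyrillic)": raw.decode("cp1251"),
    "cp866 (DOS Russian)": raw.decode("cp866"),
}
for name, text in views.items():
    print(f"{name:28} {text}")
```

Only the byte 0xE4 changes meaning; the ASCII part of the line is identical in all three code pages, which is why the mismatch is easy to overlook.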
Horst.Epp
Power Member
Posts: 6495
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: UTF-8 for history.txt

Post by *Horst.Epp »

2AntonyD
I agree with you, but why should one use "ASCII\DOS (local codepage) 1" for Windows programs?
browny
Senior Member
Senior Member
Posts: 288
Joined: 2007-09-10, 13:19 UTC

Re: UTF-8 for history.txt

Post by *browny »

Character graphics (DOS box-drawing characters) can still be seen, and not only in old files.
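As an illustration of why DOS code pages still matter: box-drawing ("character graphics") symbols are single bytes in code page 437, and at the same positions in code page 866, while the same bytes read as Windows code page 1252 turn into accented letters. A minimal Python sketch with a made-up frame:

```python
# A tiny hypothetical text-mode frame drawn with box-drawing characters
frame = "┌──┐\n└──┘"
raw = frame.encode("cp437")  # DOS (US) code page: each symbol is one byte

print(raw.hex(" "))
print(raw.decode("cp437"))   # the intended look
print(raw.decode("cp1252"))  # the same bytes read as Windows ANSI
```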
petermad
Power Member
Posts: 14808
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Re: UTF-8 for history.txt

Post by *petermad »

2AntonyD
AntonyD wrote:
AND ONLY if I choose 1252 will I see what you see.
I also see "ändern" with code pages 1250, 1254, 1257, and 1258 when using the default Fixedsys (Western) font in Lister.
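This can be checked at the byte level in Python: 0xE4 decodes to "ä" in several Windows code pages besides 1252, but not in 1251. The sketch covers only the code page mapping; Lister's font handling is a separate matter and is not modelled here.

```python
CODE_PAGES = ["cp1250", "cp1252", "cp1254", "cp1257", "cp1258", "cp1251"]

def decode_byte(value: int, code_page: str) -> str:
    """Decode a single byte with the given Windows code page."""
    return bytes([value]).decode(code_page)

for cp in CODE_PAGES:
    print(cp, decode_byte(0xE4, cp))
```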
ghisler(Author)
Site Admin
Posts: 48088
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: UTF-8 for history.txt

Post by *ghisler(Author) »

I think that it's a good idea (UTF-8 with byte order mark), especially since the new Windows Notepad app is such a crappy implementation that it doesn't even correctly recognize ANSI text, although it's very easy to detect invalid UTF-8 sequences.
Author of Total Commander
https://www.ghisler.com
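The point that invalid UTF-8 is easy to detect can be illustrated with a simple heuristic. This is a sketch, not how Notepad or Lister actually decide: honour a UTF-8 BOM, otherwise try a strict UTF-8 decode, and fall back to an ANSI code page if that fails. The function name `guess_decode` is hypothetical, and cp1252 as the fallback is an assumption; a real program would use the system's ANSI code page.

```python
import codecs

def guess_decode(data: bytes, ansi_code_page: str = "cp1252") -> tuple[str, str]:
    """Return (encoding_name, text): BOM check, then strict UTF-8, then ANSI fallback."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig", data.decode("utf-8-sig")
    try:
        # Strict decoding raises on any byte sequence that is not well-formed UTF-8
        return "utf-8", data.decode("utf-8")
    except UnicodeDecodeError:
        return ansi_code_page, data.decode(ansi_code_page, errors="replace")

# 0xE4 is a UTF-8 lead byte for a 3-byte sequence, but "n" is not a continuation byte
ansi_bytes = "ändern".encode("cp1252")
bom_bytes = "ändern".encode("utf-8-sig")

print(guess_decode(ansi_bytes))  # falls back to the ANSI code page
print(guess_decode(bom_bytes))   # BOM detected
```

Pure ASCII passes the UTF-8 check unchanged, which is harmless because ASCII is a subset of both UTF-8 and every Windows ANSI code page.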
white
Power Member
Posts: 4623
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: UTF-8 for history.txt

Post by *white »

Moderator message from: white » 2024-02-06, 15:39 UTC

browny
Senior Member
Posts: 288
Joined: 2007-09-10, 13:19 UTC

Re: UTF-8 for history.txt

Post by *browny »

Not only Notepad: the issue was present in Lister too, if the default code page was non-Latin.
Thanks, fixed in the RC5 history.txt.