UTF-8 for history.txt

Posted: 2024-02-03, 16:22 UTC
by browny
All was well while history.txt was ASCII text, but now the file contains funny characters, and the correct encoding is unknown.
It would therefore be beneficial to switch to UTF-8 and add the corresponding BOM.
Other Unicode encodings, such as UTF-16, would considerably increase the file size.
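
A minimal sketch of the proposed conversion, using Python purely for illustration. It assumes the existing bytes are Windows-1252, which is only an assumption at this point in the thread. The helper name `to_utf8_bom` and the sample line are hypothetical. It also encodes the same text as UTF-16 so the size difference can be compared.

```python
import codecs

def to_utf8_bom(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Re-encode ANSI bytes as UTF-8 with a leading byte order mark."""
    text = raw.decode(source_encoding)
    # "utf-8-sig" prepends the 3-byte BOM (EF BB BF) when encoding.
    return text.encode("utf-8-sig")

# Hypothetical sample: a mostly-ASCII line with one accented letter.
sample = 'Fixed: find the whole word "ändern" in a file name\n'.encode("cp1252")

utf8 = to_utf8_bom(sample)
# Python's "utf-16" codec writes a 2-byte BOM, then 2 bytes per BMP character.
utf16 = sample.decode("cp1252").encode("utf-16")

print("starts with UTF-8 BOM:", utf8.startswith(codecs.BOM_UTF8))
print("ANSI bytes:", len(sample), "UTF-8+BOM bytes:", len(utf8), "UTF-16 bytes:", len(utf16))
```

For mostly-ASCII text like this, UTF-8 adds only the BOM plus one extra byte per accented Latin letter, while UTF-16 roughly doubles the size.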

Re: UTF-8 for history.txt

Posted: 2024-02-03, 16:37 UTC
by Horst.Epp
It's not ASCII text, it's ANSI, which displays fine in the internal viewer,
in many other popular text file viewers such as CudaLister or AkelPad,
and in the very useful TC Changes Viewer.

Re: UTF-8 for history.txt

Posted: 2024-02-03, 16:39 UTC
by browny
What sort of ANSI? What code page? Do you know that ANSI for Asian languages is double byte?

Re: UTF-8 for history.txt

Posted: 2024-02-03, 16:52 UTC
by AntonyD
To be honest, it doesn't matter at all what the encoding is or what its correct name is,
but this file really should have been saved long ago in either UTF-8 + BOM or UTF-16 LE + BOM (as INI).

Re: UTF-8 for history.txt

Posted: 2024-02-04, 03:09 UTC
by petermad
browny wrote: What sort of ANSI? What code page?
Code page 1252 Western Latin1

Note that in the history.txt files from TC 11.03rc1 and TC 11.03rc3, a few characters (äö‹›«»Äß) were not saved correctly.
browny wrote: but now the file contains funny characters
Some of these characters from the extended ANSI charset have been there since 2008, and all of them since 2016.

Re: UTF-8 for history.txt

Posted: 2024-02-04, 09:17 UTC
by AntonyD
petermad wrote: Code page 1252 Western Latin1
That depends on regional settings. For me it's 1251 ;) in all editors.
That's why I strongly vote for saving the file in either UTF-8 + BOM or UTF-16 LE + BOM.

Re: UTF-8 for history.txt

Posted: 2024-02-04, 10:28 UTC
by browny
The line to look at was:
29.01.24 Fixed: Regular expressions in file names: Support Unicode accents in constructs like \bфndern\b which will find the whole word "фndern" (change) in a file name (32/64)

Re: UTF-8 for history.txt

Posted: 2024-02-04, 10:58 UTC
by Horst.Epp
browny wrote: 2024-02-04, 10:28 UTC The line to look at was:
29.01.24 Fixed: Regular expressions in file names: Support Unicode accents in constructs like \bфndern\b which will find the whole word "фndern" (change) in a file name (32/64)
Looks fine

29.01.24 Fixed: Regular expressions in file names: Support Unicode accents in constructs like \bändern\b which will find the whole word "ändern" (change) in a file name (32/64)

Re: UTF-8 for history.txt

Posted: 2024-02-04, 11:15 UTC
by AntonyD
Horst.Epp wrote: Looks fine
Again, that depends on regional settings! With RU settings, for example, this file will be treated as Cyrillic Windows-1251
by default, both in Lister and in all editors available here.
That leads to a view like the one shown above: "\bдndern\b". BUT if I use the Encodings menu in Lister
and choose "ASCII\DOS (local codepage) 1", it will look like this: "\bфndern\b".
AND ONLY if I will choose 1252 - I will see what you see. BUT for that I MUST KNOW this fact BEFOREHAND!
Therefore, I repeat: to prevent misunderstandings about which line/word/character in this file
we are talking about, this file really should have been saved long ago in either UTF-8 + BOM or UTF-16 LE + BOM (as INI).
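
The effect described here can be reproduced with a short sketch (Python, for illustration only). It encodes the word as Windows-1252 and decodes the same bytes under three code pages. Treating Lister's "ASCII\DOS (local codepage)" on a Russian system as CP866 is an assumption, and the helper name `view_as` is hypothetical.

```python
# The bytes history.txt would hold for this fragment when saved as Windows-1252.
raw = r"\bändern\b".encode("cp1252")

def view_as(data: bytes, code_page: str) -> str:
    """Decode the same bytes the way a viewer set to `code_page` would."""
    return data.decode(code_page)

# cp1252: Western; cp1251: Cyrillic Windows; cp866: Russian DOS (assumed local OEM page).
for code_page in ("cp1252", "cp1251", "cp866"):
    print(code_page, view_as(raw, code_page))
```

The single byte 0xE4 is "ä" in 1252, "д" in 1251 and "ф" in 866, which matches the three renderings quoted in this thread.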

Re: UTF-8 for history.txt

Posted: 2024-02-04, 11:59 UTC
by Horst.Epp
2AntonyD
I agree with you, but why should one use "ASCII\DOS (local codepage) 1" for Windows programs?

Re: UTF-8 for history.txt

Posted: 2024-02-04, 12:25 UTC
by browny
Character graphics can still be seen, and not only in old files.

Re: UTF-8 for history.txt

Posted: 2024-02-04, 14:48 UTC
by petermad
2AntonyD
AntonyD wrote: AND ONLY if I will choose 1252 - I will see what you see.
I also see "ändern" with code pages 1250, 1254, 1257 and 1258 when using the default Fixedsys (Western) font in Lister.
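
A small check of the code-page side of this observation, as a sketch only: it looks at how each Windows code page maps the single byte 0xE4 and says nothing about fonts or about the other accented characters in the file. Code page 1251 is included for contrast.

```python
# Code pages named in the post, plus 1252 as the reference and 1251 for contrast.
code_pages = ["cp1250", "cp1252", "cp1254", "cp1257", "cp1258", "cp1251"]

# Map each code page to the character it assigns to byte 0xE4.
decoded = {cp: b"\xe4".decode(cp) for cp in code_pages}

for cp, ch in decoded.items():
    verdict = "same as 1252" if ch == decoded["cp1252"] else "differs"
    print(cp, ch, verdict)
```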

Re: UTF-8 for history.txt

Posted: 2024-02-05, 08:21 UTC
by ghisler(Author)
I think that it's a good idea (UTF-8 with byte order mark), especially since the new Windows Notepad app is such a crappy implementation that it doesn't even correctly recognize ANSI text, although it's very easy to find invalid UTF-8 sequences.
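
To illustrate why invalid UTF-8 is easy to spot, here is a minimal detection sketch in Python. It is not how Notepad or Lister actually work. The function name `guess_encoding` and the cp1252 fallback are assumptions chosen for the example.

```python
import codecs

def guess_encoding(data: bytes, ansi_fallback: str = "cp1252") -> str:
    """Return a best-guess codec name for `data`."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # an explicit BOM settles the question
    try:
        data.decode("utf-8")  # strict mode raises on any invalid sequence
        return "utf-8"
    except UnicodeDecodeError:
        # Invalid as UTF-8, so treat it as ANSI; which code page is still a guess.
        return ansi_fallback

ansi_line = "ändern".encode("cp1252")      # b"\xe4ndern"
bom_line = "ändern".encode("utf-8-sig")    # BOM followed by valid UTF-8

print(guess_encoding(ansi_line))
print(guess_encoding(bom_line))
```

In the ANSI sample, 0xE4 looks like the lead byte of a three-byte UTF-8 sequence but is followed by the plain ASCII byte for "n", so strict decoding fails immediately. The fallback code page still has to be guessed, which is the ambiguity a BOM removes.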

Re: UTF-8 for history.txt

Posted: 2024-02-06, 15:40 UTC
by white

Moderator message from: white » 2024-02-06, 15:39 UTC


Re: UTF-8 for history.txt

Posted: 2024-02-08, 20:15 UTC
by browny
Not only Notepad. The issue was present in Lister too, if the default was a non-Latin code page.
Thanks, it's fixed in the RC5 History.txt.