Bug in Cyrillic UTF-8
TC x32 8.00b16
Win7SP1 x64 Eng
Image: http://i30.fastpic.ru/big/2012/0116/c7/9255b4fbac02fe498020cf99c65488c7.png
The first document on the screen was saved in ANSI and is displayed properly; the second one, saved in UTF-8, is not.
Last edited by LonerD on 2012-01-20, 16:43 UTC, edited 1 time in total.
"I used to feel guilty in Cambridge that I spent all day playing games, while I was supposed to be doing mathematics. Then, when I discovered surreal numbers, I realized that playing games IS math." John Horton Conway
- ghisler(Author)
- Site Admin
- Posts: 50532
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
Does the UTF-8 file have a byte order mark (BOM)?
You can check it by viewing the file with F3. Then press '1' to see the plain text.
Author of Total Commander
https://www.ghisler.com
When I open the file in Lister:
1 - Plain Text - wrong symbols
7 - UTF-8 - normal letters
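Why the plain-text view shows wrong symbols: in UTF-8 every Cyrillic letter occupies two bytes, and an ANSI view draws each byte as a separate character. For example, Д (U+0414) is stored as D0 94, which Windows-1251 displays as the two characters Р and ”. The minimal encoder below only illustrates this; `EncodeUtf8` and `TUtf8Bytes` are hypothetical names, not part of TC.

```pascal
{$mode objfpc}

type
  TUtf8Bytes = array of Byte;  { hypothetical helper type for this sketch }

{ Encode one Unicode code point as 1 to 4 UTF-8 bytes.
  Surrogates and values above U+10FFFF are not rejected: illustration only. }
function EncodeUtf8(CodePoint: Cardinal): TUtf8Bytes;
begin
  if CodePoint < $80 then
  begin                                   { 0xxxxxxx: plain ASCII }
    SetLength(Result, 1);
    Result[0] := Byte(CodePoint);
  end
  else if CodePoint < $800 then
  begin                                   { 110xxxxx 10xxxxxx: Cyrillic lives here }
    SetLength(Result, 2);
    Result[0] := Byte($C0 or (CodePoint shr 6));
    Result[1] := Byte($80 or (CodePoint and $3F));
  end
  else if CodePoint < $10000 then
  begin                                   { 1110xxxx 10xxxxxx 10xxxxxx }
    SetLength(Result, 3);
    Result[0] := Byte($E0 or (CodePoint shr 12));
    Result[1] := Byte($80 or ((CodePoint shr 6) and $3F));
    Result[2] := Byte($80 or (CodePoint and $3F));
  end
  else
  begin                                   { 11110xxx followed by three 10xxxxxx }
    SetLength(Result, 4);
    Result[0] := Byte($F0 or (CodePoint shr 18));
    Result[1] := Byte($80 or ((CodePoint shr 12) and $3F));
    Result[2] := Byte($80 or ((CodePoint shr 6) and $3F));
    Result[3] := Byte($80 or (CodePoint and $3F));
  end;
end;
```

Calling `EncodeUtf8($0414)` yields the two bytes D0 94, while an ASCII letter stays a single byte, which is why only the Cyrillic text is garbled in the plain-text view.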
- sqa_wizard
- Power Member
- Posts: 3893
- Joined: 2003-02-06, 11:41 UTC
- Location: Germany
Well, to ask in more detail:
1. view the file with F3
2. press '1' to see the plain text
3. Have a look at the first 3 characters
Do they look like this (the bytes EF BB BF, which a Western ANSI code page shows as ï»¿)?
If yes, this is the BOM, which clearly marks the file as UTF-8.
#5767 Personal license
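The check described in these steps amounts to comparing the first three bytes of the file with the UTF-8 BOM, EF BB BF. A minimal sketch, assuming the caller has already read the beginning of the file into a byte array (file reading is not shown); `HasUtf8Bom` is a hypothetical name.

```pascal
{$mode objfpc}

{ Hypothetical helper: True if Buf starts with the UTF-8 byte order mark.
  Buf is assumed to hold the first bytes of the file. }
function HasUtf8Bom(const Buf: array of Byte): Boolean;
begin
  { Short-circuit evaluation (the default $B-) keeps Buf[0..2] from being
    read when the buffer holds fewer than three bytes. }
  Result := (Length(Buf) >= 3) and
            (Buf[0] = $EF) and (Buf[1] = $BB) and (Buf[2] = $BF);
end;
```

A file without a BOM can still be UTF-8, so a False result only means that the BOM gives no answer.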
- ghisler(Author)
You no longer need to check it: I will add support for both types (with and without BOM) to the next beta.
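Supporting both types roughly means: trust the BOM when it is present, otherwise check whether the bytes form valid multi-byte UTF-8. The following is a minimal sketch of that decision under stated assumptions, not TC's actual code; `TTextGuess`, `StrictSeqLen` and `GuessEncoding` are hypothetical names. It works on a byte array with an explicit length and, as a deliberate variation, rejects the lead bytes C0, C1 and F5..FF, which RFC 3629 never allows; it does not check the tighter second-byte ranges (overlong 3- and 4-byte forms, surrogates).

```pascal
{$mode objfpc}

type
  TTextGuess = (tgAnsi, tgUtf8Bom, tgUtf8NoBom);  { hypothetical result type }

{ Sequence length announced by a first byte under RFC 3629, or 0 if the byte
  cannot start a sequence (80..BF, C0, C1, F5..FF). }
function StrictSeqLen(Lead: Byte): Integer;
begin
  case Lead of
    $00..$7F: Result := 1;
    $C2..$DF: Result := 2;
    $E0..$EF: Result := 3;
    $F0..$F4: Result := 4;
  else
    Result := 0;
  end;
end;

{ A BOM decides at once. Without one, the buffer counts as UTF-8 only if every
  sequence is well-formed and at least one complete multi-byte sequence occurs.
  PartialAllowed tolerates a sequence cut off by the end of the buffer. }
function GuessEncoding(const Buf: array of Byte;
  PartialAllowed: Boolean): TTextGuess;
var
  I, K, Len, Avail: Integer;
  HadMultiByte: Boolean;
begin
  Result := tgAnsi;
  if (Length(Buf) >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB) and
     (Buf[2] = $BF) then
    Exit(tgUtf8Bom);
  HadMultiByte := False;
  I := 0;
  while I < Length(Buf) do
  begin
    Len := StrictSeqLen(Buf[I]);
    if Len = 0 then
      Exit;                          { stray continuation or forbidden lead byte }
    Avail := Len;
    if I + Avail > Length(Buf) then
      Avail := Length(Buf) - I;      { sequence cut off by the end of the buffer }
    for K := 1 to Avail - 1 do
      if (Buf[I + K] and $C0) <> $80 then
        Exit;                        { expected a 10xxxxxx continuation byte }
    if Avail < Len then
    begin
      if not PartialAllowed then
        Exit;
      Break;                         { accept the truncated tail, stop scanning }
    end;
    if Len > 1 then
      HadMultiByte := True;
    Inc(I, Len);
  end;
  if HadMultiByte then
    Result := tgUtf8NoBom;
end;
```

For example, the Windows-1251 bytes C4 E0 ("Да") fail at the second byte and yield `tgAnsi`, while the UTF-8 bytes D0 94 D0 B0 for the same word yield `tgUtf8NoBom`. Pure ASCII also yields `tgAnsi`, since both interpretations give the same text.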
ghisler(Author)
No changes in beta17?
In beta17 it looks like this:
http://i28.fastpic.ru/big/2012/0120/39/7051875e18bae325fcee2366b3d02239.png
- ghisler(Author)
Actually, UTF-8 is now supported by the internal text-to-thumbnail converter! I guess that you have some Lister plugin installed which does the conversion instead of TC, and does it wrong.
Please try this: go to menu Configuration - Options - Thumbnails and turn off all methods except the last one (text preview). Then switch thumbnail view on. You may need to right-click the thumbnail and choose to reload it.
Oh, it was my mistake.
The issue has been resolved.
In beta 17 and 17a everything is shown correctly.
Thanks.
- ghisler(Author)
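Sure, it's quite simple; see function IsBufferUtf8 below. The buffer must be terminated by a #0 byte, because the loop stops there. PartialAllowed must be set to true if the buffer is smaller than the file: the buffer may then end in the middle of a multi-byte sequence, which should not count as an error.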
Code:
{AnsiChar/PAnsiChar keep one byte per character, so the table below has
 exactly 256 entries also in Unicode versions of Delphi}
const bytesFromUTF8: array[AnsiChar] of byte = (  {number of bytes that follow a given first byte}
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 32
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 64
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, // 96
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, //128
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, //160
  0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, //192
  1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, //224
  2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5); //256

function GetUtf8CharWidth(firstchar: AnsiChar): integer;
begin
  result := bytesFromUTF8[firstchar] + 1;
end;

function IsFirstUTF8Char(thechar: AnsiChar): boolean;
{A first byte (plain ASCII or the start of a multi-byte sequence) does not have 10 as its two most significant bits.}
begin
  result := (byte(thechar) and (128+64)) <> 128;
end;

function IsSecondaryUTF8Char(thechar: AnsiChar): boolean;
{The remaining bytes in a multi-byte sequence have 10 as their two most significant bits.}
begin
  result := (byte(thechar) and (128+64)) = 128;
end;

function IsBufferUtf8(buf: PAnsiChar; PartialAllowed: boolean): boolean;
{Buffer contains only valid UTF-8 characters, no secondary alone, no primary without the correct nr of secondary}
var p: PAnsiChar;
    utf8bytes: integer;
    hadutf8bytes: boolean;
begin
  p := buf;
  hadutf8bytes := false;
  result := false;
  utf8bytes := 0;
  while p[0] <> #0 do begin
    if utf8bytes > 0 then begin {Expecting secondary char}
      hadutf8bytes := true;
      if not IsSecondaryUTF8Char(p[0]) then exit; {Fail!}
      dec(utf8bytes);
    end else if IsFirstUTF8Char(p[0]) then
      utf8bytes := GetUtf8CharWidth(p[0]) - 1
    else if IsSecondaryUTF8Char(p[0]) then
      exit; {Fail!}
    inc(p);
  end;
  result := hadutf8bytes and (PartialAllowed or (utf8bytes = 0));
end;
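The values in bytesFromUTF8 follow from the bit pattern of the first byte: the number of leading 1-bits gives the total sequence length, while 0xxxxxxx (ASCII) and 10xxxxxx (continuation) bytes get width 1. The hypothetical function below is only meant to show where the table comes from; it returns the same widths as GetUtf8CharWidth for all 256 byte values, without the lookup table.

```pascal
{$mode objfpc}

{ Hypothetical helper: the width GetUtf8CharWidth reads from bytesFromUTF8,
  derived by counting the leading 1-bits of the first byte. }
function Utf8WidthFromLeadByte(Lead: Byte): Integer;
var
  Ones: Integer;
begin
  Ones := 0;
  while (Ones < 8) and ((Lead and ($80 shr Ones)) <> 0) do
    Inc(Ones);
  case Ones of
    0, 1: Result := 1;   { ASCII, or a continuation byte (table entry 0) }
    2..6: Result := Ones;
  else
    Result := 6;         { FE, FF: the table gives them 5 following bytes as well }
  end;
end;
```

The table avoids this bit loop for every byte of the buffer; the loop form just makes the pattern visible.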