Incorrect case in UTF-8 search (TC and Lister)

Bug reports will be moved here when the described bug has been fixed

Moderators: white, Hacker, petermad, Stefan2

Slavic
Senior Member
Posts: 290
Joined: 2006-02-26, 15:41 UTC
Location: Montenegro


Post by *Slavic »

This bug can be reproduced with a simple example. Here are two Cyrillic words: [nova] (suffix and short form of the adjective "new") and [noga] (the leg). They differ in the 3rd letter. Because this forum isn't in UTF-8, I place here a plain-text equivalent of the example (copy and save in ANSI mode, e.g. as uni_test.txt):


РЅРѕРІР°   // nova
РЅРѕРіР°   // noga
РЅРѕРІР°
РЅРѕРіР°
View this file in Lister in UTF-8 mode, select the first word and copy it to the search dialog. Then press Shift+F7. Select the second word, copy it to the search dialog and press Shift+F7 again. In both cases Lister finds all four words as matches, which is wrong: each search should match only its own word.

Cause: by default, Lister and TC use case-insensitive search. However, for UTF-8 text the case folding should be applied only to the decoded Unicode characters of the search string and the text, not to the bytes of their UTF-8 representations. In my example, the two words differ in the characters "I" and "i" of the UTF-8 encoding as seen in ANSI mode: these bytes should not be treated as equal under any circumstances.
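The effect described above can be reproduced outside of TC. The following is only a sketch under two assumptions that are not confirmed anywhere in this thread: that the ANSI code page is Windows-1251, and that the faulty search lowercases the UTF-8 data one byte at a time as if each byte were an ANSI character. The helper name ansi_lower is hypothetical.

```python
def ansi_lower(data: bytes, codepage: str = "cp1251") -> bytes:
    """Lowercase every byte as if it were one ANSI character (the faulty approach)."""
    out = bytearray()
    for b in data:
        try:
            low = bytes([b]).decode(codepage).lower().encode(codepage)
        except UnicodeError:
            low = bytes([b])  # byte has no mapping in the code page: keep it
        out += low if len(low) == 1 else bytes([b])
    return bytes(out)


nova = "нова".encode("utf-8")  # bytes D0 BD D0 BE D0 B2 D0 B0
noga = "нога".encode("utf-8")  # bytes D0 BD D0 BE D0 B3 D0 B0

# Byte-wise folding maps 0xB2 to 0xB3, so the two different words collide.
print(ansi_lower(nova) == ansi_lower(noga))

# Folding the decoded Unicode text keeps the words distinct,
# while real upper/lower case pairs still match.
print("нова".casefold() == "нога".casefold())
print("НОВА".casefold() == "нова".casefold())
```

Under these assumptions the first comparison is true (the bug), the second is false and the third is true (the expected behaviour).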
ghisler(Author)
Site Admin
Posts: 48083
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Indeed, case-insensitive search in UTF-8 currently works only for English texts, sorry. I found no way to make it work with UTF-8 because of the variable width of its characters.
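One possible way around the variable character width, shown purely as an illustration and not as the method used in TC, is to decode the UTF-8 data first, compare case-folded characters, and only then convert the match position back to a byte offset. The function name find_utf8_caseless is hypothetical, and the sketch assumes the whole text fits in memory and is valid UTF-8.

```python
def find_utf8_caseless(haystack: bytes, needle: str) -> int:
    """Return the byte offset of the first case-insensitive match, or -1.

    Limitation: the window has the same number of characters as the needle,
    so folds that change length (such as "ß" versus "ss") are not found.
    """
    text = haystack.decode("utf-8")
    target = needle.casefold()
    n = len(needle)
    for i in range(len(text) - n + 1):
        if text[i:i + n].casefold() == target:
            # Characters before the match may be 1 to 4 bytes wide each,
            # so re-encode the prefix to get the byte position.
            return len(text[:i].encode("utf-8"))
    return -1


data = "нова\nНОГА\n".encode("utf-8")
print(find_utf8_caseless(data, "нога"))  # 9: four 2-byte letters plus the newline
print(find_utf8_caseless(data, "ноша"))  # -1: no such word
```

The linear scan is slow for large files. It is meant only to show that the comparison happens on characters while the reported position stays in bytes.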
Author of Total Commander
https://www.ghisler.com

Post by *Slavic »

Well, I agree that case-insensitive search in UTF-8 is not an easy thing, because functions like stricmp() work only with ANSI and binary Unicode strings.

Could it be better to enable for UTF-8 only the case-sensitive search? This would fix the bug I found. In that case, when the user selects the "UTF8" checkbox, the "Case sensitive" checkbox would be selected automatically. For case-insensitive search in ANSI texts (English etc.), the user can always switch back to plain text mode and clear this checkbox.

Post by *ghisler(Author) »

"Could it be better to enable for UTF-8 only the case-sensitive search"
I thought about that too, but then even English searches would be case-sensitive only...

A new function is planned, but I don't know when I can add it. My to-do list is very, very long and is growing every day. :(

Post by *ghisler(Author) »

This _should_ work now for forward searches in UTF8 (not backwards yet). Can anyone test this, please?
Flint
Power Member
Posts: 3487
Joined: 2003-10-27, 09:25 UTC
Location: Antalya, Turkey

Post by *Flint »

Yes, I ran some tests of this feature (though not very thorough ones), and it seems to work fine now.
Flint's Homepage: Full TC Russification Package, VirtualDisk, NTFS Links, NoClose Replacer, and other stuff!
 
Using TC 10.52 / Win10 x64

Post by *ghisler(Author) »

Great, thanks!