
3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-01-26, 16:59 UTC
by Skif_off
But in version 2.91 it works fine: I checked several files encoded as UTF-8 without BOM (including HTML files with charset=utf-8) and as cp1251.

Tested on Android 4.4 and Android 8, if it matters.

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-01-27, 14:56 UTC
by ghisler(Author)
Does the file contain any UTF-8 in the first 4 kBytes? If not, then TC only searches with the default encoding.

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-01-27, 15:28 UTC
by Skif_off
ghisler(Author) wrote: 2021-01-27, 14:56 UTC
Does the file contain any UTF-8 in the first 4 kBytes?
Yes.
The same files in the same directory with the same search word: 2.91 finds the files, but 3.00-3.20 find nothing. Looks like a TC problem.

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-02-01, 14:27 UTC
by ghisler(Author)
It works fine with my files, so I need a test file to check it. Please do the following:
1. Make a copy of one of your files
2. Edit it and remove any sensitive data
3. Check whether the error still occurs
4. If yes, please send me the sample and your search word to cghisler at gmail dot com.

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-02-01, 16:48 UTC
by Skif_off
I did some more checks (2.91 vs. 3.20) and I think I found the problem. Try to find any non-US-ASCII word located at an offset >4 kBytes: 3.20 seems to search only in the first 4 kBytes.

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-02-03, 14:05 UTC
by ghisler(Author)
Yes, that's what I wrote above: If the first 4k do NOT contain any UTF-8 multi-byte characters at all, then TC assumes that the file is in ANSI format and not UTF-8. Maybe I should just search the buffer twice, once in ANSI and once in UTF-8 mode...
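
A minimal Python sketch of the heuristic described above and of the proposed "search twice" fallback. This is not TC's actual code: the names `SNIFF_BYTES`, `has_utf8_multibyte`, `guess_encoding`, `contains_single_guess` and `contains_both` are hypothetical, cp1251 is only assumed as the ANSI default because it is the code page mentioned earlier in this thread, and the UTF-8 check is simplified (it does not reject overlong or surrogate sequences).

```python
SNIFF_BYTES = 4096  # the "first 4k" mentioned in the post above


def has_utf8_multibyte(sample: bytes) -> bool:
    """Return True if sample holds at least one well-formed UTF-8 multi-byte sequence."""
    n = len(sample)
    for i in range(n):
        lead = sample[i]
        if 0xC2 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF4:
            length = 4
        else:
            continue  # ASCII, continuation byte, or invalid lead byte
        tail = sample[i + 1:i + length]
        if len(tail) == length - 1 and all(0x80 <= t <= 0xBF for t in tail):
            return True
    return False


def guess_encoding(data: bytes, ansi: str = "cp1251") -> str:
    """Guess from the first SNIFF_BYTES only, as described in the post."""
    return "utf-8" if has_utf8_multibyte(data[:SNIFF_BYTES]) else ansi


def _encode_or_none(word: str, encoding: str):
    """Encode word, or return None if it cannot be represented in that encoding."""
    try:
        return word.encode(encoding)
    except UnicodeEncodeError:
        return None


def contains_single_guess(data: bytes, word: str, ansi: str = "cp1251") -> bool:
    """Search once, using only the guessed encoding."""
    needle = _encode_or_none(word, guess_encoding(data, ansi))
    return needle is not None and needle in data


def contains_both(data: bytes, word: str, ansi: str = "cp1251") -> bool:
    """Search twice: once with the UTF-8 bytes of word, once with the ANSI bytes."""
    needles = (_encode_or_none(word, "utf-8"), _encode_or_none(word, ansi))
    return any(n is not None and n in data for n in needles)
```

With a file that is plain ASCII for the first 4 kBytes and has a UTF-8 word only after that, `contains_single_guess` misses the word because the guess falls back to ANSI, while `contains_both` still finds it.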

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-02-03, 15:30 UTC
by Skif_off
https://www.upload.ee/files/12836392/testutf8.zip.html
Both files are UTF-8 encoded (starting from the first bytes) and both contain the word "плагин", but at different offsets:
- utf8-1.txt: offset 0x0463.
- utf8-2.txt: offset 0x2245.
Search for files containing "плагин": TC 2.91 finds both files, but TC 3.20 finds only the first one.

It looks like TC 3.x searches for the text only inside the buffer (the first 4 kBytes), not in the whole file content.
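
To reproduce this without the archive, here is a rough Python sketch that builds two in-memory samples with the word at the same offsets and compares a search limited to the first buffer with a search over the whole stream. The helper names (`make_sample`, `search_first_buffer_only`, `search_whole_stream`) and the 4096-byte buffer size are assumptions for illustration only; how TC actually reads files is not known from this thread.

```python
import io

BUF_SIZE = 4096  # assumed buffer size, matching the "first 4k" above


def make_sample(word: str, offset: int) -> bytes:
    """Build UTF-8 bytes that start with Cyrillic filler and place word at byte offset."""
    filler = "я".encode("utf-8")            # 2 bytes per character
    pad = filler * (offset // 2) + b" " * (offset % 2)
    return pad + word.encode("utf-8") + b"\nend of file\n"


def search_first_buffer_only(stream, needle: bytes) -> bool:
    """Suspected faulty behaviour: look only at the first buffer."""
    return needle in stream.read(BUF_SIZE)


def search_whole_stream(stream, needle: bytes) -> bool:
    """Read buffer by buffer, keeping an overlap so a match split
    across two buffers is still found. Expects a non-empty needle."""
    overlap = len(needle) - 1
    tail = b""
    while True:
        chunk = stream.read(BUF_SIZE)
        if not chunk:
            return False
        window = tail + chunk
        if needle in window:
            return True
        tail = window[-overlap:] if overlap > 0 else b""


needle = "плагин".encode("utf-8")
sample1 = make_sample("плагин", 0x0463)   # word inside the first 4 kBytes
sample2 = make_sample("плагин", 0x2245)   # word beyond the first 4 kBytes

for name, data in (("sample1", sample1), ("sample2", sample2)):
    first_only = search_first_buffer_only(io.BytesIO(data), needle)
    whole = search_whole_stream(io.BytesIO(data), needle)
    print(name, "first buffer only:", first_only, "whole stream:", whole)
```

The overlap of `len(needle) - 1` bytes matters because the encoded word can straddle a buffer boundary; without it, a chunked search would still miss some matches.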

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-02-08, 18:01 UTC
by ghisler(Author)
Could you please try it with v3.21beta1? You can get the beta either from the Play Store:
https://play.google.com/apps/testing/com.ghisler.android.TotalCommander
or download it directly here:
https://www.ghisler.com/android.htm#download

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-02-08, 18:55 UTC
by Skif_off
Seems to work now, thank you!

Re: 3.x+. Search text files by content: cannot find Cyrillic

Posted: 2021-02-09, 15:08 UTC
by ghisler(Author)
Great, thanks for your quick reply!