3.x+. Search text files by content: cannot find Cyrillic
But in version 2.91 it works fine: I checked several files encoded in UTF-8 without BOM (including HTML files with charset=utf-8) and in cp1251.
Android 4.4, if it matters; Android 8 as well.
- ghisler(Author)
- Site Admin
- Posts: 50390
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
Re: 3.x+. Search text files by content: cannot find Cyrillic
Does the file contain any UTF-8 in the first 4 kBytes? If not, then TC only searches with the default encoding.
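To make the rule described above concrete, here is a minimal Python sketch of such a heuristic. It is an illustration under stated assumptions, not Total Commander's code: the helper names `looks_like_utf8` and `guess_encoding` are hypothetical, the probe window is assumed to be exactly 4096 bytes, and cp1251 stands in for the "default encoding" (it is the code page mentioned at the start of the thread).

```python
import codecs

PROBE_SIZE = 4096  # assumed size of the "first 4 kBytes" window


def looks_like_utf8(head: bytes) -> bool:
    """True if `head` is valid UTF-8 AND contains at least one multi-byte character."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False: a character cut in half at the end of the window is
        # buffered instead of raising, so it does not count against UTF-8.
        text = decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return False  # invalid byte sequence, e.g. cp1251-encoded Cyrillic
    # Pure ASCII proves nothing, so require at least one non-ASCII character.
    return any(ord(ch) > 0x7F for ch in text)


def guess_encoding(data: bytes, ansi: str = "cp1251") -> str:
    """Pick UTF-8 only if the first PROBE_SIZE bytes prove it; otherwise fall back to ANSI."""
    return "utf-8" if looks_like_utf8(data[:PROBE_SIZE]) else ansi
```

The weak spot is visible directly: `guess_encoding(b"a" * 5000 + "плагин".encode("utf-8"))` returns `"cp1251"` even though the data is valid UTF-8, because nothing in the first 4096 bytes is multi-byte.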
Author of Total Commander
https://www.ghisler.com
Re: 3.x+. Search text files by content: cannot find Cyrillic
Yes.
Same files, same directory, same search word: 2.91 finds the files, but 3.00-3.20 find nothing. It looks like a TC problem.
- ghisler(Author)
Re: 3.x+. Search text files by content: cannot find Cyrillic
It works fine with my files, so I need a test file to check it. Please do the following:
1. Make a copy of one of your files
2. Edit it and remove any sensitive data
3. Check whether the error still occurs
4. If yes, please send me the sample and your search word to cghisler at gmail dot com.
Re: 3.x+. Search text files by content: cannot find Cyrillic
I did some more checks (2.91 vs. 3.20) and I think I found the problem. Try searching for any non-US-ASCII word located at an offset greater than 4 kBytes: 3.20 appears to search only the first 4 kBytes.
- ghisler(Author)
Re: 3.x+. Search text files by content: cannot find Cyrillic
Yes, that's what I wrote above: If the first 4k do NOT contain any UTF-8 multi-byte characters at all, then TC assumes that the file is in ANSI format and not UTF-8. Maybe I should just search the buffer twice, once in ANSI and once in UTF-8 mode...
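A minimal sketch of the "search twice" idea, under assumptions: the search word is encoded once as UTF-8 and once in an assumed ANSI code page (cp1251 here), and both byte patterns are looked up in the raw bytes of the whole file. The file is read in blocks with a small overlap so a match straddling a block boundary is not missed. The function names are hypothetical, matching is case-sensitive only, and this is not how TC itself is implemented.

```python
from typing import Iterable, Iterator


def read_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a file's raw bytes block by block (no decoding at all)."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def chunks_contain(chunks: Iterable[bytes], needle: str, ansi: str = "cp1251") -> bool:
    """Search the raw bytes for the needle encoded both as UTF-8 and as ANSI."""
    patterns = []
    for enc in ("utf-8", ansi):
        try:
            pattern = needle.encode(enc)
        except UnicodeEncodeError:
            continue  # needle not representable in this code page
        if pattern not in patterns:  # ASCII needles encode identically
            patterns.append(pattern)
    if not patterns:
        return False
    # Keep the last (longest pattern - 1) bytes so boundary-spanning matches are found.
    overlap = max(len(p) for p in patterns) - 1
    tail = b""
    for chunk in chunks:
        buf = tail + chunk
        if any(p in buf for p in patterns):
            return True
        tail = buf[-overlap:] if overlap > 0 else b""
    return False
```

For example, `chunks_contain(read_chunks("utf8-2.txt"), "плагин")` looks at the entire file rather than only its first 4 kB, and needs no per-file encoding guess at all.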
Re: 3.x+. Search text files by content: cannot find Cyrillic
https://www.upload.ee/files/12836392/testutf8.zip.html
Both files are UTF-8 encoded from the very first byte, and both contain the word "плагин", but at different offsets:
- utf8-1.txt: offset 0x0463
- utf8-2.txt: offset 0x2245
Search for files containing "плагин": TC 2.91 finds both files, but TC 3.20 finds only the first one.
It looks like TC 3.x searches for the text only within the buffer (the first 4 kBytes), not in the entire file content.
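Comparable test files can also be generated instead of trimmed by hand. A minimal Python sketch: it builds two BOM-less UTF-8 files with "плагин" at the reported offsets 0x0463 and 0x2245. The ASCII filler is an assumption (the contents of the uploaded files are not known), chosen so that only `utf8-1.txt` has a multi-byte character inside the first 4096 bytes; `build_sample` and `write_samples` are hypothetical helper names.

```python
from pathlib import Path

WORD = "плагин"
# Byte offsets reported in the thread for the two sample files.
SAMPLES = {"utf8-1.txt": 0x0463, "utf8-2.txt": 0x2245}
PROBE_SIZE = 4096  # the "first 4 kBytes" window discussed in the thread


def build_sample(offset: int) -> bytes:
    """UTF-8 body (no BOM) with WORD starting exactly at byte `offset`.

    The ASCII filler is made up; only the offsets come from the thread.
    """
    filler = (b"lorem ipsum " * (offset // 12 + 1))[:offset]
    return filler + WORD.encode("utf-8") + b"\n"


def write_samples(directory: Path) -> None:
    """Write both sample files and report where WORD lands relative to the window."""
    for name, offset in SAMPLES.items():
        (directory / name).write_bytes(build_sample(offset))
        end = offset + len(WORD.encode("utf-8"))
        where = "inside" if end <= PROBE_SIZE else "outside"
        print(f"{name}: '{WORD}' at 0x{offset:04X} ({where} the first {PROBE_SIZE} bytes)")
```

Calling `write_samples(Path("."))` writes both files to the current directory; searching them for "плагин" should then show the difference between a whole-file search and one limited to the first 4 kB.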
- ghisler(Author)
Re: 3.x+. Search text files by content: cannot find Cyrillic
Could you please try it with v3.21beta1? You can get the beta either from the Play Store:
https://play.google.com/apps/testing/com.ghisler.android.TotalCommander
or download it directly here:
https://www.ghisler.com/android.htm#download
Re: 3.x+. Search text files by content: cannot find Cyrillic
Seems to work now, thank you!
- ghisler(Author)
Re: 3.x+. Search text files by content: cannot find Cyrillic
Great, thanks for your quick reply!