Endless loop with wdx Unicode fulltext search and RegEx

Bug reports will be moved here when the described bug has been fixed

Moderators: white, Hacker, petermad, Stefan2

milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

I just found a strange bug involving the wdx Unicode fulltext search (new in TC 9.0) and a RegEx with multiple quantifiers (TC's help speaks of iterators, which is the wrong term BTW).
The only available plug-ins I'm aware of that support the new Unicode fulltext search are:

Lefteous' xPDFSearch beta:
http://www.ghisler.ch/board/viewtopic.php?p=309435#309435
and my own:
https://totalcmd.net/plugring/PCREsearch.html
https://totalcmd.net/plugring/APK-wdx.html

Only my own plug-ins have problems when using a RegEx with multiple quantifiers, e.g. when searching content with the RegEx:

Code:

x\s+y\s+z

with the fulltext search of these plug-ins activated.
TC then gets stuck in an endless loop.
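
To make clear what the pattern is supposed to match, here is a small sketch in Python. Note that Python's `re` module is a different engine from the one TC uses, so this only illustrates the meaning of the two `+` quantifiers, not TC's behavior; the helper name `matches` and the sample strings are made up for this example.

```python
import re

# Two `+` quantifiers: each \s+ means "one or more whitespace characters".
PATTERN = re.compile(r"x\s+y\s+z")


def matches(text: str) -> bool:
    """Return True if the pattern occurs anywhere in `text`."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    for sample in ["x y z", "x \t y\nz", "xyz", "x y"]:
        print(repr(sample), matches(sample))
    # 'x y z' True
    # 'x \t y\nz' True
    # 'xyz' False
    # 'x y' False
```

The pattern itself is harmless: it cannot match the empty string and has no nested quantifiers, so on its own it gives a matcher no reason to loop.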

So I guess it is once again related to the fact that these plug-ins use the maximum buffer size, which was already a problem in the past:
http://www.ghisler.ch/board/viewtopic.php?p=313467#313467
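
For illustration only: the sketch below shows one generic way a host could apply a RegEx to text that arrives in buffer-sized pieces. It is not TC's or PCREsearch's actual code. The function `search_in_chunks`, its `overlap` parameter and the sample chunks are all assumptions invented for this example. The point is that every buffer is consumed exactly once, so the loop ends when the text ends, and a short tail is carried over so that a match straddling a buffer boundary is not lost.

```python
import re
from typing import Iterable


def search_in_chunks(chunks: Iterable[str], pattern: str, overlap: int = 64) -> bool:
    """Return True if `pattern` matches somewhere in the concatenated chunks.

    Hypothetical helper: the last `overlap` characters seen so far are kept
    and prepended to the next chunk, so matches of up to `overlap` characters
    that cross a chunk boundary are still found.
    """
    regex = re.compile(pattern)
    tail = ""
    for chunk in chunks:  # each chunk is consumed exactly once, so this terminates
        window = tail + chunk
        if regex.search(window):
            return True
        # Carry over only the end of the window (nothing if overlap is 0).
        tail = window[-overlap:] if overlap > 0 else ""
    return False


if __name__ == "__main__":
    pieces = ["aaa x ", "\ty bbb"]  # "x", whitespace, "y" split across two buffers
    print(search_in_chunks(pieces, r"x\s+y"))             # True
    print(search_in_chunks(pieces, r"x\s+y", overlap=0))  # False
```

With `overlap=0` the match across the boundary is missed, which is why a chunked search needs some carry-over. A loop that re-requests the same buffer without advancing would never finish.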

Tested with TC 9.12; the bug is present in both the 32-bit and 64-bit versions.
TC plugins: PCREsearch and RegXtract
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Please provide sample text and detailed instructions to reproduce. Thanks.
Author of Total Commander
https://www.ghisler.com

Post by *milo1012 »

First of all, I was able to narrow it down a bit:
It seems to be triggered once TC has searched a certain amount of text with wdx fulltext and RegEx or, equivalently, once the location you want to search contains a certain number of files. The number of quantifiers actually doesn't matter.

It wasn't easy to provide a reproducible sample, but here's a sample collection of office/document files which seems to be enough for a test set:
https://docs.python.org/3/archives/python-3.6.4rc1-docs-pdf-letter.zip
(from https://docs.python.org/3/download.html )
Download it and unpack it to a location of your choice.
Get PCREsearch working with the Oracle OiT filters. The instructions are in the readme, or in more detail e.g. here:
http://www.ghisler.ch/board/viewtopic.php?p=333189#333189

Now go to the directory where you extracted the sample docs and use TC's search function with the plug-in's fulltext search and RegEx enabled, i.e. enable
[x] RegEx (2)
and search for e.g.

Code:

x\s+y

or

Code:

x\sy

TC seems to read the text provided by the plug-in and already finds some files containing the text:

Code:

howto-argparse.pdf
howto-clinic.pdf

but afterwards TC gets stuck in an endless loop (one thread maxes out its CPU load):

Code:

ntoskrnl.exe!memset+0x61a
ntoskrnl.exe!KeWaitForMultipleObjects+0xd52
ntoskrnl.exe!KeWaitForSingleObject+0x19f
ntoskrnl.exe!PoStartNextPowerIrp+0xbd0
ntoskrnl.exe!PoStartNextPowerIrp+0x186d
ntoskrnl.exe!KiCheckForKernelApcDelivery+0x25
Ntfs.sys+0xa3d0e
fltmgr.sys+0x283f
fltmgr.sys+0x16df
ntoskrnl.exe!ObOpenObjectByName+0xc3e
ntoskrnl.exe!ObfDereferenceObject+0xd4
ntoskrnl.exe!MmCreateSection+0x3641
ntoskrnl.exe!SeQueryInformationToken+0xe3e
ntoskrnl.exe!ObOpenObjectByName+0x306
ntoskrnl.exe!NtOpenProcessTokenEx+0x326
ntoskrnl.exe!KeSynchronizeExecution+0x3a23
ntdll.dll!ZwQueryAttributesFile+0xa
KERNELBASE.dll!GetFileAttributesW+0x78
PCREsearch.wdx64+0x1f1f
PCREsearch.wdx64!ContentCompareFilesW+0x4666
PCREsearch.wdx64!ContentGetValueW+0x31
TOTALCMD64.EXE+0x548952

(TC 9.12 x64)
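
As a reading aid for the trace, the following sketch splits frames of the form `module!symbol+0xoffset` and collapses them into a chain of modules, innermost first. The helper names `parse_frame` and `module_chain` are made up for this example. Keep in mind that without debug symbols a name such as `ContentCompareFilesW+0x4666` is probably just the nearest exported symbol and not necessarily the function that is really executing.

```python
import re
from itertools import groupby

# module[!symbol]+0xOFFSET, the frame format used in the trace
FRAME = re.compile(
    r"^(?P<module>[^!+]+)(?:!(?P<symbol>[^+]+))?\+0x(?P<offset>[0-9a-fA-F]+)$"
)


def parse_frame(line):
    """Split one frame into (module, symbol or None, offset)."""
    m = FRAME.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized frame: {line!r}")
    return m.group("module"), m.group("symbol"), int(m.group("offset"), 16)


def module_chain(lines):
    """Collapse consecutive frames from the same module, innermost first."""
    modules = [parse_frame(line)[0] for line in lines if line.strip()]
    return [name for name, _ in groupby(modules)]


# The last six frames of the trace, copied verbatim
FRAMES = """
ntdll.dll!ZwQueryAttributesFile+0xa
KERNELBASE.dll!GetFileAttributesW+0x78
PCREsearch.wdx64+0x1f1f
PCREsearch.wdx64!ContentCompareFilesW+0x4666
PCREsearch.wdx64!ContentGetValueW+0x31
TOTALCMD64.EXE+0x548952
""".splitlines()

if __name__ == "__main__":
    print(module_chain(FRAMES))
    # ['ntdll.dll', 'KERNELBASE.dll', 'PCREsearch.wdx64', 'TOTALCMD64.EXE']
```

Read from the bottom up, this snapshot shows TC calling into the plug-in's `ContentGetValueW`, which in turn was querying file attributes when the sample was taken.
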
Simply disabling
[ ] RegEx (2)
and using an appropriate plain search term, e.g.

Code:

x y

works every time; I never had any problems with it.

So, all in all, it seems to me that TC's RegEx engine for UTF-16 has some serious problems when using the Unicode version of the fulltext search.

This behavior is also not unique to PCREsearch. If you use the fulltext search of my APK-wdx plug-in on a location with a larger number of apk files (in my case > 50 files), the same bug appears with RegEx (but it's not that easy to provide you with a sample set of free/legal apk files).

Post by *ghisler(Author) »

This should be fixed in TC 9.20 beta 1, please test it!

Post by *milo1012 »

I can confirm that this is now working w/o any problems in TC 9.20 beta 1, both 32- and 64-bit. Thanks!

(I can't find a dedicated entry for this in the history.txt file, though)