[WDX] PCREsearch

MarkFilipak
Member
Posts: 164
Joined: 2008-09-28, 01:00 UTC
Location: Mansfield, Ohio

Re: [WDX] PCREsearch

Post by *MarkFilipak »

Hello, Michael,

I'm eager to start using PCREsearch. I installed it (via a click in TC -- easy!). I've tried using it and have failed.

This:
\x00\x00\x01\xB5
is an MPEG sequence_extension in a DVD's MPEG PES (packetized elementary stream).
This:
\x00\x00\x01\xB5\x14\x82
is an MPEG sequence_extension that indicates profile_and_level_indication==0x48 (i.e. Main@Main) and chroma_format==1 (i.e. 4:2:0 chrominance subsampling).
This:
\x00\x00\x01\xB5([\x11-\x13].|\x14[^\x82]|[\x15-\x1F].)
is the pattern for an MPEG sequence_extension with profile_and_level_indication!=0x48 or chroma_format!=1.
There should be no such thing, but I'm finding that, in some DVDs, there is! Now I'm trying to search 100s of DVDs to see how many have this strangeness and to figure out why.
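
For anyone who wants to sanity-check the pattern outside TC first, here is a minimal Python sketch. It is not part of PCREsearch; the function name is made up for illustration, and it assumes Python's `re` module treats these byte values the same way PCRE does. `re.DOTALL` is set so that `.` also matches byte 0x0A, which matters in binary data.

```python
import re

# The pattern from this post, as a bytes regex.
FLAGGED = re.compile(
    rb"\x00\x00\x01\xB5([\x11-\x13].|\x14[^\x82]|[\x15-\x1F].)",
    re.DOTALL,
)

def has_nonstandard_header(data):
    """True if data contains a sequence_extension that the pattern flags."""
    return FLAGGED.search(data) is not None

print(has_nonstandard_header(b"\x00\x00\x01\xB5\x14\x82"))  # False (the expected header)
print(has_nonstandard_header(b"\x00\x00\x01\xB5\x14\x8A"))  # True  (second byte differs)
print(has_nonstandard_header(b"\x00\x00\x01\xB5\x12\x82"))  # True  (first byte differs)
```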

Where do I put \x00\x00\x01\xB5([\x11-\x13].|\x14[^\x82]|[\x15-\x1F].) to do the search in the 'Find Files' dialog? I've tried everything.

I hope I don't have to put the pattern into a file and then restart TC (or flush the plug-ins). I've read that, but I'm hoping that new versions don't have this limitation.

Regards,
Mark.

BTW, I did look at Everything. I couldn't figure it out and Karl didn't seem very keen to help.
Hi Christian! Delighted customer since 1999. License #37627
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: [WDX] PCREsearch

Post by *milo1012 »

MarkFilipak wrote: 2020-10-15, 07:19 UTC I hope I don't have to put the pattern into a file and then restart TC (or flush the plug-ins). I've read that, but I'm hoping that new versions don't have this limitation.
Unfortunately not, as this is the nature of the WDX/content interface and due to the fact that we still don't have a TC built-in configuration option for content plugins.
So yes, whatever you put in the "Find Files" dialog tells TC to search only IN whatever string/number/boolean value the plug-in returns. So you need to tell the plug-in which RegEx it should use for a specific field, and after that tell TC to search within that (now configured) plug-in field.

But you don't need to put it in the plug-ins ini file manually. After you've installed the plug-in, you can use the config tool "PCREsearchConfig.exe".
Open it. Now the quickest way for your specific task:
  • choose the field "My search term" in the left list (this is a default field already shipping with the plugin installation)
  • enter your Expression/RegEx in the top field
  • make sure the "Boolean" option is still set in the "Field type" setting
  • to make the field available to TC, make sure the "Fields" dropdown list (upper left) is set to 13 or higher (default is 10, you can identify the "activated" fields by the prefixed "-->" in the field list)
  • save and close PCREsearchConfig (apply/ok button)
  • use "cm_UnloadPlugins" command inside TC, e.g. by typing it in the command line (I've put the command in a button in the button bar)
  • open search dialog
  • set whatever search location and different search options you need in the main tab
  • switch to the "Plugins" tab
  • choose "pcresearch" -> "My search term" -> "=" -> "yes"
  • start search: TC should now find all (*.vob) files which contain your expression
So if you regularly work with the plugin, it's probably best to save the search settings to a search preset and load it when needed.
To make things quick and easy for me, I have three buttons in the button bar:
- link to "PCREsearchConfig.exe"
- command "cm_UnloadPlugins"
- button for search with my plugin preset:

Code: Select all

TOTALCMD#BAR#DATA
%COMMANDER_EXE%
/O /S=F:L"_pcresearch_myterm" "%P;%T"
wcmicons.dll,47
Start search with "PCREsearch" and "My search term" field preset

0
-1
Yes, it's quite cumbersome, but it works.

And BTW: you could also change the field type to a count, i.e. return the number of occurrences of your "non-standard" sequence_extension, by changing the field's "Field type" option to "Count" and searching in TC for "field > 0".

Or return the hex locations of the occurrences, by using custom columns and changing the "Replace String" in the plug-in's config for that field accordingly (though searching in vob files will probably be quite slow for custom columns).
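
To illustrate what a "Count" field and a hex-location output would report, here is a rough Python sketch. It only mimics the idea and is not how the plug-in is implemented; the function name and the sample data are invented, and the whole input is assumed to fit in memory.

```python
import re

SEQ_EXT = re.compile(
    rb"\x00\x00\x01\xB5([\x11-\x13].|\x14[^\x82]|[\x15-\x1F].)",
    re.DOTALL,
)

def count_and_locations(data):
    """Return (number of matches, list of match offsets as hex strings)."""
    offsets = [m.start() for m in SEQ_EXT.finditer(data)]
    return len(offsets), [f"0x{offset:08X}" for offset in offsets]

# Invented sample: two flagged headers separated by filler bytes.
sample = (
    b"\xFF" * 16
    + b"\x00\x00\x01\xB5\x14\x8A"
    + b"\xFF" * 10
    + b"\x00\x00\x01\xB5\x12\x82"
)
print(count_and_locations(sample))  # (2, ['0x00000010', '0x00000020'])
```

A file would pass the "field > 0" test in TC exactly when the first value is non-zero.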


Edit: forgot the "Fields" option to activate the field
Last edited by milo1012 on 2020-10-17, 17:00 UTC, edited 1 time in total.
TC plugins: PCREsearch and RegXtract
tuska
Power Member
Posts: 3741
Joined: 2007-05-21, 12:17 UTC

Re: [WDX] PCREsearch

Post by *tuska »

Hello,
out of interest I tried to use this step-by-step guide and thought I had done it.

As shown in this picture, I received unexpected results and despite many attempts I do not know what to do.
I should also mention that I have very little knowledge of RegEx.

Therefore I'd like to ask you for your support.
milo1012 wrote: 2020-10-15, 11:55 UTC choose "pcresearch" -> "My search term" -> "=" -> "yes"
I only get other choices here (see picture), but not "yes". Is that OK?
Here is the complete content of PCREsearch.ini

Code: Select all

; INI-file for PCREsearch - should always be UTF-8!


[PCREsearch]
regexcount=10
regex1=.*
regex2=(?m)$
regex3=.*
regex4=\A(?:.*\R){1}(.*)\R?
regex5=[a-z0-9]{2,}
regex6=(\p{L}[\p{L}\p{Pd}']*[\p{L}']|\p{L})(\p{Pd}\R)?(\p{L}[\p{L}\p{Pd}']*[\p{L}']|\p{L})?
regex7=<a[^>]+href\s*=\s*['"]([^"']+)['"]
regex8=(?-i)[a-zA-Z0-9]
regex9=[^\s]+
regex10=.*
regex11=(?s).{1,4}
regex12=(?s).{1,4}
regex13=\Q\.tlb\E
regex14=[\p{L}'’\p{Pd}]+
regex15=(?s-i)^.{0,1024}JFIF
regex16=^[^\x20-\x7e]++$|[\x20-\x7e]

; names for expressions, only used if ModFieldName enabled or for LogErrors,
; Unicode characters converted to ANSI, '.'  '|'  ':' will be replaced with '_'
regex1name=My search term
regex2name=Line count
regex3name=First Line
regex4name=Second Line
regex5name=File Header Filter
regex6name=Basic Word Filter output
regex7name=HTML URL
regex8name=Random String
regex9name=Compare files - ignore whitespace
regex10name=Compare files - encoding comparison
regex11name=Quick-and-dirty Entropy reference
regex12name=Quick-and-dirty Entropy
regex13name=My search term
regex14name=Basic Word count
regex15name=Jpg Header
regex16name=Make filename ASCII

; the expressions types  -1: just encoding check (no search), 0: boolean, 1: count results, 
; 2: count individual results, 3: string output, 4: first match string output only, 5: random string output
; 6: average result length (in bytes), 7: file compare (Sync dirs)
regex1type=-1
regex2type=1
regex3type=4
regex4type=4
regex5type=3
regex6type=3
regex7type=3
regex8type=5
regex9type=7
regex10type=7
regex11type=1
regex12type=2
regex13type=0
regex14type=1
regex15type=0
regex16type=3

; replacement scheme, used only for string output types (ignored for random string)
regex1replace=$0\x20
regex2replace=$0\x20
regex3replace=$0
regex4replace=$1
regex5replace=$0\x20
regex6replace=$1$3\x20
regex7replace=$1\x20
regex8replace=$0\x20
regex9replace=$0\x20
regex10replace=$0\x20
regex11replace=$0\x20
regex12replace=$0\x20
regex13replace=$0\x20
regex14replace=$0\x20
regex15replace=$0\x20
regex16replace=$0

; field flags (options) - sum of:  1: use OEM(DOS) code page, 2: disable Unicode properties,
; 4: allow empty matches to pass through, 8: restrict reading to encoding check buffer (AnalyzeBuffer),
; 16: disable text filter/converter (xdoc2txt), 32: file compare case insensitive,
; 64: search in filename only
regex1flags=0
regex2flags=4
regex3flags=4
regex4flags=4
regex5flags=26
regex6flags=0
regex7flags=0
regex8flags=0
regex9flags=0
regex10flags=0
regex11flags=0
regex12flags=0
regex13flags=0
regex14flags=0
regex15flags=26
regex16flags=64

;
;
;Options:
;
; modify the field names: (!) is prefixed if the expression is erroneous or not found,
; also the optional expression name is added to the field name, otherwise PCRE-RegEx<number>
; if enabled you may lose compatibility with saved custom columns if the field names are altered
ModFieldName=true

; read expression for regex1 from regex1.txt (for complicated expressions including linebreaks)
Regex1Extern=false

; memory (in MiB) for file reading - the higher the more unlikely that singleline searches won't match
; 5-500 MiB
FileMemory=10

; up to what file size (in MiB) will files be automatically searched for custom columns view
; -> larger files will be searched on demand with <spacebar>, 1 MiB minimum
OnDemandLimit=50

; files above GlobalFileSizeLimit (in MiB) will never be searched in all modes (for better
; performance and response times when searching) and always return no match to TC, 5 MiB minimum
GlobalFileSizeLimit=2048

; if an expression (regex1 - up to regex99) won't compile or is empty, use a match-all expression (.*)
; instead, otherwise always return no match to TC without reading the file at all for the faulty content field
MatchAllForErrors=false

; create or append "PCREsearch_error.log" with PCRE errors
LogErrors=true

; read buffer for file encoding analyze  -  1: 32kiB, 2: 64kiB, 3: 128kiB ... 8: 4MiB
AnalyzeBuffer=4

; result caching - 0: disable caching, 1: 8192 fields, 2: 16384 fields, 3: 32768 fields ... 8: 1048576 fields
; (7 and 8 just for the x64 plug-in, bounded to 256k for 32bit)
ResultCache=2

; clear result cache completely when pressing F2 or Ctrl+R (or menu command) to force a refresh of any current view
ClearCacheOnReread=false

; allow binary zeros (two joined zero bytes) for detecting UTF-16 files (does not affect UTF-8 detection)
UnicodeZerosValid=false

; length (in characters) of the returned string for random fields, 1-1022
RandomLength=32

; maximum loops for trying to match a character for random fields, before returning an empty field to TC
; 1: 262144, 2: 524288, 3: 1048576 ... 9: 67 million
RandomLoops=3

; if a text-filtered search fails try a normal (raw) search instead, otherwise return error and empty field to TC
NormalSearchForFilterFail=true

; try to sort fields alphabetical before reporting them to TC
SortFields=false

; treat all expressions case-sensitive, otherwise case-insensitive, by default
CaseSensitiveDefault=false

; path for the 32-bit Oracle OiT filters, needed for the 32-bit plug-in
OitDllPath32=oit32

; path for the x64 Oracle OiT filters, needed for the x64 plug-in
OitDllPath64=oit64

; excluding filter for the Oracle OiT fulltext search (TC 9.0+)
OitFulltextExclude=1800-1804;1806-1807;1812-1816;1820-1822;1826-1827;1999

; Oracle OiT fulltext search: treat files with an unknown format (ID 1999) as text
OitFulltextTreatUnknownFileFormatAsText=false


; Filters for the normal search operation - works by putting file_extension=filter_name
; use 'xdoc' for using xdoc2txt.exe, 'oit' for using Oracle OiT, so you may only put in:
; extension=xdoc or extension=oit
; for using either filter for these file types when regexXflags is set accordingly;
; append '|i' for xdoc2txt to use IFilter for that file extension
; (avoid appending other stuff and don't append it for 'oit',
; xdoc2txt will still work normal if it doesn't find a specific IFilter)
[PCREsearch_filters]
doc=xdoc|i
odt=xdoc
pdf=xdoc
rtf=xdoc|i
sxw=xdoc
xml=xdoc|i


[PCREsearchConfig]
LastField=12
CoordX=585
CoordY=141
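
As an aside on the regexNflags values in the INI above: according to the comment in the file they are sums of the listed option values, so a value like 26 can be split back into its parts. A small Python sketch of that decoding follows (the helper name is made up; the option texts are copied from the INI comment):

```python
# Option values and descriptions as listed in the INI comment.
FLAG_NAMES = {
    1: "use OEM(DOS) code page",
    2: "disable Unicode properties",
    4: "allow empty matches to pass through",
    8: "restrict reading to encoding check buffer (AnalyzeBuffer)",
    16: "disable text filter/converter (xdoc2txt)",
    32: "file compare case insensitive",
    64: "search in filename only",
}

def decode_flags(value):
    """List the options whose bit is set in a regexNflags value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]

print(decode_flags(26))  # 26 = 2 + 8 + 16
print(decode_flags(64))  # ['search in filename only']
```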
I had to change the PCREsearch.ini as follows

Code: Select all

FROM: regex1name=Encoding Check
TO:   regex1name=My search term
because otherwise I would not have had "My search term" available on the Plugins Tab in the properties field.
But there were 8 other choices, only "My search term" was not available.

Therefore "My search term" is currently available twice:

Code: Select all

regex1name=My search term
regex13name=My search term
Btw, the 3rd link from the first post [WDX] PCREsearch for "Oracle Outside In Technology Content Access filters":
http://www.oracle.com/us/technologies/embedded/025613.htm no longer works. Here is the current link.

I would be happy if you could show me the solution to the RegEx query (as shown in the picture) with your plugin.
Thank you in advance for your endeavours.

Regards,
Karl
gdpr deleted 6
Power Member
Posts: 872
Joined: 2013-09-04, 14:07 UTC

Re: [WDX] PCREsearch

Post by *gdpr deleted 6 »

2tuska
you have two regex entries there in your INI file with the same field name "My search term" (parameters for regex1 and regex13).
I believe you are not using the "My search term" field you think you are using. It seems you want to use regex13 but instead you selected regex1 for the search.

Note that regex1 in your INI file has the type "encoding check". Also notice the options provided in TC's plug-in search there in your screenshot. The value options in your search rule are text encodings. Now, this makes me believe you have actually used regex1 for your search, and the search results probably reflect what the plug-in thinks the encodings for the respective files are.

I don't know how you ended up with two different PCRE search terms having the same field name, but I suggest avoiding multiple entries with the same field name. (Ideally, the plug-in GUI should be implemented in a way that prevents this from happening. But I am not a user of this plug-in, and I have no experience with how its GUI operates.)
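
A quick way to spot such duplicates is to scan the regexNname lines of the INI. The following Python sketch is only an illustration (the function name is made up, and it looks at nothing but lines of the form regexNname=...):

```python
import re
from collections import defaultdict

def duplicate_field_names(ini_text):
    """Map each field name used more than once to the regex indices using it."""
    by_name = defaultdict(list)
    for line in ini_text.splitlines():
        m = re.match(r"regex(\d+)name=(.*)$", line.strip())
        if m:
            by_name[m.group(2).strip()].append(int(m.group(1)))
    return {name: idx for name, idx in by_name.items() if len(idx) > 1}

sample = """\
regex1name=My search term
regex2name=Line count
regex13name=My search term
"""
print(duplicate_field_names(sample))  # {'My search term': [1, 13]}
```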
Last edited by gdpr deleted 6 on 2020-10-17, 15:03 UTC, edited 2 times in total.
tuska
Power Member
Posts: 3741
Joined: 2007-05-21, 12:17 UTC

Re: [WDX] PCREsearch

Post by *tuska »

elgonzo wrote: 2020-10-17, 14:25 UTC 2tuska
you have two regex entries there in your INI file with the same field name "My search term" (parameters for regex1 and regex13).
I believe you are not using the "My search term" field you think you are using.
It seems you want to use regex13 but instead you selected regex1 for the search.
Hi,
Yes, I just realized that too.

(For test purposes I entered \Q\.tlb\E for the field "-->My search term". After that I had the values "Yes" and "No" available on the Plugins tab, but the search returned [No files found].)

My problem is that after launching PCREsearchConfig64.exe for the first time, 'My search term' does NOT appear in the property field on the Plugins tab.
gdpr deleted 6
Power Member
Posts: 872
Joined: 2013-09-04, 14:07 UTC

Re: [WDX] PCREsearch

Post by *gdpr deleted 6 »

Your regex

Code: Select all

\Q\.tlb\E
doesn't seem right. You are using verbatim quoting \Q ... \E here (see http://www.pcre.org/current/doc/html/pcre2compat.html , section 6), which makes the pattern literally match a string (file path) containing a backslash followed by a dot followed by "tlb". Thus your regex pattern would match something like

Code: Select all

C:\Some\Path\.tlb
(it contains backslash-dot-t-l-b)


but it would NOT match something like

Code: Select all

C:\Some\Path\file.tlb
(it does not contain backslash-dot-t-l-b)


I guess you just want a "normal" regex pattern \.tlb without verbatim substrings (without \Q ... \E), as that would match the file extension ".tlb".
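
The difference can be tried out quickly in Python. Note that Python's `re` module has no \Q ... \E, so in this sketch `re.escape` stands in for the verbatim quoting; that substitution is my assumption, not something the plug-in does.

```python
import re

literal = re.compile(re.escape(r"\.tlb"))  # stands in for \Q\.tlb\E
pattern = re.compile(r"\.tlb")             # escaped dot followed by "tlb"

paths = [r"C:\Some\Path\.tlb", r"C:\Some\Path\file.tlb"]
for path in paths:
    print(path, bool(literal.search(path)), bool(pattern.search(path)))
# C:\Some\Path\.tlb True True
# C:\Some\Path\file.tlb False True
```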
tuska
Power Member
Posts: 3741
Joined: 2007-05-21, 12:17 UTC

Re: [WDX] PCREsearch

Post by *tuska »

elgonzo wrote: 2020-10-17, 15:04 UTC I guess you just want a "normal" regex pattern \.tlb without verbatim substrings (without \Q ... \E), as that would match the file extension ".tlb".
You're right.

But even if I duplicate PCREsearch.Sample.ini and rename it to PCREsearch.ini and start this query:

Code: Select all

\.tlb
Then I get "Encoding Check" first and another 7 options on the Plugins tab in the properties field, but not: 'My search term' !

I suggest we wait for milo1012.
Thanks for your support!
gdpr deleted 6
Power Member
Posts: 872
Joined: 2013-09-04, 14:07 UTC

Re: [WDX] PCREsearch

Post by *gdpr deleted 6 »

Just a side note (it just crossed my mind) with regard to the regex pattern (not the plug-in itself; as you said, milo is much better suited to advise in this regard):

\.tlb will match the text ".tlb" wherever it appears in the string / file path. If you want to restrict it to matching the file extension ".tlb" at the end of the file path only, it's probably better to use the pattern \.tlb$ (unless the plug-in automatically appends "$"; the "$" is a special character in regex that anchors any possible match for the pattern at the end of the string or text line).
tuska
Power Member
Posts: 3741
Joined: 2007-05-21, 12:17 UTC

Re: [WDX] PCREsearch

Post by *tuska »

elgonzo wrote: 2020-10-17, 15:35 UTC Just a side note (it just crossed my mind) with regard to the regex pattern (not the plug-in itself; as you said, milo is much better suited to advise in this regard):

\.tlb will match the text ".tlb" wherever it appears in the string / file path. If you want to restrict it to matching the file extension ".tlb" at the end of the file path only, it's probably better to use the pattern \.tlb$ (unless the plug-in automatically appends "$"; the "$" is a special character in regex that anchors any possible match for the pattern at the end of the string or text line).
Thanks, I will make a note of it.
In the meantime I have uninstalled and reinstalled the plugin,

because during the first installation I had changed the target directory to:
%COMMANDER_PATH%\Plugins\wdx\PCREsearch_[WDX]-PCREsearch\

Plugin directory according to file "pluginst.inf" ... defaultdir=PCREsearch
%COMMANDER_PATH%\Plugins\wdx\PCREsearch\

Now I have switched to the default, defaultdir=PCREsearch.

The problem persists:
"Then I get "Encoding Check" first and another 7 options on the Plugins tab in the properties field, but not: 'My search term' !"

Windows 10 Pro (x64) Version 2004 (OS build 19041.572) | TC 9.51 x64/x86
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: [WDX] PCREsearch

Post by *milo1012 »

tuska wrote: 2020-10-17, 16:11 UTC The problem persists:
"Then I get "Encoding Check" first and another 7 options on the Plugins tab in the properties field, but not: 'My search term' !"
Indeed, sorry, I forgot that while the plug-in's installation archive ships with this expression, the default/sample ini doesn't have it activated (yet).
So just open the config tool and change the Fields dropdown list (upper left) to 13 or higher (default is 10). You can identify the "activated" fields by the prefixed "-->" in the field list.
The corresponding ini option is regexcount.

(BTW: the idea of "activating" is due to the fact that TC's plug-in field dropdown list can be quite hard to navigate with a lot of fields. Enabling all 999 possible fields at once could be quite confusing.)
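
To make the "activated" idea concrete, here is a small Python sketch that reads regexcount and marks each named field as active or not. It assumes that exactly the fields with an index up to regexcount are active, which is my reading of the explanation above, and the function name is made up.

```python
import configparser
import re

def field_status(ini_text):
    """Return {index: (name, is_active)} for every named field in the INI text."""
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.read_string(ini_text)
    section = parser["PCREsearch"]
    count = section.getint("regexcount")
    status = {}
    for key, value in section.items():
        m = re.fullmatch(r"regex(\d+)name", key)
        if m:
            index = int(m.group(1))
            # Assumption: a field is offered to TC only if index <= regexcount.
            status[index] = (value, index <= count)
    return status

sample = """\
[PCREsearch]
regexcount=10
regex1name=Encoding Check
regex13name=My search term
"""
print(field_status(sample))
# {1: ('Encoding Check', True), 13: ('My search term', False)}
```

With regexcount raised to 13 or higher, field 13 would be reported as active as well.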
TC plugins: PCREsearch and RegXtract
tuska
Power Member
Posts: 3741
Joined: 2007-05-21, 12:17 UTC

Re: [WDX] PCREsearch

Post by *tuska »

2milo1012
Thank you for this prompt information and explanations!

This step was listed in your description anyway, but all of this is "new territory" for me and I
unfortunately overlooked it.

This solves the problem: "My search term" is now listed in the property field
and is also selected automatically after clicking the button with the saved search:
pcresearch | My search term | = | Yes [No]

Now only this question remains open for me:
Which RegEx would I have to use with your plugin so that the files with the extension "tlb"
located in the directory C:\Windows\System32\ are found?

(With TC, 'Everything' and RegEx101, the RegEx query \.tlb returns 15 files/15 matches.)
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: [WDX] PCREsearch

Post by *milo1012 »

tuska wrote: 2020-10-17, 19:21 UTC Now only this question remains open for me:
Which RegEx would I have to use with your plugin so that the files with the extension "tlb"
located in the directory C:\Windows\System32\ are found?

(With TC, 'Everything' and RegEx101, the RegEx query \.tlb returns 15 files/15 matches.)
I see: you're trying to search by file name rather than file content.
Be aware that the plug-in was primarily intended for file content, but I added a file name option in version 2.5 anyway.
So like elgonzo said: either search for

Code: Select all

\.tlb
or

Code: Select all

\Q.tlb\E
Now the main option you need to set: in Field flags / options check the option:

Code: Select all

Search in filename only
And yes, if you're looking for the file extension match, you need to anchor your expression at the filename's end, because otherwise this would match filenames like

Code: Select all

partA.tlbpartB.someextension
(but this is the case for all RegEx methods, even TC's built-in)
So you could use for example

Code: Select all

\.tlb$
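
The effect of the anchor is easy to reproduce outside TC. A minimal Python sketch with made-up file names (Python's `re` is assumed to behave like PCRE for this simple pattern):

```python
import re

names = ["example.tlb", "partA.tlbpartB.someextension", "notes.txt"]

unanchored = [n for n in names if re.search(r"\.tlb", n)]
anchored = [n for n in names if re.search(r"\.tlb$", n)]

print(unanchored)  # ['example.tlb', 'partA.tlbpartB.someextension']
print(anchored)    # ['example.tlb']
```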
TC plugins: PCREsearch and RegXtract
tuska
Power Member
Posts: 3741
Joined: 2007-05-21, 12:17 UTC

Re: [WDX] PCREsearch

Post by *tuska »

2milo1012
Thank you very much!
With this setting and the parameter: \.tlb$ I could now also find the 15 files as a search result using your plugin!

Thanks also to elgonzo, who also dealt with this topic.

I'm sorry to keep you so busy.

Regards,
Karl
tuska
Power Member
Posts: 3741
Joined: 2007-05-21, 12:17 UTC

Re: [WDX] PCREsearch

Post by *tuska »

I am not clear on this point.
milo1012 wrote: 2013-09-13, 14:35 UTC Note when using the uLister plug-in:
You will need to use the same directory for the filter DLLs, or one plug-in won't work after the other was loaded.
Currently the filter DLLs are saved as follows:

Code: Select all

%COMMANDER_PATH%\Plugins\wdx\PCREsearch\oit32\
%COMMANDER_PATH%\Plugins\wdx\PCREsearch\oit64\

%COMMANDER_PATH%\Plugins\wlx\ulister\redist32\
%COMMANDER_PATH%\Plugins\wlx\ulister\redist64\
Do I need to act now?
If so, what to do?

Thanks!
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: [WDX] PCREsearch

Post by *milo1012 »

2tuska
I'm not entirely sure how things look with the current OiT DLLs, but with the 8.5.3 DLLs that I tested the plug-in with three years ago (and am still using), the problems were as I described: the first plug-in being loaded works, and after loading the 2nd the first one stops working.

I think it's due to the way the OiT framework works: many DLLs are identical, only a few are different (just do a "sync dir" comparison in TC with the current version vw and ca archive and you'll see). So when loading the 2nd, the reference path for the OiT function core (probably wvcore.dll) gets overridden and things get messed up: ca won't find its specific DLLs, as the core DLL now points to the vw path (and vice versa).

So yes: I suggest you merge both OiT archives in one dir (each for the same bitness of course), i.e. copy the files unique for ca to vw or vice versa and point both plug-ins to that merged path. Make a sync dir comparison in TC before that, if you're unsure.
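
If you prefer a script over TC's sync dir for that comparison, something like the following Python sketch would list which files are unique to each directory before you merge. The function name is made up and the paths in the usage comment are only examples. Note that `filecmp.dircmp` compares shallowly by default (file type, size and modification time first), so treat the "differing" list as a hint only.

```python
import filecmp

def compare_filter_dirs(dir_a, dir_b):
    """Report files that exist in only one directory, or differ between both."""
    cmp = filecmp.dircmp(dir_a, dir_b)
    return {
        "only_in_a": sorted(cmp.left_only),
        "only_in_b": sorted(cmp.right_only),
        "differing": sorted(cmp.diff_files),
    }

# Example usage (hypothetical paths, adjust to your installation):
# report = compare_filter_dirs(
#     r"C:\totalcmd\Plugins\wdx\PCREsearch\oit64",
#     r"C:\totalcmd\Plugins\wlx\ulister\redist64",
# )
# print(report)
```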

I'll test the current 8.5.5 OiT DLLs soon, but I don't think that this behavior has changed.
Readme.html wrote:Note when using the uLister plug-in:
You will need to use the same directory for the filter DLLs, or one plug-in won't work after the other was loaded (and you'd need to restart TC, because part of the plug-in remains in memory, even when all plug-ins are not active). The viewer and the content access filter package actually share most of the filter files, only a handful are either missing or added in each package. This means you can safely put both of them in the same dir, provided that you use the same version of both packages. This means that if simultaneously using uLister, just set PCREsearch's filter directory accordingly, to point to the fitting uLister filter dir (see the explanation above), as the other way is probably not feasible (due to uLister not being able to set a custom path).
TC plugins: PCREsearch and RegXtract