[WDX] PCREsearch

milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

PCREsearch 2.5

RegEx content plug-in with Unicode support, 32+64 bit
- based on Perl Compatible Regular Expressions (PCRE) library 8.39
This plug-in may replace TC's RegEx engine for file content.
As of version 2.5 you can also use it for searching in filenames.

So what's special about it?
It supports the full PCRE feature set and is primarily intended for the singleline mode "(?s)",
which Total Commander's own RegEx engine can't provide.
As of version 2.1 you can also use it to compare files in TC's Synchronize dirs function.

You don't have to write full Regular Expressions: just use a plain search string for the file's content
and escape the necessary syntax characters, or quote the whole term (\Q...\E).
You can then e.g. count the string occurrences and so filter and narrow your search by
a custom number of occurrences that your files must contain in order to match the search in Total Commander.
Additionally you can create Random Strings and return the detected file encoding (check for Unicode files).
It also features result caching in memory, to get previously searched files/fields in an instant, with a definable entry limit.
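
The quoting and counting idea described above can be sketched outside the plug-in. The following is only an illustration in Python, whose `re` module is not PCRE and does not understand \Q...\E; `re.escape` is used here as a stand-in for quoting the whole term. The function names and the sample text are hypothetical.

```python
import re

def count_literal(text: str, needle: str, ignore_case: bool = True) -> int:
    """Count non-overlapping occurrences of a literal search string."""
    # re.escape stands in for PCRE's \Q...\E: every syntax character
    # in `needle` is treated as plain text.
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(re.escape(needle), text, flags))

def has_at_least(text: str, needle: str, minimum: int) -> bool:
    """True if `needle` occurs at least `minimum` times in `text`."""
    return count_literal(text, needle) >= minimum

# Hypothetical sample text; "$" and "." would be syntax characters if unescaped.
sample = "price: $5.00 (net), Price: $5.00 (gross), price: 5x00"
print(count_literal(sample, "$5.00"))    # prints 2
print(has_at_least(sample, "$5.00", 3))  # prints False
```

A threshold check like `has_at_least` mirrors the idea of matching only files that contain the string a given number of times.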

There is support for a text filter, which will filter certain file formats, e.g. for .PDF and .DOC files,
to expand the otherwise raw file search to most document/office files.
As of version 2.5 there is support for the Oracle Outside In Technology Content Access filters.
When installed and working, those filters will provide an additional powerful Unicode capable fulltext search for TC 9.0 and above
(on top of the text filter capability for the normal plug-in operation).

All configuration is done through an INI file "PCREsearch.ini", where you can create fields that you need repeatedly
or just for a one-off search (unload and reload the plug-in after changing the file).
This file is located in the same directory as the plug-in file.
When loaded, the expressions are used as long as the plug-in remains in memory.
If you don't want to restart Total Commander every time to change the Expressions,
just make your changes, save the file and use the internal TC command cm_UnloadPlugins
to unload all plug-ins and start another search.

A configuration utility ("PCREsearchConfig") is provided, which greatly helps you
configure your fields. It can also test expressions against a test string
and gives instant feedback on erroneous expressions.

Features
- Using the full feature set for Perl/PCRE expressions when searching file content in TC, e.g. the Dotall/Singleline mode, Look-around assertions, Character properties incl. complete Unicode scripts and properties
- Up to 99 fields configurable in the plug-in's INI file
- Custom field names and types (boolean (yes/no), counting, string return, random strings)
- Counting individual string occurrences
- Compare files in TC's Synchronize dirs function, also for files with different encoding
- Searching in most Unicode files, not just plain ANSI (automatic encoding detection)
- Unicode file names and Unicode Regular Expressions
- Result Caching in memory - for retrieving fields in an instant when they were already obtained in the past
- Configurable memory limits for avoiding slow file reads and non-responsive TC
- Custom replacement schemes when returning strings (referencing subgroups)
- Create random strings by providing a RegEx
- Text filter support (xdoc2txt and Oracle OiT), which enables search in the otherwise unreachable text parts of most office/pdf/text documents
- Unicode capable fulltext search for TC 9.0 and above when using the OiT filters
- Output line numbers and file offsets for search results
- Count the average string/result length in a file
- Sort fields alphabetically before reporting them to TC
- Ships with a config utility which features: a RegEx test by typing a test string, on-the-fly RegEx error check,
built-in RegEx and replace string syntax summary, font selection


Usage examples:
- Count line numbers of any file
- Count the occurrence of any string, character or byte in a file
- Count individual Strings/Matches, e.g. for skipping identical lines or words
- Display the line number or the file offset at which your search term is found
- Display the average string/result length of your search term
- Return the 1st, 2nd or nth line of a file to TC for display or search
- Filter and display file Headers for Magic Numbers / Signature, to check for files with erroneous extensions or embedded files
- Comparing text files with different line endings and/or varying whitespace (including empty lines), or source files with different indent styles, etc.
- Check files for Unicode encoding
- Return random strings with a custom character range, for e.g. randomizing file names in MRT, or quick random filling fields from different plug-ins in TC
- Search in filenames only and return a custom built result string to quickly preview purified filenames in TC's custom columns


Check the included HTML files for PCRE Syntax.
Default is case-insensitive search, to match TC's behavior.
Prefix your Expression with (?-i) to search case-sensitive.
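
The effect of that default can be illustrated with a rough Python analogue. This is a sketch only: Python's `re` is not PCRE and does not accept a bare (?-i) prefix, so the scoped group (?-i:...) is used instead, and the sample text is made up.

```python
import re

text = "Total Commander, total commander, TOTAL COMMANDER"

# Plug-in default as described above: case-insensitive matching.
insensitive = re.findall(r"total", text, re.IGNORECASE)

# Rough counterpart of prefixing the expression with (?-i): the scoped
# group (?-i:...) switches case-insensitivity off for its contents.
sensitive = re.findall(r"(?-i:total)", text, re.IGNORECASE)

print(len(insensitive), len(sensitive))  # prints 3 1
```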

The text filter tool requires Visual C++ 2008 runtimes, but you can patch the file to be portable (instructions provided).
The optional OiT filter needs a separate download from
http://www.oracle.com/technetwork/middleware/content-management/downloads/oit-dl-otn-097435.html
Note when using the uLister plug-in:
You will need to use the same directory for the filter DLLs, or one plug-in won't work after the other was loaded.



Current Version 2.5:
(32+64 bit+source)
on totalcmd.net
SHA1: 1d65fbe9a56179bc42b36639c856a878323e92b1






Old Version 2.1:
(32+64 bit+source)
on totalcmd.net
SHA1: 3d0675fbb8090f56c92937ae11998f1a1af8c491

Old Version 2.0:
(32+64 bit+source)
on totalcmd.net
SHA1: 7f58138fb58d3d0a76ea3fafdc091a07abc8e6fb

Old Version 1.6 beta:
(32+64 bit+source)
on totalcmd.net
SHA1: baf518c5128723b03490c5c88f1fdc2c8e2623f8

Old Version 1.5:
(32+64 bit+source)
on totalcmd.net
SHA1: 2f41bfc0353f08cbb2ac78f51bafba179f086762

Old Version 1.3:
(32+64 bit+source)
PCREsearch_1_3.rar (on file-upload.net)
and on rghost.net
SHA1: 8f3306a45705cef913fc33a5b4b6fed413648efa
Now ships with PCREsearch.Sample.ini; copying it to PCREsearch.ini before using the plug-in is recommended.

Old Version 1.2:
(32+64 bit+source)
PCREsearch_1_2.rar
SHA1: afb1c4eeede85fb97e204cda7e85e20082c6736c

Old Version 1.1:
(32+64 bit+source)
PCREsearch_1_1.rar
SHA1: 378f64920716aec913ecad1fc165622d545b9e70

Old Version 1.0:
(32+64 bit)
PCREsearch_1_0.rar
SHA1: 2e70298d5e8edd27e2b7081bd726772fe9e917c6

Old Version 0.8:
PCREsearch.rar
SHA1: f05b27ac616009aed37c4b1d61d282d0a7ef140b


Please report bugs and give me some feedback.
Last edited by milo1012 on 2016-07-04, 03:31 UTC, edited 20 times in total.
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

I think such a plugin should allow defining multiple regex strings and provide one field per regex string; that is quite easy to do.

Post by *milo1012 »

Yes, should be doable.
Although I'd need some Unicode capable config file for this
which I can parse for multiple entries, since I can't rely on the old .ini format.
Maybe next version.

Post by *MVV »

You can use INI files in UTF-16 encoding, standard Windows API for INI files supports it.
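
As a rough illustration of a UTF-16 INI file carrying Unicode expressions, here is a Python sketch. It uses Python's `configparser` instead of the Windows INI API mentioned above, and the section and key names (`Field1`, `Name`, `RegEx`) are hypothetical; the plug-in's real INI layout is not shown in this thread.

```python
import configparser
import os
import tempfile

# Hypothetical section/key names, chosen only for this example.
content = "[Field1]\nName=Greek words\nRegEx=\\p{Greek}+ λόγος\n"

path = os.path.join(tempfile.mkdtemp(), "example.ini")
# Python's "utf-16" codec writes a byte order mark at the start of the file.
with open(path, "w", encoding="utf-16") as handle:
    handle.write(content)

# interpolation=None keeps "%" and other regex characters literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read(path, encoding="utf-16")
print(parser["Field1"]["RegEx"])
```

Non-ASCII characters in the expression survive the round trip, which is the property a Unicode-capable config file needs here.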
Samuel
Power Member
Posts: 1929
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel

Post by *Samuel »

You could create a tcmatch.dll instead of a wdx plugin.
Italiano
Junior Member
Posts: 11
Joined: 2013-09-04, 16:49 UTC

Post by *Italiano »

Please help me perform a search from this screen shot:
Image: http://img33.imageshack.us/img33/7254/a3o5.jpg
What are the steps?
Thanks.

Post by *milo1012 »

Italiano wrote: Please help me perform a search from this screen shot:
Image: http://img33.imageshack.us/img33/7254/a3o5.jpg
What are the steps?
Install the plug-in to your preferred directory. Edit the file PCREsearch.cfg, which is now in that directory, and enter a suitable Regular Expression, e.g.:

Code:

(?si)and.*world.
Or, in case your file has "and" and "world" in other positions not related to each other, limit the amount of characters to e.g. 50:

Code:

(?si)and.{40,50}world
(in case you want to search case-sensitive use just (?s) )
Save the file. Restart Total Commander or use the internal command cm_UnloadPlugins.

Open TC's search dialog.
Set or check "Search in:" so that it points to where your files are, just as you usually would.
(if you have some large files (> 50 MiB) you could optionally exclude them in the advanced options since they slow down the search)

Go to the plugins tab, activate PCREsearch plugin to "yes", start a search.
(make sure that "Search in plugins" is checked)

Note: There's still a chance that for some files the search won't match if the words
"and" and "world" are split among lines.
In that case you could make things even tighter:

Code:

(?si)a.{45,48}d.
But that's up to you. Experiment with different expressions.
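
Why the singleline flag matters for these expressions can be tried out in a small Python sketch. Python's `re` is not the PCRE engine the plug-in uses, but "(?s)" has the same meaning there: "." also matches line breaks. The sample text is hypothetical, since the text from the screenshot is not reproduced in this thread.

```python
import re

# Hypothetical sample; 45 characters (including CR+LF) lie between
# "and" and "World".
text = "Bread and butter are the only one\r\nof its kind in the World."

# Without (?s) the dot stops at the line feed, so nothing is found.
print(re.search(r"(?i)and.*world", text) is not None)          # prints False

# With (?s) the dot also crosses the line break.
print(re.search(r"(?si)and.*world", text) is not None)         # prints True

# The bounded variant limits how far apart the two words may be.
print(re.search(r"(?si)and.{40,50}world", text) is not None)   # prints True
```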

Post by *milo1012 »

Samuel wrote: You could create a tcmatch.dll instead of a wdx plugin.
Yes, I know your QuickSearch eXtended, but I have had no use for it so far.
Is the tcmatch.dll only used for quick search?
I can't find any documentation about that DLL, i.e. the interface specs for it.
For every other plugin type (wdx, wlx, wcx, wfx) the documentation is quite good.

Post by *Samuel »

Unfortunately there is no good documentation. You may use my plugin as a guide, as the source is available.

It is only used for Quick Search / Quick Filter. In combination with Branch view you have some kind of search.

Post by *Italiano »

milo1012, thank you very much for taking the time to create and explain this plugin. I consider myself an average "stupid user" and I don't get it. Can you please make an example if I want to search exactly "the only one of its kind" like the screen shot indicates. I think I could comprehend the plugin better then. Many thanks.

Post by *milo1012 »

Where exactly are the problems?
Understanding the Regular Expressions or understanding "how" to search with the plug-in?
Or even how to install the plug-in?

Like I said, go to the file PCREsearch.cfg, open it with an editor (press F4 in Total Commander when the file is selected).
Delete everything in it (should contain only ".*" when you just installed it)
and now copy and paste this expression which should fit exactly your sentence like you said:

Code:

(?si)and.{40,50}world
(I made sample text files and the expression definitely works!)
Type nothing else in it, just save the file now and close it.

Restart Total Commander (close it, open it again).
Now open TC's search dialog (dialog box Find files, like in your picture).
Set Search in: to the directory/folder in which you want to search, just like you would do normally.
(not necessary if you already navigated to your directory and then opened Find files, the path should already be there)
Search for: should be empty. (you may optionally set a filter to look just for text files or similar)
Don't check Find text, like you did in your picture, we don't need it for the plugin.

Now, go to the tab Plugins, select pcresearch. (it should now read: PCREsearch = yes)
Make sure Search in plugins is checked. (should be automatically set when you selected the plug-in)

Start your search.
It should now match your files containing your sentence!

Post by *Italiano »

I guess I don't understand Regular Expressions. The example from the screenshot works fine indeed, but what puzzles me is why I must put the words "and" and "world" in the search file. Do I have to know which words precede and follow the string? If I'm looking for the string "the only one of its kind", I never know which words precede and follow the string. This incomprehension prevents me from using this plugin in other examples.

Post by *milo1012 »

No, I just used the same example from your screenshot in my last post.
My mistake.
You wouldn't find "the only one of its kind" with that.
I think you need to read some regular expression tutorials, the Total Commander help file also has some examples.
Your problem was that you don't know where line breaks occur for your sentence.
So the solution is that you just use the smallest possible start and end of your string and make the "middle" of it a variable.
You don't need to know which words precede and follow the string, you just use a description for it so that you can find it no matter where the line breaks are.
For "the only one of its kind" you could use something like:

Code:

(?si)the.{17,19}kind
A Windows line break consists of two characters (CR+LF).
I count 17 characters between "the" and "kind", including spaces.
So I just add two for one single line break. You might add four when you expect two line breaks,
or some more or less when there are fewer or additional spaces and so on.
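
The counting above can be reproduced with a short Python sketch. It is an illustration only (Python's `re`, not PCRE); the helper `window_pattern` is hypothetical and assumes the phrase starts with the first word and ends with the last one.

```python
import re

def window_pattern(first: str, last: str, phrase: str, extra: int) -> re.Pattern:
    """Build (?si)first.{gap,gap+extra}last for a phrase that starts
    with `first` and ends with `last` (assumption of this sketch)."""
    gap = len(phrase) - len(first) - len(last)  # characters in between
    return re.compile(
        rf"(?si){re.escape(first)}.{{{gap},{gap + extra}}}{re.escape(last)}"
    )

phrase = "the only one of its kind"
pattern = window_pattern("the", "kind", phrase, extra=2)
print(pattern.pattern)  # prints (?si)the.{17,19}kind

# One line, and a CR+LF that replaces a space (gap of 18): both match.
print(bool(pattern.search("It is the only one of its kind.")))      # prints True
print(bool(pattern.search("It is the only one\r\nof its kind.")))   # prints True

# Two CR+LF breaks added after trailing spaces give a gap of 21,
# so the window has to be widened by four instead of two.
wrapped = "It is the only \r\none of \r\nits kind."
print(bool(pattern.search(wrapped)))                                            # prints False
print(bool(window_pattern("the", "kind", phrase, extra=4).search(wrapped)))     # prints True
```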

Post by *Italiano »

Ok, it works now, thanks.
Is it possible to make it more user-friendly? Like not having to edit a file and restart TC for each search? That would be a minimum that constitutes user-friendly.
Thanks for the great work.

Post by *milo1012 »

Italiano wrote: Is it possible to make it more user-friendly?
Not really, because TC doesn't allow me to read what has been typed in the plugin field.

But well, like already mentioned, you can use cm_UnloadPlugins to reload the expression,
just type or paste it in TC's command line or create a button in the Buttonbar with that command.

While you're in the Buttonbar you can also create a button with a shortcut to the PCREsearch.cfg
so that you don't have to open it manually every time.

Besides that, there's not much I can do to make it more comfortable,
at least until TC implements multi-/singleline search on its own, but I'm open to suggestions.
I could recommend other programs that search files with RegEx,
but the point is to keep the file list in Total Commander so that you can continue to work with these files (keep the list open).