Search word, but "within x words of" range only?
Searching "within x words of"
Is there a way to specify in a search something like "I'm looking for the word ORANGE to be within 100 words of APPLE" (just an example) in a text file?
I remember with Novell, there was such an option when searching, which was GREAT!
If not, do you know of any products that WOULD allow this?
(I need to search our huge MDaemon logs for a specific email address and specific subject, but there are thousands and thousands to search through, driving me nuts).
You could use TextCrawler with this regular expression:
Code: Select all
\b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b
It searches for "word1" near "word2", in either order, with 0 to 5 words between them.
The regex library in TC is rather limited for cases like this. It has no non-capturing groups and a scope of only one line.
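For anyone who finds the one-liner hard to read, here is the same expression spread out and commented. This is just a sketch using Python's re module (my choice for illustration, not something TextCrawler or TC uses); the name NEAR and the test strings are made up.

```python
import re

# The same expression, laid out with re.VERBOSE so each piece can be commented.
NEAR = re.compile(r"""
    \b(?:
        word1 \W+          # first word, then at least one non-word character
        (?:\w+\W+){0,5}?   # lazily allow 0 to 5 words in between
        word2              # second word
      |
        word2 \W+          # same again with the order swapped
        (?:\w+\W+){0,5}?
        word1
    )\b
""", re.VERBOSE)

print(bool(NEAR.search("word1 a b c word2")))          # 3 words in between
print(bool(NEAR.search("word2 a b c d e f g word1")))  # 7 words in between
```

The first line should print True and the second False, since 7 words is over the limit of 5.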
ZoSTeR wrote: The regex library in TC is rather limited for cases like this. It has no non-capturing groups and a scope of only one line.
That's the main reason why I wrote PCREsearch.
That expression also works here, no need to use things like TextCrawler.
Just modify the INI file, e.g.:
Code: Select all
regex1=\b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b
It might be a bit dull to modify the INI every time, but for frequently (re-)used expressions it's fine, and you can have multiple fields.
TC plugins: PCREsearch and RegXtract
Yes I nearly forgot about PCREsearch and it's great if you want to find and handle the files that contain a specific pattern.
TextCrawler has the advantage of displaying all the matching text plus its context on the fly. Since the OP has to look at log files, it's my guess that this is important.
Dunno if there's any way to combine "feed to listbox" with quick-view/lister to display the matching text. I guess one could build a nice summary with RegXtract, it depends on what the final result or workflow is supposed to be.
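If it helps, the "matching text plus context" view can be roughly approximated outside TC with a few lines of Python. This is only a sketch and not how TextCrawler or PCREsearch work internally; show_matches, the context parameter and the sample text are all made up for illustration, and it assumes the text fits in memory.

```python
import re

def show_matches(text, pattern, context=40):
    """Return each match together with `context` characters on either side."""
    hits = []
    for m in pattern.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        # Collapse line breaks so each hit fits on one output line.
        hits.append(" ".join(text[start:end].split()))
    return hits

pattern = re.compile(
    r"\b(?:word1\W+(?:\w+\W+){0,5}?word2|word2\W+(?:\w+\W+){0,5}?word1)\b"
)
sample = "line one\nfoo word1 bar\nbaz word2 qux\nline four"
for hit in show_matches(sample, pattern, context=4):
    print(hit)
```

For a real log, text would be the contents of the log file read as one string, so that matches can span several lines.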
I'm not great with RegEx... will this find them even if they are on separate lines (e.g. within 20 words, but that may be 3 lines down)? THANKS
PS what's a good way to get up to speed on RegEx?
johnstonf wrote: will this find even if they are on separate lines (eg... within 20 words, but that may be 3 lines down).
Yes, and to honor your example:
Code: Select all
\b(?:APPLE\W+(?:\w+\W+){0,5}?ORANGE|ORANGE\W+(?:\w+\W+){0,5}?APPLE)\b
will find
Code: Select all
BANANA BANANA BANANA BANANA ORANGE BANANA
BANANA
BANANA BANANA
BANANA APPLE BANANA BANANA
because there are only five "words" between them (which is allowed), but not this:
Code: Select all
BANANA BANANA BANANA BANANA ORANGE BANANA
BANANA
BANANA BANANA
BANANA
BANANA APPLE BANANA BANANA
(six is too many, so it won't match)
So all you need to do: replace the quantifier in the curly brackets (both) with the distance you want,
and ORANGE/APPLE with the actual words you're looking for.
Take care if these words/strings contain some RegEx syntax characters, you'd need to escape them if they do.
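To make those three steps concrete (set the distance, swap in your words, escape special characters), here is a small sketch in Python. It is not part of PCREsearch; near_pattern and the sample strings are hypothetical, and re.escape is just Python's way of doing the escaping for you.

```python
import re

def near_pattern(a, b, max_words):
    """Compile 'a within max_words words of b', in either order."""
    a, b = re.escape(a), re.escape(b)   # neutralise ., +, ? and similar
    gap = r"\W+(?:\w+\W+){0,%d}?" % max_words
    return re.compile(rf"\b(?:{a}{gap}{b}|{b}{gap}{a})\b")

five_between = (
    "BANANA BANANA BANANA BANANA ORANGE BANANA\n"
    "BANANA\n"
    "BANANA BANANA\n"
    "BANANA APPLE BANANA BANANA"
)
six_between = five_between.replace("BANANA APPLE", "BANANA\nBANANA APPLE")

pat = near_pattern("APPLE", "ORANGE", 5)
print(bool(pat.search(five_between)))  # five words in between, across lines
print(bool(pat.search(six_between)))   # six words in between

# Escaping matters for things like email addresses (made-up example):
mail = near_pattern("bob@example.com", "invoice", 10)
print(bool(mail.search("From: bob@example.com\nSubject: overdue invoice")))
```

This should print True, False, True. Note that the leading and trailing \b only behave as expected when the search strings start and end with a word character.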
Now in the PCREsearch.Sample.ini (or create a new PCREsearch.ini file) use for example these entries:
Code: Select all
[PCREsearch]
regex1=\b(?:APPLE\W+(?:\w+\W+){0,5}?ORANGE|ORANGE\W+(?:\w+\W+){0,5}?APPLE)\b
regex1name=ORANGE and APPLE near each other (5)
regex1type=0
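This will only work for files containing pure text, you can't search in office files, PDFs and similar (yet).
To output the resulting string (for custom columns, limited to 1022 characters) use this:
Code: Select all
regex1type=3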
johnstonf wrote: PS what's a good way to get up to speed on RegEx?
The TC help has a section for RegEx, which describes the basics quite well, including how to escape characters,
but for advanced expressions (like the one above) you probably want to take your time and read some literature
(ZoSTeR's 2nd link directs to such a book)
or try some sites or programs, like regular-expressions.info or RegexBuddy (but I wouldn't advocate it for beginners).
I could also recommend my RegXtract plugin to test the expression, it also has a syntax summary.
Thanks so much...
I made a YouTube video showing others how to quickly get this installed into TC. See it at http://youtu.be/ohfcQAOy3ZU and hope it helps others to get this nice plugin into their lives quickly and easily.