[WCX] RegXtract - String Extractor with RegEx
Peter wrote:
- use the (text-)files that are selected in TC
- test if in the 5. line there is the string "hello"
- if string is found, write lines 1 - 6 to "result.txt" (append the results)
I think this would work:
Code: Select all
^(.*\R?)(?1){3}(.*hello.*\R?).*
Replace:
$0
If you want to match "hello" case-sensitively, use:
Code: Select all
(?-i)^(.*\R?)(?1){3}(.*hello.*\R?).*
If your input files have mixed line endings (Unix LF, Windows CRLF, Mac CR), you're probably better off using:
Code: Select all
^(.*)\R?(.*)\R?(.*)\R?(.*)\R?(.*hello.*)\R?(.*)
Replace:
$1
$2
$3
$4
$5
$6
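For anyone who wants to cross-check the plug-in's output outside TC, here is a minimal Python sketch of the same test. It is my own illustration, not part of RegXtract. Python's built-in re module supports neither \R nor the (?1) subroutine call, so the sketch swaps the regex for plain line splitting; the function name and its parameters are made up for the example.

```python
from typing import Optional


def first_six_if_fifth_has(text: str, needle: str = "hello",
                           ignore_case: bool = True) -> Optional[str]:
    """Return lines 1-6 of text if line 5 contains needle, else None.

    Hypothetical helper: it mimics what the expressions above are meant
    to do, it does not call the plug-in.
    """
    # splitlines() breaks on LF, CRLF and CR alike (roughly what \R does).
    lines = text.splitlines()
    if len(lines) < 5:
        return None
    fifth = lines[4]
    if ignore_case:
        found = needle.lower() in fifth.lower()
    else:
        found = needle in fifth  # counterpart of the (?-i) variant
    if not found:
        return None
    # Join with a single LF so mixed line endings come out normalised.
    return "\n".join(lines[:6])


if __name__ == "__main__":
    mixed = "a\r\nb\nc\rd\nsay Hello\nf\ng"
    print(first_six_if_fifth_has(mixed))
```

Appending the returned string to "result.txt" for each selected file would then reproduce the requested behaviour.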
TC plugins: PCREsearch and RegXtract
Re: [WCX] RegXtract - String Extractor with RegEx
Sorry - me again with a question for beginners.
I have tons of text-files like this
...lots of different text...
Signature\n -> first fixed text plus a blank at the end
\n -> second fixed text plus a blank at the end
(vlr-reactions reactor) -> variable Text which should be extracted
\n -> third fixed text plus a blank at the end
... continuing tons of text ...
Sorry, I have to correct this. I used a viewer which did not show the actual file contents. Here is an example, with the part I want to extract marked in red. Note: after each \n, for example, there are many blanks, but the forum collapses them to one when displaying.
lots of different text ...Signature\n </h2> \n <div class=\"codeBlock\"><pre>(strcat <em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre> .. lots of text
Or, as a challenge, with the tags removed as well:
lots of different text ...Signature\n </h2> \n <div class=\"codeBlock\"><pre>(strcat <em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre> .. lots of text
I'm sure it should be simple - but I don't see the solution. Who can help?
Thanks
TC 10.xx / #266191
Win 10 x64
Win 10 x64
Re: [WCX] RegXtract - String Extractor with RegEx
2Peter
Well, it's not THAT easy.
A possible expression would be:
Code: Select all
<div[^>]*><pre[^>]*>\((.*)\s<em\s[^>]*>(.*)</em>\)</pre>
The first group would hold the "strcat" part, the 2nd the actual string array.
So a replacement string could be something like:
Code: Select all
$1: $2
You should carefully test this expression, as it might not work for all cases, since I'm not completely sure about the content of your input files (it seems to contain some wild mixture of HTML code and non-HTML content).
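To try the expression against your own samples outside the plug-in, a minimal Python sketch could look like this. The helper name extract_signature is hypothetical, the sample is the line Peter posted, and Python's re module stands in for the plug-in's regex engine, so details may differ.

```python
import re
from typing import Optional

# Same expression as above; group 1 = function name, group 2 = arguments.
PATTERN = re.compile(
    r'<div[^>]*><pre[^>]*>\((.*)\s<em\s[^>]*>(.*)</em>\)</pre>'
)


def extract_signature(text: str) -> Optional[str]:
    """Return 'name: args' (the '$1: $2' replacement) or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return f"{match.group(1)}: {match.group(2)}"


# Raw string, so the literal \n and \" sequences of the input file survive.
sample = (
    r'lots of different text ...Signature\n </h2> \n '
    r'<div class=\"codeBlock\"><pre>(strcat '
    r'<em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre>'
    r' .. lots of text'
)

if __name__ == "__main__":
    print(extract_signature(sample))  # strcat: [string string_n ...]
```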
Re: [WCX] RegXtract - String Extractor with RegEx
Milo, thank you - works fine for my needs.
Some strings are not extracted correctly, though. For example, I want only the parts marked in red, but get the entire rest of the string:
Raw-Data:
...............</a>Signature\n </h2> \n <div class=\"codeBlock\"><pre>(ssnamex <em class=\"codeEmphasisMild\">ss [index]</em>)</pre></div> <a name=\"WS1A9193826455F5FF-1E1423D1125831BDA67-7B03\"></a><dl>\n <dt><a name=\"WS1A9193826455F5FF-7C08E89711EC57B47A8-7818\"></......................
Result:
ssnamex <em class=\"codeEmphasisMild\">ss [index]</em>)</pre></div> <a name=\"WS1A9193826455F5FF-1E1423D1125831BDA67-7B03\"></a><dl>\n.........
But that is OK for me; I can fix it up in an editor.
Re: [WCX] RegXtract - String Extractor with RegEx
2Peter
Well, it works on a standalone subject, but for a file with multiple subjects concatenated it's probably something different.
You could try making the quantifiers non-greedy (= lazy):
Code: Select all
<div[^>]*><pre[^>]*>\((.*?)\s<em\s[^>]*>(.*?)</em>\)</pre>
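To see what the lazy quantifiers change, here is a small Python sketch that runs both expressions over two subjects concatenated on one line. The test line is assembled from the two samples posted above, and Python's re module stands in for the plug-in's regex engine, so treat it as an illustration only.

```python
import re

GREEDY = re.compile(
    r'<div[^>]*><pre[^>]*>\((.*)\s<em\s[^>]*>(.*)</em>\)</pre>'
)
LAZY = re.compile(
    r'<div[^>]*><pre[^>]*>\((.*?)\s<em\s[^>]*>(.*?)</em>\)</pre>'
)

# Two subjects on a single line (raw strings keep the literal \" sequences).
text = (
    r'<div class=\"codeBlock\"><pre>(ssnamex '
    r'<em class=\"codeEmphasisMild\">ss [index]</em>)</pre></div> '
    r'<div class=\"codeBlock\"><pre>(strcat '
    r'<em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre>'
)

# Greedy: group 1 runs on until the LAST " <em " on the line.
greedy_first = GREEDY.search(text).group(1)

# Lazy: every subject is matched on its own.
lazy_pairs = LAZY.findall(text)

if __name__ == "__main__":
    print(greedy_first)
    for name, args in lazy_pairs:
        print(f"{name}: {args}")
```

With the greedy version the first group swallows everything from "ssnamex" up to the last subject's function name, which resembles the over-long result reported above; the lazy version yields one name/arguments pair per subject.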
Re: [WCX] RegXtract - String Extractor with RegEx
Thanks, Milo.
There is no difference from the former code, but it's a great base for me to do further tests.
Peter
Re: [WCX] RegXtract - String Extractor with RegEx
Thanks so much for this plugin,
As I understand it, it's not possible to replace and save within the same document, only to get a copy of it.
The problem I have with that: when I have a lot of files in different folders, view them together with Ctrl+B, and want to change them and have them back in the same folders, how can I do that?
Re: [WCX] RegXtract - String Extractor with RegEx
LeoLUG wrote: 2020-12-04, 03:28 UTC
The problem i have with that: When i have a lot of files in different folders, and with control +B i see them together and want to change and have them back on the same folders, how can i do that?
The plug-in can create new files in the same location as the original ones, but with a new extension. By default, the plug-in will name them <<original filename with extension>>.txt
So once you're sure that the s&r was successful, you can delete the original files and rename the newly output files with e.g. TC's Multi-Rename Tool (remove the .txt extension).
So basically you need to do:
- use ctrl+B in your target dir
- mark all files that you want to s&r
- now: either hold the ctrl key while clicking on the Pack files button in TC's (default) button bar, or manually change the target file mask in the Pack files dialog
- make sure to check the "Create separate archives, one per selected file/dir" option in the Pack files dialog
- the input box in the Pack files dialog should now look like this:
Code: Select all
RegXtract:*.*.RegXtract
- now open the RegXtract config ("Configure..." button)
- make sure that "Search and Replace" is checked
- enter your desired RegEx and replace string
- you may change the option in the "Outfile extension" dropdown menu, to tell the plug-in which extension it should use (.txt is default)
- close dialog, start the pack operation
- done, the output files should now reside in the original files' dir(s), having the same name plus the added extension
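If you'd rather script the final cleanup than do it by hand in TC, something like the following Python sketch could do it. It is only an illustration under assumptions: the outputs carry the default added .txt extension, each output sits next to its original, and restore_names is a made-up helper. It defaults to a dry run because the real run overwrites the originals.

```python
from pathlib import Path
from typing import List, Tuple


def restore_names(root: str, added_ext: str = ".txt",
                  dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Move each '<name><added_ext>' over its original '<name>' below root.

    Returns the (output, original) pairs that were (or would be) moved.
    """
    moves = []
    # sorted() materialises the listing before any file is renamed.
    for out in sorted(Path(root).rglob("*" + added_ext)):
        original = out.with_name(out.name[: -len(added_ext)])
        # Only touch outputs whose original still exists beside them.
        if out.is_file() and original.is_file():
            moves.append((out, original))
            if not dry_run:
                out.replace(original)  # overwrites the original file
    return moves


if __name__ == "__main__":
    # Dry run on the current directory: just list what would happen.
    for out, original in restore_names("."):
        print(f"{out} -> {original}")
```

Check the dry-run listing first, and only then call it with dry_run=False.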
Re: [WCX] RegXtract - String Extractor with RegEx
Thanks for such a detailed post!
[WCX] RegXtract - Possibility to have current file name as replacement / variable
Hi,
I just found RegXtract. As I like regexes (Perl programmer) I was looking for a way to create a type of "summary" file for multiple input files.
e.g. search for text using a specific regex and write out a delimiting line (---------------------------------...) plus the *filename* before the matched content.
But I'm missing that <current file name> as replacement...
Am I just too silly or blind?
Any help appreciated,
Axel