[WCX] RegXtract - String Extractor with RegEx

trevor12 · Post by *trevor12 » 2015-04-18, 03:16 UTC

Thanks for the solutions. I forgot to say that my file does not contain only URLs: it is a long text, and the many URLs are scattered among ordinary words.

I will try your solutions.

Peter · Post by *Peter » 2016-04-04, 17:15 UTC

Hi

I need a solution for this job:

- use the (text-)files that are selected in TC
- test whether the 5th line contains the string "hello"
- if string is found, write lines 1 - 6 to "result.txt" (append the results)

Can it be done with this Plugin?

Thanks and regards

Peter

milo1012 · Post by *milo1012 » 2016-04-04, 17:35 UTC

Peter wrote:
- use the (text-)files that are selected in TC
- test whether the 5th line contains the string "hello"
- if string is found, write lines 1 - 6 to "result.txt" (append the results)

I think this would work:

```
^(.*\R?)(?1){3}(.*hello.*\R?).*
```

Replace:
$0

Use the default options, and remember to allow enough read memory in case your text files exceed the default 10 MiB.

If you want to match "hello" case-sensitively, either use

```
(?-i)^(.*\R?)(?1){3}(.*hello.*\R?).*
```

or check the "Case sensitive" option.

If your input files have mixed line endings (Unix LF, Windows CRLF, Mac CR),
you are probably better off using:

```
^(.*)\R?(.*)\R?(.*)\R?(.*)\R?(.*hello.*)\R?(.*)
```

Replace:
$1
$2
$3
$4
$5
$6

This prevents mixed-style line endings in the output file.
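For comparison, the same job can be sketched outside the plug-in. This is only a minimal Python illustration, not how RegXtract works internally: the function names `first_six_if_hello` and `append_matches` are made up for this example, UTF-8 input is assumed, and the check is applied strictly to line 5. `str.splitlines()` handles LF, CRLF and CR (plus a few rarer separators), and the output always uses LF, so mixed line endings do not reach the result file.

```python
from pathlib import Path


def first_six_if_hello(text, needle="hello", case_sensitive=False):
    """Return lines 1-6 of text if line 5 contains needle, else None."""
    # splitlines() splits on LF, CRLF and CR (and a few rarer separators).
    lines = text.splitlines()
    if len(lines) < 5:
        return None
    fifth = lines[4]
    if not case_sensitive:
        fifth, needle = fifth.lower(), needle.lower()
    if needle not in fifth:
        return None
    # Normalise the output to LF so the result has no mixed line endings.
    return "\n".join(lines[:6]) + "\n"


def append_matches(paths, out_path="result.txt"):
    """Append lines 1-6 of every matching file to out_path (hypothetical helper)."""
    with open(out_path, "a", encoding="utf-8", newline="\n") as out:
        for path in paths:
            text = Path(path).read_text(encoding="utf-8", errors="replace")
            block = first_six_if_hello(text)
            if block is not None:
                out.write(block)
```

Calling `append_matches(["a.txt", "b.txt"])` would append the first six lines of each file whose fifth line contains "hello"; pass `case_sensitive=True` to `first_six_if_hello` for the stricter match.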

Peter · Post by *Peter » 2016-04-04, 20:35 UTC

Thanks @milo1012
looks great

Peter · Post by *Peter » 2018-08-15, 07:47 UTC

Sorry - me again with a question for beginners.

I have tons of text-files like this

...lots of different text...
Signature\n -> first fixed text plus a blank at the end
\n -> second fixed text plus a blank at the end
(vlr-reactions reactor) -> variable Text which should be extracted
\n -> third fixed text plus a blank at the end
... continuing tons of text ...

Sorry - I have to correct this. I used a viewer which did not show the real content. Here is an example, with the part I want to extract marked in red. Attention: behind \n, for example, there are many blanks, but the forum reduces them to one when displaying.

lots of different text ...Signature\n </h2> \n <div class=\"codeBlock\"><pre>(strcat <em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre> .. lots of text

Or, as a challenge, also remove the tags:

lots of different text ...Signature\n </h2> \n <div class=\"codeBlock\"><pre>(strcat <em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre> .. lots of text

I'm sure it should be simple - but I don't see the solution. Who can help?

Thanks

milo1012 · Post by *milo1012 » 2018-08-15, 18:08 UTC

2Peter

Well, it's not THAT easy.
A possible expression would be:

```
<div[^>]*><pre[^>]*>\((.*)\s<em\s[^>]*>(.*)</em>\)</pre>
```

The first group would hold the "strcat" part, the 2nd the actual string array.
So a replacement string could be something like:

```
$1: $2
```

You should carefully test this expression, as it might not work for all cases, since I'm not completely sure about the content of your input files (it seems to contain some wild mixture of HTML code and non-HTML content).

Peter · Post by *Peter » 2018-08-16, 08:37 UTC

Milo, thank you - works fine for my needs.

Some strings are not extracted correctly, though. For example, I want only the parts marked in red, but I get the entire string:

Raw-Data:
...............</a>Signature\n </h2> \n <div class=\"codeBlock\"><pre>(ssnamex <em class=\"codeEmphasisMild\">ss [index]</em>)</pre></div> <a name=\"WS1A9193826455F5FF-1E1423D1125831BDA67-7B03\"></a><dl>\n <dt><a name=\"WS1A9193826455F5FF-7C08E89711EC57B47A8-7818\"></......................

Result:
ssnamex <em class=\"codeEmphasisMild\">ss [index]</em>)</pre></div> <a name=\"WS1A9193826455F5FF-1E1423D1125831BDA67-7B03\"></a><dl>\n.........

But for me it is OK, I can edit it with the editor.

milo1012 · Post by *milo1012 » 2018-08-16, 16:38 UTC

2Peter
Well, it works on a standalone subject, but a file with multiple subjects concatenated is probably a different matter.
You could try making the quantifiers non-greedy (=lazy):

```
<div[^>]*><pre[^>]*>\((.*?)\s<em\s[^>]*>(.*?)</em>\)</pre>
```
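To illustrate what greedy and lazy quantifiers do when several subjects sit on one line, here is a small Python sketch. Python's `re` module is only a stand-in for the plug-in's regex engine (an assumption; these two patterns use no syntax beyond what `re` supports), and the sample string is simply the two snippets quoted earlier in this thread joined together, not real file content.

```python
import re

# The two expressions from this thread: greedy and lazy quantifiers.
GREEDY = r'<div[^>]*><pre[^>]*>\((.*)\s<em\s[^>]*>(.*)</em>\)</pre>'
LAZY = r'<div[^>]*><pre[^>]*>\((.*?)\s<em\s[^>]*>(.*?)</em>\)</pre>'

# Two subjects concatenated on a single line (illustrative sample only).
sample = (
    r'Signature\n </h2> \n <div class=\"codeBlock\"><pre>(strcat '
    r'<em class=\"codeEmphasisMild\">[string string_n ...]</em>)</pre></div> '
    r'Signature\n </h2> \n <div class=\"codeBlock\"><pre>(ssnamex '
    r'<em class=\"codeEmphasisMild\">ss [index]</em>)</pre></div>'
)

greedy_hits = re.findall(GREEDY, sample)
lazy_hits = re.findall(LAZY, sample)

# Greedy: a single match whose first group swallows everything up to the last "<em".
print(len(greedy_hits), greedy_hits[0][0][:20])
# Lazy: one match per subject, each group as short as possible.
print(lazy_hits)
```

With this sample, the greedy expression yields one long match spanning both subjects, while the lazy one yields a separate (name, arguments) pair per subject.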

Peter · Post by *Peter » 2018-08-20, 07:04 UTC

Thanks Milo
there is no difference from the former code, but it's a great base for me to do further tests.

Peter

LeoLUG · Post by *LeoLUG » 2020-12-04, 03:28 UTC

Thanks so much for this plugin.
As I understand it, it's not possible to replace and save the same document, only to get a copy of it.
The problem I have with that: when I have a lot of files in different folders, view them together with Ctrl+B, and want to change them and have them back in the same folders, how can I do that?

milo1012 · Post by *milo1012 » 2020-12-04, 16:50 UTC

LeoLUG wrote: 2020-12-04, 03:28 UTC
The problem I have with that: when I have a lot of files in different folders, view them together with Ctrl+B, and want to change them and have them back in the same folders, how can I do that?

The plug-in can create new files in the same location as the original ones, but with a new extension. By default, the plug-in will name them <<original filename with extension>>.txt
So once you're sure that the s&r was successful, you can delete the original files and rename the newly output files with e.g. TC's Multi-Rename Tool (remove the .txt extension).

So basically you need to do:

1. Use Ctrl+B in your target dir.
2. Mark all files that you want to s&r.
3. Now either hold the Ctrl key while clicking on the Pack files button in TC's (default) button bar, or manually change the target file mask in the Pack files dialog.
4. Make sure to check the "Create separate archives, one per selected file/dir" option in the Pack files dialog.
5. The input box in the Pack files dialog should now look like this:
```
RegXtract:*.*.RegXtract
```
6. Now open the RegXtract config ("Configure..." button).
7. Make sure that "Search and Replace" is checked.
8. Enter your desired RegEx and replace string.
9. You may change the option in the "Outfile extension" dropdown menu to tell the plug-in which extension it should use (.txt is default).
10. Close the dialog and start the pack operation.
11. Done: the output files should now reside in the original files' dir(s), having the same name but with the added extension.
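If you would rather script the final clean-up (replacing each original with its processed copy) than use the Multi-Rename Tool, a rough Python sketch could look like the following. It is not part of the plug-in: `restore_original_names` is a hypothetical helper, and it assumes the default added extension `.txt` and that every output file sits next to its original. It overwrites the originals, so only run something like this after checking the results.

```python
from pathlib import Path


def restore_original_names(root, added_ext=".txt"):
    """Replace each original file with its processed copy '<name><added_ext>'."""
    replaced = []
    # Build the full listing first, because files are renamed while looping.
    for out_file in sorted(Path(root).rglob("*" + added_ext)):
        if out_file.name == added_ext:
            continue  # a file literally named ".txt" has no original name
        original = out_file.with_name(out_file.name[: -len(added_ext)])
        # Only act when both the processed copy and its original exist.
        if out_file.is_file() and original.is_file():
            out_file.replace(original)  # overwrites the original file
            replaced.append(original)
    return replaced
```

Files that merely end in `.txt` but have no matching original (for example a plain `notes.txt`) are left untouched, because the check requires the name without the added extension to exist as a file.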

LeoLUG · Post by *LeoLUG » 2020-12-04, 18:01 UTC

Thanks for the so detailed post!

mossi2000 · Post by *mossi2000 » 2024-11-03, 10:00 UTC

Hi,

I just found RegXtract. As I like regexes (Perl programmer), I was looking for a way to create a kind of "summary" file for multiple input files,
e.g. search for text using a specific regex and write out a delimiting line (---------------------------------...) plus the *filename* before the matched content.
But I can't find a <current file name> placeholder to use in the replacement...
Am I just too silly or blind?

Any help appreciated,
Axel
