Need advice on regex for searching text in files
Need advice on regex for searching text in files
G'day,
I have 99 Word documents which I will translate.
Some of them have numeric values joined with the units, like 60Hz instead of 60 Hz.
This will cause problems in quality check after translation, so I need to separate the units in MS Word.
I don't want to open all the Word documents, but only those with the issues.
So I need a regular expression in the "Find text:" field of the file search window (Alt+F7) for (any digit)(not a space)(units) which would allow me to find only the files where I must make changes.
In MS Word it would be [0-9][!^32]@Hz
Unfortunately, TC Help seems too complex for me and I failed to create a search string. I would greatly appreciate your help.
Thank you! :^)
Re: Need advice on regex for searching text in files
TC Regular expressions Help:
\d a digit
\s a word separator (space, tab etc)
\S no word separator
+ one or more occurrences
So one could think that \d+\SHz would match
one or more digits [0-9], followed by "not a space", followed by the literal "Hz".
But in reality it tries to match:
one or more digits [0-9], followed by exactly one character which is "not a space", followed by the literal "Hz".
So it would match something like 60xHz. It also matches 60Hz, but only because \S can take the last digit; a single-digit value such as 5Hz is missed.
To match 60Hz try: \d+Hz
to match one or more digits [0-9], directly followed by the literal "Hz".
Or, explicitly: \d\dHz
to match two digits [0-9], directly followed by the literal "Hz".
That?
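The patterns above can be tried outside TC with a small Python sketch. This assumes Python's re module treats \d, \S and + the same way TC's regex engine does, which is worth confirming in TC itself; the sample strings and the helper name matching are made up for illustration. Note that \S accepts any non-whitespace character, digits included, so \d+\SHz can still hit a multi-digit value like 60Hz.

```python
import re

# Hypothetical sample strings, not taken from the real documents.
SAMPLES = ["60Hz", "60xHz", "60 Hz", "5Hz"]

def matching(pattern, texts=SAMPLES):
    """Return the sample texts in which the pattern is found anywhere."""
    return [t for t in texts if re.search(pattern, t)]

# Compare the candidate patterns discussed in this post.
for pattern in (r"\d+\SHz", r"\d+Hz", r"\d\dHz", r"\dHz"):
    print(pattern, "->", matching(pattern))
```

For the original goal (finding files with a digit glued to the unit), the shortest pattern, \dHz, is already enough, since a search only needs one hit per file.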
Re: Need advice on regex for searching text in files
Ignore this. I was dumb...
Last edited by gdpr deleted 6 on 2021-02-03, 10:59 UTC, edited 1 time in total.
Re: Need advice on regex for searching text in files
Thanks for the tutorial, Stefan, I have been able to filter the documents according to my criteria.
Re: Need advice on regex for searching text in files
Thanks for your clarifications, elgonzo. I simply use "\dHz" or "pos.\d", and it works.
Re: Need advice on regex for searching text in files
I guess i made a mistake when suggesting my version of the regex. I failed to account for possible SI prefixes preceding the "Hz". So Stefan2's regex comes closer to what you want, although the \S used to match the SI prefix should be an optional occurrence, i.e.:
\d+\S?Hz
(The ? means that the preceding token, the \S, matches either exactly once or not at all.)
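As a rough check of the optional \S, here is a minimal Python sketch. It assumes Python's re module handles \d, \S, + and ? like TC's regex engine for this pattern; the sample text and the function name find_joined are hypothetical.

```python
import re

# Digits, then an optional single non-space character (e.g. an SI prefix), then "Hz".
PATTERN = re.compile(r"\d+\S?Hz")

def find_joined(text):
    """Return every substring where a number is glued to Hz or to prefix+Hz."""
    return PATTERN.findall(text)

# Made-up sample line mixing joined and correctly spaced values.
sample = "60Hz 60 Hz 5kHz 100MHz 60 kHz"
print(find_joined(sample))
```

Values that already have a space after the number, such as "60 Hz" and "60 kHz", are not reported, so only the files that still need fixing would show up.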
Re: Need advice on regex for searching text in files
Which pretty much illustrates the "modest" quality of TC help. It should certainly be improved, because features should be easily available to all users, not only the so-called gurus.
For your information, I wrote a macro that does find/replace in Word; it is for preparing data for translation.
The find/replace passes use regex extensively.
You can check its manual here -- https://enru.nemadeka.com/tagger.htm
It is very long, because I needed to make it very clear to anyone.
Re: Need advice on regex for searching text in files
nemadeka wrote: 2021-02-03, 11:41 UTC Which pretty much illustrates the "modest" quality of TC help. It should certainly be improved, because features should be easily available to all users, not only the so-called gurus.
Well, i am not sure if you are referring to my last comment. But if you are, and valid arguments for TC's help needing improvement notwithstanding (we are in agreement about this, i guess), i don't understand how TC's help would be to blame for my inability to read your first post correctly...

Re: Need advice on regex for searching text in files
Well, at the bottom of TC's help for RegExp it says:
Total Commander uses the free Delphi library TRegExpr by Andrey V. Sorokin, which is now available at https://regex.sorokin.engineer
So there is all the more comprehensive help you need, I guess...?
License #524 (1994)
Danish Total Commander Translator
TC 11.51 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1391a
TC 3.60b4 on Android 6, 13, 14
TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar
Re: Need advice on regex for searching text in files
Thanks again guys, I really appreciate your help.