Regular Expressions - useful tips

English support forum

Moderators: Hacker, petermad, Stefan2, white

ford prefect
Junior Member
Posts: 39
Joined: 2003-02-06, 18:03 UTC
Location: Earth
Contact:

Post by *ford prefect »

Hello everyone,

I just stumbled over a beautiful but not commonly known regexp search solution. Excuse my stupidity, but I felt that a thread about common regexp solutions would encourage users to use them :) And maybe encourage others to contribute their own special regexp tricks too...

In many cases you will be advised against using the so-called "greedy" quantifiers
+ - one or more occurrences
* - zero or more occurrences

Example:
You want to search for any opening HTML link tag <a href=link>
The common problem is that you shouldn't write:

<a href=.*>

as it would select everything from the first <a href= to the last > character in the searched text.
Many handbooks advise using an ugly method like

<a href=[^>]*>

where [^>]* means zero or more occurrences of any character except >, which is sooo ugly :(
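
To see the difference for yourself, here is a small sketch in Python. It assumes Python's re module rather than TC's own regexp engine, and the sample string is made up for illustration. The negated class is written as [^>]* (any character except the closing >).

```python
import re

# Made-up sample text with two links on one line (illustration only).
html = '<a href=one.html>first</a> and <a href=two.html>second</a>'

# Greedy .* runs on to the LAST '>' it can reach, so both links are swallowed.
greedy = re.findall(r'<a href=.*>', html)

# The negated class [^>]* cannot cross a '>', so each tag is matched on its own.
negated = re.findall(r'<a href=[^>]*>', html)

print(greedy)
print(negated)
```

The greedy version returns one long match covering the whole line, while the negated class returns the two opening tags separately.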

But hey! We have LAZY quantifiers

Greedy     Lazy

  *         *?
  +         +?
 {n,}      {n,}?
 

with which you can do it in a snap like this:

<a href=.*?>

Naturally the dot '.' matches any character, but since you use *? the match stops at the first occurrence of the > character

TRY IT, YOU WON'T REGRET IT!!! :wink:
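
A quick way to check the lazy quantifiers is a sketch like the following. It assumes Python's re module, which may differ in details from the engine TC uses, and the sample strings are invented for the demonstration.

```python
import re

# Made-up sample text with two links on one line (illustration only).
html = '<a href=one.html>first</a> and <a href=two.html>second</a>'

# Lazy .*? stops at the first '>' that lets the pattern succeed.
lazy = re.findall(r'<a href=.*?>', html)
print(lazy)

# The other lazy forms behave the same way: they take as little as possible.
digits = '12345'
greedy_plus = re.match(r'\d+', digits).group()     # as many digits as possible
lazy_plus = re.match(r'\d+?', digits).group()      # as few as possible (one)
lazy_range = re.match(r'\d{2,}?', digits).group()  # as few as allowed (two)
print(greedy_plus, lazy_plus, lazy_range)
```

Each link's opening tag comes back as a separate match, and +? and {n,}? stop as soon as their minimum is satisfied.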

_____________________________________________

...and in that fashion it can get more complex:
let's say you want to search for a sequence of tags for an HTML table cell containing a link

<td any parameters>anything<a href=anything>anything</a>anything</td>

You don't have to perform magic for this one with the lazy quantifiers,
just remember that instead of * (as in non-regexp filters and searches) you should use .*?

So you can imagine it with the 'stars' first, just like

<td*>*<a href=*>*</a>*</td>

The sequence in regexp is then simple

<td.*?>.*?<a href=.*?>.*?</a>.*?</td>

Nice!
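
The table cell pattern can be tried out with a sketch like this one. It assumes Python's re module, not TC's engine. The re.DOTALL flag (so that '.' also crosses line breaks), the name CELL_WITH_LINK and the sample strings are my own assumptions for illustration.

```python
import re

# The pattern from the post. re.DOTALL is a Python-specific assumption so that
# '.' can also match line breaks inside a cell.
CELL_WITH_LINK = re.compile(r'<td.*?>.*?<a href=.*?>.*?</a>.*?</td>', re.DOTALL)

# Made-up sample: a cell with a link, followed by a plain cell.
html = ('<td class=b>see <a href=page.html>this page</a> now</td>'
        '<td>trailing cell</td>')

cell = CELL_WITH_LINK.search(html).group()
print(cell)

# Caveat: lazy does not mean "stay inside one cell". If an earlier cell has no
# link, the match begins at that cell's <td and runs on into the next cell.
two_cells = '<td>plain</td><td><a href=x.html>x</a></td>'
spans_both = CELL_WITH_LINK.search(two_cells).group() == two_cells
print(spans_both)
```

In the first sample the match ends at the first </td> after the link. The second sample shows that a lazy quantifier still grows as far as it must for the whole pattern to succeed, so it is worth checking the results when some cells have no link.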

I've got this from a genius chm help file called SAMS - Teach Yourself Regular Expressions in 10 Minutes - 2004.
Last edited by ford prefect on 2004-11-10, 18:24 UTC, edited 1 time in total.
Don't Panic! We have Total Commander...
petermad
Power Member
Posts: 16032
Joined: 2003-02-05, 20:24 UTC
Location: Denmark
Contact:

Post by *petermad »

Sounds very interesting - and where can one get hold of SAMS :?:
ford prefect
Junior Member
Posts: 39
Joined: 2003-02-06, 18:03 UTC
Location: Earth
Contact:

Post by *ford prefect »

Well, I shouldn't be mentioning it, but SAMS - Teach Yourself Regular Expressions in 10 Minutes is actually a good old-fashioned paper book
http://www.samspublishing.com/title/0672325667
and the "electronic" version I've got is... let's just say... not official.

And for the record: since I never look in the TC manual, I missed that the LAZY (or, as they are called there, non-greedy) operators are mentioned there too. I just didn't notice them the first time I studied it. :oops:

So this is also a nice example of why you should ALWAYS look in the manual :)
yxz11
Junior Member
Posts: 18
Joined: 2003-11-20, 00:45 UTC

Post by *yxz11 »

Here is a very nice tutorial for RE:

http://www.regular-expressions.info/tutorial.html

The syntax is a bit different from what TC uses, since the tutorial above covers PCRE. I think this is the library TC uses, and here is its help:

http://regexpstudio.com/TRegExpr/Help/RegExp_Syntax.html

And here is another site with information about RE:

http://www.regularexpression.info/

To master RE, Jeffrey Friedl's book is a good read:

http://www.oreilly.com/catalog/regex2/

Spending time on learning RE has been one of my best investments.