Regular Expressions - useful tips

ford prefect · Post by *ford prefect » 2004-11-10, 18:02 UTC

Hello everyone,

I just stumbled over a beautiful but not commonly known regexp search solution. Excuse my stupidity, but I felt that a thread about common regexp solutions would encourage users to use them.

And maybe it will encourage others to contribute their own special regexp tricks too...

In many cases you will be discouraged from using the so-called "greedy" quantifiers:
+ - one or more occurrences
* - zero or more occurrences

Example:
You want to search for any opening HTML link tag <a href=link>.
The common problem is that you shouldn't write:

<a href=.*>

as it would select the entire text from the first <a href= till the last > character in the entire file.
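To see the overshoot for yourself, here is a minimal sketch in Python's `re` module. Python is just my choice for illustration, not the engine TC uses, and the sample line is made up:

```python
import re

# Hypothetical sample text: two links on a single line.
html = '<a href=one.html>first</a> and <a href=two.html>second</a>'

# Greedy: .* grabs as much as it can, then backs off only as far
# as needed to find a '>' -- which is the LAST '>' in the text.
greedy = re.search(r'<a href=.*>', html)

print(greedy.group(0))
```

Here the match swallows both links and everything between them, because the text happens to end with `>`.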
Many handbooks advise to use some ugly method like

<a href=[^>]*>

where [^>]* means "zero or more occurrences of any character except >", which is sooo ugly.

But hey! We have LAZY quantifiers

Greedy     Lazy

  *         *?
  +         +?
 {n,}      {n,}?

with which you can do it in a snap like this:

<a href=.*?>

Naturally the dot '.' matches any character, but since you use *? the search will stop at the first occurrence of the > character.
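A quick way to check this is a small Python sketch (again, Python's `re` is only an assumption for illustration and may differ in details from TC's engine; the sample strings and variable names are made up). It also shows the other two lazy forms from the table, using `\d` for "a digit":

```python
import re

# Hypothetical sample text: two links on a single line.
html = '<a href=one.html>first</a> and <a href=two.html>second</a>'

# Lazy: .*? takes as little as possible, so each match ends
# at the first '>' after '<a href='.
lazy_links = re.findall(r'<a href=.*?>', html)
print(lazy_links)

# The same greedy/lazy contrast for + and {n,} on a run of digits.
digits = '12345'
plus_greedy = re.match(r'\d+', digits).group(0)       # as many digits as possible
plus_lazy = re.match(r'\d+?', digits).group(0)        # one digit is enough
brace_greedy = re.match(r'\d{2,}', digits).group(0)   # as many as possible
brace_lazy = re.match(r'\d{2,}?', digits).group(0)    # stops at the minimum of two
print(plus_greedy, plus_lazy, brace_greedy, brace_lazy)
```

The lazy search returns the two opening tags separately, and the lazy digit patterns stop at the shortest run that still satisfies the quantifier.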

TRY IT, YOU WON'T REGRET IT!!!

_____________________________________________

...and in that fashion it can get complex:
let's say you want to search for the sequence of tags that makes up an HTML table cell containing a link

<td any parameters>anything<a href=anything>anything</a>anything</td>

you don't have to perform magic for this one with the lazy quantifiers,
just remember that instead of * (as in non-regexp filters and searches) you should use .*?

so you can imagine it with the 'stars' first just like

<td*>*<a href=*>*</a>*</td>

the sequence in regexp is simple then

<td.*?>.*?<a href=.*?>.*?</a>.*?</td>

Nice!
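Here is a rough Python sketch of that cell pattern, for anyone who wants to play with it. The sample rows are invented, and `re.DOTALL` (letting the dot match line breaks, so cells spread over several lines would still match) is my assumption for Python; how TC's engine treats line breaks may differ:

```python
import re

# The cell pattern from above; DOTALL lets '.' match newlines too.
cell_pattern = re.compile(
    r'<td.*?>.*?<a href=.*?>.*?</a>.*?</td>', re.DOTALL)

# Hypothetical row: two cells, each containing a link.
row = ('<tr><td class=a>See <a href=one.html>first</a> here</td>'
       '<td class=b>See <a href=two.html>second</a> there</td></tr>')
cells = cell_pattern.findall(row)
print(cells)

# Caveat: lazy does not mean "stay inside one cell". If the first
# cell has no link, .*? keeps expanding until it finds one.
spanning = cell_pattern.findall('<td>plain</td><td><a href=x>y</a></td>')
print(spanning)
```

In the first row each cell comes back as its own match. In the second case the single match runs across both cells, so the pattern is best used where every cell really has the expected shape.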

I got this from a genius CHM help file called SAMS - Teach Yourself Regular Expressions in 10 Minutes - 2004.

Post by *petermad » 2004-11-10, 18:16 UTC

Sounds very interesting - and where can one get hold of SAMS?

ford prefect · Post by *ford prefect » 2004-11-10, 18:33 UTC

Well, I shouldn't be mentioning this, but SAMS - Teach Yourself Regular Expressions in 10 Minutes is actually a good old-fashioned paper book:
http://www.samspublishing.com/title/0672325667
and the "electronic" version I've got is... let's just say... not official.

And for the record, since I always "DO NOT" look in the TC manual: the LAZY (or, as they are called there, non-greedy) operators are mentioned there too. I just didn't notice them the first time I studied it.

So this is also a nice example of why you should ALWAYS look in the manual.

yxz11 · Post by *yxz11 » 2004-11-15, 02:59 UTC

Here is a very nice tutorial for RE:

http://www.regular-expressions.info/tutorial.html

The syntax is a bit different from what TC uses, since the tutorial above covers PCRE. I think TC uses the TRegExpr library, and here is its help:

http://regexpstudio.com/TRegExpr/Help/RegExp_Syntax.html

And here is another site with information about RE:

http://www.regularexpression.info/

To master RE, Jeffrey Friedl's book is a good read:

http://www.oreilly.com/catalog/regex2/

Spending time on learning RE is one of my best investments.