Regular Expressions

From TotalcmdWiki


Regular expressions are a very powerful search tool. They let you search for complex patterns of text rather than just fixed words. Regular expressions are mainly meant for professionals, but they can also be useful in the office for finding certain documents (see examples below).

Total Commander supports regular expressions in the following functions:
- Commands
- Search (in file name and file contents)
- In Lister
- In the Multi-Rename tool
- In the selection dialog

Regular expressions consist of normal characters and special characters, so-called meta-characters. The following characters are meta-characters or initial parts of meta-characters:

.  \  (  )  [  ]  {  }  ^  $  +  *  ?  |    (only in character classes: - )

Normal characters:

test finds the string "test" in the searched text. Note: This finds "test" ANYWHERE in a file name or on a line in text.
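To try this outside Total Commander, here is a minimal sketch that uses Python's `re` module as a stand-in for TC's regex engine (an assumption: basic patterns like this behave the same, but Python is case-sensitive unless told otherwise). The file names are made up for illustration.

```python
import re

names = ["test.txt", "latest_report.doc", "contest.pdf", "Test.log", "notes.txt"]

# A plain pattern matches wherever the characters occur in the name.
hits = [n for n in names if re.search("test", n)]
# Python is case-sensitive by default; IGNORECASE mimics TC's default for file names.
hits_ci = [n for n in names if re.search("test", n, re.IGNORECASE)]

print(hits)     # ['test.txt', 'latest_report.doc', 'contest.pdf']
print(hits_ci)  # ['test.txt', 'latest_report.doc', 'contest.pdf', 'Test.log']
```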

Escape sequences:

A backslash \ starts an escape sequence. Examples of escape sequences:

\t Tab character
\xnn Character with hexadecimal code nn. Example: \x20 is the space character. The character table charmap.exe (if installed) shows the character code of most special characters. You can use the Windows calculator in scientific mode to convert from decimal to hex.
\[ Left square bracket. Since the square brackets are meta-characters, they need to be written as \[ to search for them in the target string.
\\ Finds a backslash.
\. Finds a dot ("." alone finds any character, see below).
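The escape sequences above can be checked with a short sketch in Python's `re` module. Python is only a stand-in here, assumed to treat these particular escapes the same way as TC's TRegExpr library. Raw strings (`r"..."`) are used so that Python itself does not interpret the backslashes.

```python
import re

space = bool(re.search(r"a\x20b", "a b"))                  # \x20 = space character
tab = bool(re.search(r"a\tb", "a\tb"))                     # \t = tab
bracket = bool(re.search(r"\[draft\]", "memo [draft].doc"))  # \[ \] = literal brackets
backslash = bool(re.search(r"C:\\Temp", "C:\\Temp"))       # \\ = one literal backslash
literal_dot = bool(re.search(r"file\.txt", "file_txt"))    # \. needs a real dot
any_char = bool(re.search(r"file.txt", "file_txt"))        # bare . also matches "_"

print(space, tab, bracket, backslash, literal_dot, any_char)
# True True True True False True
```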


Character classes

Characters in square brackets form a character class. It matches exactly one character from this class. A dash defines a range, e.g. [a-z]. A ^ at the beginning of the class matches any character except those listed.

Examples:

[aeiou] Finds exactly one of the listed vowels.
[^aeiou] Finds any one character that is not a vowel.
M[ae][iy]er Finds a Mr. Meier in all four common spellings: Mayer, Meyer, Maier, Meier. Very useful if you cannot remember the exact spelling of a name.
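A minimal sketch of these character classes, using Python's `re` module as a stand-in for TC (the name list is invented for illustration):

```python
import re

names = ["Mayer", "Meyer", "Maier", "Meier", "Miller", "Moyer"]

# M, then a or e, then i or y, then "er"
pattern = re.compile(r"M[ae][iy]er")
matches = [n for n in names if pattern.search(n)]

vowels = re.findall(r"[aeiou]", "Total")       # one vowel per match
non_vowels = re.findall(r"[^aeiou]", "Total")  # one non-vowel per match

print(matches)     # ['Mayer', 'Meyer', 'Maier', 'Meier']
print(vowels)      # ['o', 'a']
print(non_vowels)  # ['T', 't', 'l']
```

Each class consumes exactly one character, which is why "Moyer" is rejected: the "o" is not in [ae].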

Meta-characters

Here is a list of the most important meta-characters:

^ Line start
$ Line end
. Any character
\w a letter, digit or underscore _
\W the opposite of \w
\d a digit
\D no digit
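\s a whitespace character (space, tab etc.)
\S any character except whitespace
\b a word boundary (the position between a \w and a \W character, or at the start or end of the line)
\B the opposite of \b (a position that is not a word boundary)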
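The following sketch shows several of these meta-characters on one line of text. It uses Python's `re` module as a stand-in for TRegExpr and assumes both define \w, \d and \b the same way for plain ASCII text.

```python
import re

line = "Invoice 2023_17 total: 450 EUR"

digits = re.findall(r"\d+", line)             # runs of digits
words = re.findall(r"\w+", line)              # letters, digits and underscores
starts = bool(re.search(r"^Invoice", line))   # ^ = start of the line
ends = bool(re.search(r"EUR$", line))         # $ = end of the line
# "_" is a word character, so there is no \b between "_" and "17".
whole_17 = bool(re.search(r"\b17\b", line))
whole_450 = bool(re.search(r"\b450\b", line))

print(digits)  # ['2023', '17', '450']
print(words)   # ['Invoice', '2023_17', 'total', '450', 'EUR']
print(starts, ends, whole_17, whole_450)  # True True False True
```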


Iterators

Iterators are used for a repetition of the character or expression to the left of the iterator.

* zero or more occurrences
+ one or more occurrences
? zero or one occurrence
{n} exactly n occurrences
{n,} at least n occurrences
{n,m} at least n and at most m occurrences

All these operators are "greedy", which means that they take as many characters as they can get. Putting a question mark ? after an operator makes it "non-greedy", i.e. it takes only as many characters as needed. Example: "b+" applied to the target string "abbbbc" finds "bbbb", "b+?" finds just "b".
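Here is a sketch of the iterators and the greedy/non-greedy difference, run with Python's `re` module as a stand-in for TC. The HTML-like string is an invented example chosen because the difference is easy to see there.

```python
import re

text = "abbbbc"
greedy = re.search(r"b+", text).group()        # 'bbbb': as many b's as possible
lazy = re.search(r"b+?", text).group()         # 'b': as few as needed
exactly_2 = re.search(r"b{2}", text).group()   # 'bb'
two_to_3 = re.search(r"b{2,3}", text).group()  # 'bbb': greedy, so it takes the maximum
zero_ok = re.search(r"x*", text).group()       # '': zero occurrences also count as a match
optional = bool(re.search(r"colou?r", "color"))  # ? makes the "u" optional

# Greedy vs. non-greedy on a more practical string
html = "<b>bold</b>"
tag_greedy = re.search(r"<.+>", html).group()   # '<b>bold</b>'
tag_lazy = re.search(r"<.+?>", html).group()    # '<b>'

print(greedy, lazy, exactly_2, two_to_3, repr(zero_ok), optional)
print(tag_greedy, tag_lazy)
```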


Alternatives

Alternatives are put in round brackets and are separated by a vertical bar |.

Example:

(John|James|Peter) finds one of the names John, James or Peter.
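A minimal sketch in Python's `re` module (standing in for TC) shows why the round brackets matter. The file names are invented for illustration.

```python
import re

text = "Letter from James to Peter, cc Paul"
names = re.findall(r"(John|James|Peter)", text)

files = ["a.jpg", "b.jpeg", "c.png", "d.gif", "jpeg_notes.txt"]
# Brackets limit the alternatives to the extension part.
images = [f for f in files if re.search(r"\.(jpg|jpeg|png)$", f)]
# Without brackets, | splits the WHOLE pattern: "\.jpg" or "jpeg" or "png$".
loose = [f for f in files if re.search(r"\.jpg|jpeg|png$", f)]

print(names)   # ['James', 'Peter']
print(images)  # ['a.jpg', 'b.jpeg', 'c.png']
print(loose)   # ['a.jpg', 'b.jpeg', 'c.png', 'jpeg_notes.txt']
```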


Subexpressions for search+replace

Text parts in round brackets are treated as subexpressions.

Example:

To swap the title and the artist in the file name of an mp3 file, when they are separated by a dash (Title - Artist.mp3), you can use:

Search for: (.*) - (.*)\.mp3

Replace by: $2 - $1.mp3

Here $1 stands for the text matched by the first pair of brackets, and $2 for the text matched by the second pair.
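The same swap can be sketched in Python's `re` module. One difference from Total Commander: in a Python replacement string the subexpressions are written \1 and \2 (or \g<1>, \g<2>) instead of $1 and $2. The function name `swap_title_artist` is hypothetical and only used for illustration.

```python
import re

def swap_title_artist(filename: str) -> str:
    # TC's "$2 - $1.mp3" becomes r"\2 - \1.mp3" in Python.
    return re.sub(r"(.*) - (.*)\.mp3", r"\2 - \1.mp3", filename)

print(swap_title_artist("Yesterday - The Beatles.mp3"))  # The Beatles - Yesterday.mp3
print(swap_title_artist("notes.txt"))                     # notes.txt (no match, unchanged)
```

Because the first (.*) is greedy, a name with two dashes such as "A - B - C.mp3" is split at the last " - ", giving "C - A - B.mp3".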

Backreferences

\n Matches the same text that subexpression n matched, once more (n is the number of the subexpression, e.g. \1).

Example:

(.+)\1+  finds e.g.  abab  (where the first ab is found by .+  and the second by \1+ )
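A sketch of backreferences, using Python's `re` module as a stand-in for TC. The helper `repeated` is hypothetical. The doubled-word search at the end is an extra illustration of the same idea.

```python
import re

def repeated(text):
    # (.+) captures some text, \1+ requires that same text to follow one or more times.
    m = re.search(r"(.+)\1+", text)
    return (m.group(), m.group(1)) if m else None

for s in ["abab", "aaaa", "abcabcabc", "abcd"]:
    print(s, repeated(s))
# abab ('abab', 'ab')
# aaaa ('aaaa', 'aa')
# abcabcabc ('abcabcabc', 'abc')
# abcd None

# Find an accidentally doubled word
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group()  # 'is is'
```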

Modifiers

Modifiers are used for changing the behaviour of regular expressions.

(?i) Ignore Upper-/lowercase. In Total Commander, this is the default for file names.
(?-i) Case-sensitive matching.
(?g) Switches on "greedy" mode (active by default)
(?-g) Turns off "greedy" mode, so "+" means the same as "+?"

The other modifiers are not relevant for Total Commander, because the program only supports searching within one line.
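Python's `re` module handles these modifiers slightly differently, so treat this sketch as an approximation of TC's behaviour. Python matches case-sensitively by default, a global (?i) must stand at the very start of the pattern, and switching case sensitivity back on is only allowed in the scoped form (?-i:...). Python has no (?g) or (?-g); write the non-greedy operators (+?, *?) explicitly instead.

```python
import re

name = "README.TXT"

plain = bool(re.search(r"readme\.txt", name))          # Python default: case-sensitive
ignore = bool(re.search(r"(?i)readme\.txt", name))     # (?i): ignore case
# (?-i:...) turns case sensitivity back on for the bracketed part only
exact_ext = bool(re.search(r"(?i)readme\.(?-i:TXT)", name))
wrong_ext = bool(re.search(r"(?i)readme\.(?-i:txt)", name))

print(plain, ignore, exact_ext, wrong_ext)  # False True True False
```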


Total Commander uses the free Delphi library TRegExpr by Andrey V. Sorokin: http://regexpstudio.com/ Some of the above explanations are from the help file for this library.


Using Regular expressions

Using regular expressions (RegEx) is not easy if you are not familiar with them, so here is some more information:

One of the most puzzling things about RegEx is that the asterisk *, the question mark ? and the dot . do not have the meanings you know from ordinary wildcards such as *.txt.

As soon as you switch on the 'RegEx' option, the * no longer means 'any text string of any length'. Instead it is a repetition operator meaning 'the preceding expression may appear zero, one or more times'. So if you want the 'common' meaning of the asterisk, you have to use:

.* 

where the dot is a placeholder for any character and the asterisk means zero, one or more occurrences of 'any character'.
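The difference between a wildcard mask and a regex can be sketched in Python. Python's `fnmatch` module handles the wildcard side and `re` the regex side. Both stand in for TC's own implementations, which may differ in details such as case sensitivity, and whether TC reports a bare * as an error the same way Python does is not verified here.

```python
import fnmatch
import re

files = ["report.txt", "report.txt.bak", "data.csv"]

# Wildcard mask: a bare * means "any text"
by_mask = [f for f in files if fnmatch.fnmatchcase(f, "*.txt")]
# Regex: .* = any character, zero or more times; \. = a literal dot;
# $ = end of the name (otherwise a regex may match anywhere in the name)
by_regex = [f for f in files if re.search(r".*\.txt$", f)]

# In a regex a leading * has nothing to repeat, so Python rejects it.
try:
    re.compile("*.txt")
    star_alone_ok = True
except re.error:
    star_alone_ok = False

print(by_mask, by_regex, star_alone_ok)  # ['report.txt'] ['report.txt'] False
```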

To see how TC treats these expressions, open the Multi-Rename Tool with a few files selected, check the 'RegEx' box and try the examples here. The Multi-Rename Tool shows a preview of what it would do if you started renaming with the current settings.

So if you leave the file name mask at [N] and the extension mask at [E], then

search for: .* 
Replace with:<Clear>

would remove the whole name and extension.

while

search for: .*\.
Replace with:<Clear>

would remove the whole name including the dot and leave the extension as name.


In this case the . does not mean 'any character' but a normal dot, because of the preceding \.
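The two "Clear" examples can be reproduced with Python's `re.sub`. This sketch assumes the search is applied to the full name including the extension and that only the first match is replaced (hence `count=1`). The file name is invented. The non-greedy variant at the end is an extra illustration, not one of the article's examples.

```python
import re

name = "Holiday.2023.jpg"

# An empty replacement string plays the role of <Clear>.
all_gone = re.sub(r".*", "", name, count=1)        # whole name and extension removed
ext_left = re.sub(r".*\.", "", name, count=1)      # greedy .* runs to the LAST dot
after_first = re.sub(r".*?\.", "", name, count=1)  # non-greedy stops at the FIRST dot

print(repr(all_gone), ext_left, after_first)  # '' jpg 2023.jpg
```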


Okay, let's look at some real questions from the forum:

Example 1

I have a list of music files which are named:
Name, first_name - CDTitle.mp3   but I would like them to be named
first_name Name - CDTitle.mp3

search for:(.*), (.*) - (.*)

replace with:$2 $1 - $3

Using round brackets makes TC treat the term as a subexpression, so you can refer to it.

Thus you search for a text string of any length, followed by a comma and a space:(.*),

then another string of any length, followed by space minus space:(.*) -

and then a third text string of any length:(.*)

and replace with the second string followed by a space:$2

then the first one followed by space minus space:$1 -

and then the last one:$3

which also contains the extension, because the dot between name and extension is also matched by 'any character'.
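The same rename can be tried in Python as a sketch. The example file name is invented, and Python writes $1, $2, $3 as \1, \2, \3 in the replacement string.

```python
import re

old = "Cash, Johnny - American IV.mp3"
# (.*), = last name and comma; (.*) -  = first name and " - "; (.*) = rest incl. extension
new = re.sub(r"(.*), (.*) - (.*)", r"\2 \1 - \3", old)

print(new)  # Johnny Cash - American IV.mp3
```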

You could also put the 'space minus space' sequence into a subexpression of its own, or add it to the last one:

search for:(.*), (.*)( - )(.*)

replace with:$2 $1$3$4

or

search for:(.*), (.*)( - .*)

replace with:$2 $1$3

It will be the same result.
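A quick way to check that moving " - " into a subexpression does not change the result is to run several variants side by side. This is a sketch in Python's `re` (with \n instead of $n), using an invented file name.

```python
import re

old = "Cash, Johnny - American IV.mp3"

variants = [
    (r"(.*), (.*) - (.*)", r"\2 \1 - \3"),      # " - " outside the subexpressions
    (r"(.*), (.*)( - .*)", r"\2 \1\3"),         # " - " at the start of the last one
    (r"(.*), (.*)( - )(.*)", r"\2 \1\3\4"),     # " - " as a subexpression of its own
]
results = {re.sub(pattern, repl, old) for pattern, repl in variants}

print(results)  # {'Johnny Cash - American IV.mp3'}
```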

