Bug with Multi-Rename Tool and Regex

Bug reports will be moved here when the described bug has been fixed

Moderators: white, Hacker, petermad, Stefan2

Balderstrom
Power Member
Posts: 2148
Joined: 2005-10-11, 10:10 UTC

Bug with Multi-Rename Tool and Regex

Post by *Balderstrom »

TC refuses more than 14 pairs of round brackets in a regex, no matter what their contents are. The Output/New name column turns into all <Error!> entries.

Example Test case from another post:
SavedAs: Rx: N-\d{6} BUG
Search : ([a-zA-Z]+)-?((\d{6})0|(\d{5})(0)|(\d{4})(0)|(\d{3})(0)|(\d{2})(0)|(\d{1})(0)|(0)|(\d{7,})0)\.
Replace: $1-$3$5$4$7$7$6$9$9$9$8$11$11$11$11$10$13$13$13$13$13$12$14$14$14$14$14$14$15.
Whereas this works:
SavedAs: Rx: N-\d{6} !safe
Search : ([a-zA-Z]+)-?((\d{6})0|(\d{5})(0)|(\d{4})(0)|(\d{3})(0)|(\d{2})(0)|(\d{1})(0)|(0))\.
Replace: $1-$3$5$4$7$7$6$9$9$9$8$11$11$11$11$10$13$13$13$13$13$12$14$14$14$14$14$14.
Example #2: Error
Search: ()()()()()()()()()()()()()()()
Example #3: Works
Search: ()()()()()()()()()()()()()()
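
For anyone who wants to check how many groups a pattern has before pasting it into the Multi-Rename Tool, here is a minimal Python sketch. It only counts the capturing groups of the four Search patterns above with Python's own `re` module. That is a convenience assumption: it is not the regex library TC uses, and the dictionary keys and the `count_groups` helper are names made up for this illustration.

```python
import re

# The four Search patterns from the report, copied verbatim.
# The keys are hypothetical labels used only in this sketch.
PATTERNS = {
    "bug": r"([a-zA-Z]+)-?((\d{6})0|(\d{5})(0)|(\d{4})(0)|(\d{3})(0)|(\d{2})(0)|(\d{1})(0)|(0)|(\d{7,})0)\.",
    "safe": r"([a-zA-Z]+)-?((\d{6})0|(\d{5})(0)|(\d{4})(0)|(\d{3})(0)|(\d{2})(0)|(\d{1})(0)|(0))\.",
    "example2": "()" * 15,
    "example3": "()" * 14,
}


def count_groups(pattern: str) -> int:
    """Return the number of capturing groups as seen by Python's parser."""
    return re.compile(pattern).groups


GROUP_COUNTS = {name: count_groups(p) for name, p in PATTERNS.items()}

if __name__ == "__main__":
    for name, count in GROUP_COUNTS.items():
        print(f"{name}: {count} capturing groups")
```

Both failing patterns contain 15 capturing groups and both working ones contain 14, which fits the report.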


As a side note: Total Commander's regex implementation also doesn't support most of the "question-mark" syntax that other regex engines do, e.g.:
* Atomic Grouping and Possessive Quantifiers
* Lookaround
* Conditionals
From regular-expressions.info: http://www.regular-expressions.info/refadv.html

An example of one particular "?" syntax that I find useful in other tools:
Given the following files: foobar.txt, barfoo.txt, twofoo.txt
Search: (?|foo(bar)|bar(foo)|two(foo))
Replace: $1

This captures bar, foo and foo from the three files, respectively. Since TC doesn't support that syntax, we have to do:
Search: (foo(bar)|bar(foo)|two(foo))
Replace: $2$3$4
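
To see why concatenating the three groups works, here is a small Python sketch of the same workaround. It assumes that groups which did not take part in the match contribute an empty string, which is what the `$2$3$4` replacement relies on. Python's `re` module stands in for TC's engine here, and `captured_part` is a hypothetical helper name.

```python
import re

# Workaround pattern from the post: group 1 is the outer pair, groups 2-4
# belong to the three alternatives, and only one of them can match.
WORKAROUND = re.compile(r"(foo(bar)|bar(foo)|two(foo))")


def captured_part(filename: str) -> str:
    """Emulate the replacement "$2$3$4" for the matched part of a name."""
    match = WORKAROUND.search(filename)
    if match is None:
        return ""
    # Skip group 1 (the outer pair); unmatched groups are None in Python,
    # so they are treated as empty strings before joining.
    return "".join(group or "" for group in match.groups()[1:])


FILES = ("foobar.txt", "barfoo.txt", "twofoo.txt")
RESULTS = {name: captured_part(name) for name in FILES}

if __name__ == "__main__":
    for name, part in RESULTS.items():
        print(f"{name} -> {part}")
```

The price of the workaround is three extra groups counted against the limit, where the "(?|" form would need only one.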
white
Power Member
Posts: 4626
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: Bug with Multi-Rename Tool and Regex

Post by *white »

Balderstrom wrote: TC refuses more than 14 pairs of round brackets in a regex, no matter what their contents are. The Output/New name column turns into all <Error!> entries.
Confirmed.
Balderstrom wrote: An example of one particular "?" syntax that I find useful in other tools:
Given the following files: foobar.txt, barfoo.txt, twofoo.txt
Search: (?|foo(bar)|bar(foo)|two(foo))
Replace: $1
Can you point me to more information about this syntax?
Balderstrom
Power Member
Posts: 2148
Joined: 2005-10-11, 10:10 UTC

Post by *Balderstrom »

I believe it's a subset of Regex conditionals. I couldn't find the specific page (again) where it is detailed on regular-expressions.info.
The basic idea is "(?|" is a conditional with no 'test-item' so it will attempt all matches (until one is true) as a normal "else" regex statement. The advantage is each else statement doesn't increment the backreference count.

e.g.: (?|regex1(subreg)|regex2(subreg))
Whether regex1 or regex2 is matched, the backreference \1 will contain the "subreg".
I find this easier than trying to work around the regex greedy mode, which wants to swallow everything it can into ".*"; and when you turn off greedy mode, you lose the "(item1)?" optional syntax.

A simple AHK example, using the file names from above:

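; Whichever alternative of the "(?|" group matches, its capture is stored in rTmp1.
string := "foobar.txt"
RegExMatch(string, "(?|foo(bar)|bar(foo)|two(foo))", rTmp)
; First alternative matches: captures "bar"
MsgBox, %rTmp1%
string := "barfoo.txt"
RegExMatch(string, "(?|foo(bar)|bar(foo)|two(foo))", rTmp)
; Second alternative matches: captures "foo"
MsgBox, %rTmp1%
string := "twofoo.txt"
RegExMatch(string, "(?|foo(bar)|bar(foo)|two(foo))", rTmp)
; Third alternative matches: captures "foo"
MsgBox, %rTmp1%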
white
Power Member
Posts: 4626
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by *white »

Balderstrom wrote:The basic idea is "(?|" is a conditional with no 'test-item' so it will attempt all matches (until one is true) as a normal "else" regex statement. The advantage is each else statement doesn't increment the backreference count.
This does not seem to be the case. The normal "else" part of a regex conditional does seem to increment the backreference count. Moreover, if you want to use alternation in the else part, you have to group the else part together using parentheses (ref).

It seems to be a construct of its own. I found a reference here:

http://perldoc.perl.org/perlre.html#(%3f%7cpattern)
ghisler(Author)
Site Admin
Posts: 48096
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

This is a limitation in the code of the RegEx library used by TC:
NSUBEXP = 15; // max number of subexpression //###0.929

I don't know why there is such a limit, but it's quite probable that this value could be increased in code without problems. There is a comment in the code too:

// Cannot be more than NSUBEXPMAX
// Be carefull - don't use values which overflow CLOSE opcode
// (in this case you'll get compiler erorr).
// Big NSUBEXP will cause more slow work and more stack required
NSUBEXPMAX = 255; // Max possible value for NSUBEXP. //###0.945
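
As a rough illustration of what this limit means for a given pattern, here is a Python sketch of a pre-check. It rests on an assumption not stated in the library comment: that one of the NSUBEXP slots is reserved for the whole match, which would explain why 14 bracket pairs work and 15 fail with NSUBEXP = 15. The function name `fits_limit` is hypothetical, the group count comes from Python's `re` parser rather than from TC's library, and 32 is simply the larger value suggested later in this thread.

```python
import re

OLD_NSUBEXP = 15  # value quoted from the library source above
LARGER_NSUBEXP = 32  # larger value suggested later in this thread


def fits_limit(pattern: str, nsubexp: int) -> bool:
    """Check whether a pattern's groups would fit into nsubexp slots.

    Assumption: slot 0 holds the whole match, so only nsubexp - 1 slots
    remain for bracket pairs.
    """
    group_count = re.compile(pattern).groups
    return group_count + 1 <= nsubexp


if __name__ == "__main__":
    for pairs in (14, 15, 31, 32):
        pattern = "()" * pairs
        print(pairs, fits_limit(pattern, OLD_NSUBEXP), fits_limit(pattern, LARGER_NSUBEXP))
```

If the assumption about the reserved slot holds, raising the constant to 32 would allow up to 31 bracket pairs.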
Author of Total Commander
https://www.ghisler.com
isidro
Junior Member
Posts: 96
Joined: 2006-03-21, 04:39 UTC
Location: argentina

Post by *isidro »

ghisler(Author) wrote:This is a limitation in the code of the RegEx library used by TC:
NSUBEXP = 15; // max number of subexpression //###0.929

I don't know why there is such a limit, but it's quite probable that this value could be increased in code without problems.
It would be really useful if you could increase NSUBEXP to 64 or 80; it would save me a lot of time on about a third of my searches. Thanks!
ghisler(Author)
Site Admin
Posts: 48096
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

I'm a bit worried about this:
Big NSUBEXP will cause more slow work and more stack required
isidro
Junior Member
Posts: 96
Joined: 2006-03-21, 04:39 UTC
Location: argentina

Post by *isidro »

ghisler(Author) wrote:I'm a bit worried about this:
Big NSUBEXP will cause more slow work and more stack required
I don't think it will be noticeable (especially on today's hardware). Perhaps a conservative value like 32 won't need so many resources; most of my searches are within that range but outside the current value.
ghisler(Author)
Site Admin
Posts: 48096
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Sounds reasonable, I will increase it in the next version.
isidro
Junior Member
Posts: 96
Joined: 2006-03-21, 04:39 UTC
Location: argentina

Post by *isidro »

ghisler(Author) wrote:Sounds reasonable, I will increase it in the next version.
Thanks! :)
white
Power Member
Posts: 4626
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by *white »

HISTORY.TXT wrote:02.12.12 Added: Regular expressions: Increased number of sub-expressions (NSUBEXP) from 15 to 32 (32/64)
Tested OK using TC 8.50b2 32bit.