Search-function, glob vs. substring

English support forum

georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Search-function, glob vs. substring

Post by *georgeb »

I've stumbled upon a strange phenomenon concerning the FindFiles-function of TC which at first glance would pretty much look like a bug - but is it?

In a clients-database/directory there are dozens of subfolders all structured by names in the format "F(irst)_(Sur)Name", many of them being German so they might also contain Umlauts (ä, ö, ü), sometimes also written as (ae, oe, ue) in a mixed manner.

To make sure to find and select all "Müller"-s/"Mueller"-s I did a simple substring-search for "m*ller" (without the quotes) with the attribute "Directory" set in the Advanced-Tab.

To my great surprise FileFind came up with only one single hit "M_Mueller" but missed out on, e.g. "I_Mueller" and others which I knew for sure to be there.

When I followed that up with a comprehensive glob-search "*m*ller*.*" I got all 17 hits instead, including e.g. "V_Moeller" and "B_Miller", quite as initially expected.

So did I somehow get the concept of a simple substring search wrong, or is there a bug at work? On the other hand, a bug as fundamental as that would have been detected during the beta test, wouldn't it? Any clarification would be appreciated, because my hitherto firm trust in FileFind has been somewhat undermined by that experience.
Hacker
Moderator
Posts: 13052
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Search-function, glob vs. substring

Post by *Hacker »

georgeb,
Help wrote: w*.*|*.bak *.old finds files which start with w and do not end with .bak or .old.
So m*ller finds files / folders which start with m and end with ller.

HTH
Roman
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect, you drop dead.
Dalai
Power Member
Posts: 9364
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: Search-function, glob vs. substring

Post by *Dalai »

When using a single term like "foobar", TC actually performs a search for

    *foobar*

As soon as there's a wildcard in the search term, TC doesn't "interpret" the search term more freely anymore. In your case,

    m*ller

searches for files starting with "m" and ending with "ller", with anything in between.

If you actually want to find the term anywhere in the names, you have to add more wildcards yourself:

    *m*ller*

just as you already figured out. This allows more granular control over what to search for, e.g. you could use

    m*ller*

to find names starting with "m", containing "ller" somewhere after that and ending with anything.
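
The rule described above can be tried out with a minimal Python sketch. It is only an approximation under stated assumptions: a term without wildcards is wrapped as `*term*`, matching is case-insensitive, and Python's `fnmatch` stands in for TC's own matcher (which it is not). The helper name `tc_like_match` is hypothetical.

```python
from fnmatch import fnmatchcase


def tc_like_match(name: str, term: str) -> bool:
    """Approximate the rule described above (an assumption, not TC's real code)."""
    # A term without any wildcard is treated as *term* (substring search).
    if "*" not in term and "?" not in term:
        term = f"*{term}*"
    # Lowercase both sides so the comparison is case-insensitive.
    return fnmatchcase(name.lower(), term.lower())


names = ["M_Mueller", "I_Mueller", "V_Moeller", "B_Miller"]
for pattern in ("m*ller", "*m*ller*", "m*ller*", "mueller"):
    hits = [n for n in names if tc_like_match(n, pattern)]
    print(pattern, "->", hits)
```

With these four sample names, `m*ller` keeps only the name that both starts with "m" and ends with "ller", while `*m*ller*` keeps all of them.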

Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
Fla$her
Power Member
Posts: 2244
Joined: 2020-01-18, 04:03 UTC

Re: Search-function, glob vs. substring

Post by *Fla$her »

2georgeb
I don't see a problem here. In the logic of the mask, I_Mueller should not be found by m*ller, because when wildcards are present the name search is matched against the entire length of the name, not against any part of it. Do not confuse it with searching for text in files. A leading * is required here.
Overquoting is evil! 👎
white
Power Member
Posts: 4594
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: Search-function, glob vs. substring

Post by *white »

Confusion about Total Commander's Search for field is understandable. It's quite complex and the help text is not very clear. Here is my understanding of the feature.

There are 2 search modes:

Search mode 1
Condition: The Search for field does NOT contain a dot, wildcard or backslash.
Description: Find filenames where part of the filename matches the entire Search for field.

Notes:
  • Spaces and the pipe symbol have no special meaning and are interpreted literally.
    So be careful with the number of spaces and leading and trailing spaces.
    Searching for "test|crap" will not find a file named "test".
  • A double quote character at the beginning and an optional matching quote character are ignored.
    So the following search terms have the same results:
    Total Commander
    "Total Commander"
    "Total Com"mander

Search mode 2
Condition: The Search for field DOES contain a dot, wildcard or backslash.
Description: Find filenames where
  (a) the whole filename matches one of the expressions in the Search for field
or
  (b) the whole filename matches the entire Search for field (no character is interpreted as separator character and the entire Search for field is seen as 1 expression).

Notes:
  • The space character is a separator character except when inside a quoted text.
  • The backslash character and the pipe character have special meaning as explained in Help.
  • When the Search for field contains a backslash or the pipe character, (b) is not done or gives no results.
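
The two modes listed above can be sketched in Python as follows. This is a rough model of the description in this post, not TC's actual implementation: case-insensitive matching is an assumption, `fnmatch` is a stand-in matcher, the backslash and pipe rules are deliberately left out, and the names `uses_mask_mode` and `matches` are hypothetical.

```python
import re
from fnmatch import fnmatchcase


def uses_mask_mode(field: str) -> bool:
    """Mode 2 applies if the field contains a dot, wildcard or backslash."""
    return any(ch in field for ch in ".*?\\")


def matches(name: str, field: str) -> bool:
    name_l, field_l = name.lower(), field.lower()
    if not uses_mask_mode(field_l):
        # Mode 1: literal substring; spaces and '|' are taken literally.
        # A leading quote and its optional matching quote are ignored.
        if field_l.startswith('"'):
            field_l = field_l[1:].replace('"', "", 1)
        return field_l in name_l
    if "\\" in field_l or "|" in field_l:
        raise NotImplementedError("backslash and pipe rules are not modelled here")
    # Mode 2 (a): space-separated expressions; quoted text keeps its spaces.
    tokens = re.findall(r'"[^"]*"|\S+', field_l)
    expressions = [t.strip('"') for t in tokens]
    # Mode 2 (b): the entire field treated as one expression.
    expressions.append(field_l)
    return any(fnmatchcase(name_l, e) for e in expressions)


print(matches("I_Mueller", "mueller"))   # mode 1, substring
print(matches("I_Mueller", "m*ller"))    # mode 2, whole-name mask
print(matches("test", "test|crap"))      # mode 1, pipe is literal
```

The sketch keeps (a) and (b) as one list of candidate expressions, which is enough to show why a field such as `my file.*` can still match a name containing a space.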

georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Search-function, glob vs. substring

Post by *georgeb »

Hacker wrote: 2023-04-01, 13:24 UTC
Help wrote: w*.*|*.bak *.old finds files which start with w and do not end with .bak or .old.
So m*ller finds files / folders which start with m and end with ller.
I'm afraid it's not that simple. And I don't see how the above example excluding .bak and .old files by means of the pipe-Character would help to explain the phenomenon here.

In fact - and in all modesty - I consider myself to be quite a veteran when it comes to searching. So I'm quite familiar with the two major opposing search-concepts sometimes conflicting with each other when forced into the same entry-mask in a - from time to time - somewhat unfortunate manner.

First there is the (fully systematic) glob-search and secondly there is also the (a bit more "quick and dirty") substring search. In your help-example above a "dot" is contained within the search-mask-entry. And I'm quite certain of the presence of a "dot" being the primary character to force the whole thing into a glob-search.

Now with the full glob-syntax there are no problems; it returns all results as expected. It is the substring search that delivers questionable results. And the phenomenon boils down to 2 questions:

1. I've been under the - presumably wrong - impression that TC meanwhile would offer a feature commonly known as "multiple substring search" with the "*"-wildcard separating those substrings without making the whole thing a glob-search. So is there currently a "multiple substring search"-feature in TC or not?

2. And if there is no such feature and "*" - or any wildcard for that matter - forces this into a glob-search - then why on earth "m*ller" would return [M_Mueller] BUT NOT e.g. [I_Mueller]?
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Search-function, glob vs. substring

Post by *georgeb »

Dalai wrote: 2023-04-01, 13:31 UTC As soon as there's a wildcard in the search term, TC doesn't "interpret" the search term more freely anymore. In your case,

    m*ller

searches for files starting with "m" and ending with "ller", with anything in between.
But in the first case of only one single search-term (which results in a substring search) this substring is found anywhere within an expression/name, the name doesn't have to begin with 'foobar' in your example, abcfoobarxyz will be found as well.

Are you sure EACH wildcard (and not only a "dot") forces a glob-search?

And if so - why then [M_Mueller] is found at all - BUT NOT [I_Mueller] which has an "M" in front of "ller", too, as the "*"-wildcard AFAIK stands for 1 or more (n)-characters in between? And in the [I_Mueller]-case there is the "M" and then we have 2 characters (ue) before "ller"?
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Search-function, glob vs. substring

Post by *georgeb »

Fla$her wrote: 2023-04-01, 13:43 UTC 2georgeb
I don't see a problem here. In the logic of the mask, I_Mueller should not be found by m*ller, because when wildcards are present the name search is matched against the entire length of the name, not against any part of it. Do not confuse it with searching for text in files. A leading * is required here.
You are probably right and the interpretation of "wildcard" is the answer to my problem. I was under the false impression that only a "dot" would force a glob-search, while "*" could be used to concatenate multiple substrings and still keep the whole thing a substring search. Once forced into a glob-search by "*", [M_Mueller] is found because the name starts with an "M", while [I_Mueller] is not: it also has an "M" before "ller", but that "M" occurs somewhere in the middle of the name.
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: Search-function, glob vs. substring

Post by *Usher »

2georgeb
Wildcard "*" stands for one or more characters, for example M_Mueller or M_Voeller
Andrzej P. Wozniak
Polish subforum moderator
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Search-function, glob vs. substring

Post by *georgeb »

white wrote: 2023-04-01, 20:10 UTC Confusion about Total Commander's Search for field is understandable. It's quite complex and the help text is not very clear. Here is my understanding of the feature.

There are 2 search modes:
Thanks a lot for that clarification. While I was well aware of the 2 search-modes (glob vs. substring) I was under the (wrong) impression that only a "dot", backslash or pipe would break the substring search while wildcards could meanwhile be used to concatenate multiple substrings while keeping the whole operation still a substring-search.

This is exactly why I've (together with others) suggested long ago to DISENTANGLE this "Search for"-mask into one entry-line for (then also multiple) substring-searches with wildcards and one dedicated input line for systematic glob-searches.

This proposal should also eliminate the IMHO somewhat unfortunate status-quo that directory-exclusions (or specific target-directory-entries for that matter) have to be currently specified within the search-for-field of the search-mask - as opposed to the search-in-field where they would obviously belong from a logical point of view.

IMHO this whole File-Find input mask, as it currently stands, is the result of some kind of "organic growth" of TC-features and over time increased capabilities as a whole and could well use some "streamlining" and "disentanglement" to finally do justice to the mature-premium-utility beyond v.10 which TC without a doubt epitomizes.
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Search-function, glob vs. substring

Post by *georgeb »

Usher wrote: 2023-04-02, 01:09 UTC Wildcard "*" stands for one or more characters, for example M_Mueller or M_Voeller
Yes, but why then not for I_Mueller, which also contains an "M", then two arbitrary characters (in this case "ue") and then "ller"? That was my initial irritation. The answer is that using a wildcard ("*") forces the whole thing into a glob-search, and in [I_Mueller] the "M" is not the first character of the name.

But while we're at it and to be utterly precise - does the wildcard "*" stand for one or more characters OR RATHER zero or more characters? In other words - should "m*ller" also find [Mller] (without any vowel in between, whatever that means) - or not?
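
For comparison only: in conventional glob implementations, such as Python's `fnmatch` used below, `*` matches zero or more characters and `?` matches exactly one. Whether TC's matcher behaves identically is not confirmed anywhere in this thread, so this small check shows the usual convention rather than TC's documented behavior.

```python
from fnmatch import fnmatchcase

# In fnmatch, "*" matches any run of characters, including an empty one.
for name in ("mller", "miller", "mueller"):
    print(name, fnmatchcase(name, "m*ller"))

# "?" by contrast requires exactly one character in that position.
print("mller", fnmatchcase("mller", "m?ller"))
print("miller", fnmatchcase("miller", "m?ller"))
```

Under that convention a name like "mller" would be matched by `m*ller` but not by `m?ller`.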
Dalai
Power Member
Posts: 9364
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: Search-function, glob vs. substring

Post by *Dalai »

georgeb wrote: 2023-04-02, 00:49 UTC And if so - why then [M_Mueller] is found at all - BUT NOT [I_Mueller] which has an "M" in front of "ller" [...]
Although the latter contains "m" followed by "ller", it doesn't start with "m", as explained above. TC doesn't implicitly add wildcards by itself when the user already did.

Regards
Dalai
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Search-function, glob vs. substring

Post by *georgeb »

Dalai wrote: 2023-04-02, 02:25 UTC Although the latter contains "m" followed by "ller", it doesn't start with "m", as explained above. TC doesn't implicitly add wildcards by itself when the user already did.
Thanks, @Dalai for pointing this out again. This is now understood. With the help of all these qualified answers I've meanwhile been able to figure this out myself. So no "erratic behavior" perceived on my side any longer.

Still, IMHO this incident once more emphasizes the value of the earlier proposal by others and myself to DISENTANGLE the search-input-mask into 2 dedicated lines: one for exact glob-search and one for enhanced (multiple) substring-search, which would allow wildcards ("*", "?") to concatenate multiple substrings without breaking the substring-search altogether and forcing it into a glob-search instead.
HalbschuhTouri
Junior Member
Posts: 61
Joined: 2023-01-20, 09:33 UTC

Re: Search-function, glob vs. substring

Post by *HalbschuhTouri »

georgeb wrote: 2023-04-02, 02:49 UTC ... proposal of earlier times by others and myself to DISENTANGLE the search-input-mask into 2 dedicated lines each, one for exact glob-search and one for enhanced (multiple) substring-search then allowing wildcards ("*","?") to concatenate multiple substrings without breaking the whole substring-search altogether and forcing it into a glob-search instead.
I also think it would be a good idea to rework the interface of the FileFind procedure to clarify and streamline exactly what to enter where to get either a "glob" or a substring search, and, while we're at it, to enable a "multiple substring search" using the usual wildcards.
algol
Senior Member
Posts: 448
Joined: 2007-07-31, 14:45 UTC

Re: Search-function, glob vs. substring

Post by *algol »

Let me also express my vehement support for uncoupling ("disentangle") the search-mask in the FileFind-tool, "Search for"-section, into two dedicated separate lines, one for substring-search and one for glob-search each, thereby finally enabling a genuine, simple "multiple substring search" without having to resort to a workaround via "Everything" or RegEx.
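
To make the requested behavior concrete, here is a minimal Python sketch of what a "multiple substring search" could mean: the parts separated by "*" must all occur in the name, in order, but anywhere in it. This illustrates the proposal only and is not an existing TC feature; the function name `multi_substring_match` and the case-insensitive comparison are assumptions.

```python
def multi_substring_match(name: str, term: str) -> bool:
    """True if all '*'-separated parts of term occur in name, in order."""
    haystack = name.lower()
    pos = 0
    for part in term.lower().split("*"):
        if not part:
            continue  # skip empty parts from leading, trailing or doubled '*'
        found = haystack.find(part, pos)
        if found == -1:
            return False
        pos = found + len(part)  # the next part must start after this one
    return True


for name in ("M_Mueller", "I_Mueller", "B_Miller", "I_Meier"):
    print(name, multi_substring_match(name, "m*ller"))
```

For a single "*" this gives the same hits as the explicit mask `*m*ller*`, which is why the current workaround is to add the outer wildcards by hand.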

Should there be concerns about changing a longstanding (mis-?)-behavior in TC then splitting the search-mask into separate lines for substring and glob could be made available as an advanced option only via configuration or via a simple checkbox.

But for all the "power-searchers" out there the advantages of such a step would quickly become obvious. For instance wildcards or the entry of multiple search-terms would then become possible in a more intuitive (in particular for average users) substring-search as well without instantly forcing a more formalistic glob-search by the mere entry of any of those.