Search-function, glob vs. substring
I've stumbled upon a strange phenomenon concerning the FindFiles-function of TC which at first glance would pretty much look like a bug - but is it?
In a clients-database/directory there are dozens of subfolders all structured by names in the format "F(irst)_(Sur)Name", many of them being German so they might also contain Umlauts (ä, ö, ü), sometimes also written as (ae, oe, ue) in a mixed manner.
To make sure to find and select all "Müller"-s/"Mueller"-s I did a simple substring-search for "m*ller" (without the quotes) with the attribute "Directory" set in the Advanced-Tab.
To my great surprise FileFind came up with only one single hit "M_Mueller" but missed out on, e.g. "I_Mueller" and others which I knew for sure to be there.
When I followed that up with a comprehensive glob-search "*m*ller*.*" I got all 17 hits instead, including e.g. "V_Moeller" and "B_Miller", quite as initially expected.
So did I get the concept of a simple substring search somehow wrong or is there a bug at work? On the other hand, a bug as fundamental as that would have been detected during the beta-test, wouldn't it? So any clarification would be appreciated because my hitherto firm trust in FileFind has been somewhat undermined by that experience.
Re: Search-function, glob vs. substring
georgeb,
Help wrote: w*.*|*.bak *.old finds files, which start with w and do not end with .bak or .old.
So m*ller finds files / folders which start with m and end with ller.
HTH
Roman
Suppose you press Ctrl+F, select the FTP connection (with saved password), but instead of clicking Connect you drop dead.
Re: Search-function, glob vs. substring
When using a single term like "foobar", TC actually performs a search for "*foobar*". As soon as there's a wildcard in the search term, TC doesn't "interpret" the search term more freely anymore. In your case, "m*ller" searches for files starting with "m" and ending with "ller" with anything in between.
If you actually want to find the term anywhere in the names, you have to add more wildcards yourself ("*m*ller*") - just as you already figured out. This allows a more granular control over what to search for, e.g. you could use "m*ller*" to find names starting with "m", containing "ller" somewhere after that and ending with anything.
Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
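The difference between the three masks discussed above can be reproduced outside TC. The following is a minimal sketch using Python's `fnmatch` module, which, like the behavior described in this thread, matches a pattern against the whole name. The folder names are the ones mentioned in the thread; lowercasing both sides to get case-insensitive matching is an assumption about how TC compares names, and the helper `whole_name_match` is purely illustrative.

```python
from fnmatch import fnmatchcase

# Folder names taken from the thread.
names = ["M_Mueller", "I_Mueller", "V_Moeller", "B_Miller"]

def whole_name_match(name: str, pattern: str) -> bool:
    """Return True if the pattern matches the ENTIRE name (case-insensitive)."""
    return fnmatchcase(name.lower(), pattern.lower())

# "m*ller": name must start with "m" and end with "ller".
anchored = [n for n in names if whole_name_match(n, "m*ller")]

# "*m*ller*": "m" followed later by "ller", anywhere in the name.
unanchored = [n for n in names if whole_name_match(n, "*m*ller*")]

# "m*ller*": name must start with "m"; "ller" may be followed by anything.
prefix_only = [n for n in names if whole_name_match(n, "m*ller*")]

print(anchored)
print(unanchored)
print(prefix_only)
```

Only the mask with a leading `*` picks up names such as I_Mueller, which mirrors the result reported in the first post.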
Re: Search-function, glob vs. substring
2georgeb
I don't see a problem here. By the logic of the mask, I_Mueller should not be found by m*ller, because when wildcards are present the search in names is matched against the entire length of the name, not against any part of it. Do not confuse it with searching for text in files. A leading * is required here.
Overquoting is evil! 👎
Re: Search-function, glob vs. substring
Confusion about Total Commander's Search for field is understandable. It's quite complex and the help text is not very clear. Here is my understanding of the feature.
There are 2 search modes:
Condition for search mode | Description of search mode
The Search for field does NOT contain a dot, wildcard or backslash | Find filenames where part of the filename matches the entire Search for field.
The Search for field DOES contain a dot, wildcard or backslash | Find filenames where (a) the whole filename matches one of the expressions in the Search for field or (b) the whole filename matches the entire Search for field (no character is interpreted as separator character and the entire Search for field is seen as 1 expression).
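The two modes in the table can be modeled roughly as follows. This is only a sketch of the rule as described in this post, not TC's actual implementation: the function name `search_for_matches` is hypothetical, only a single expression is handled (no space-separated lists, no `|` exclusions), and Python's `fnmatch` does not reproduce TC's treatment of `*.*`, which in the first post also matched folder names without a dot.

```python
from fnmatch import fnmatchcase

def search_for_matches(name: str, search_for: str) -> bool:
    """Hypothetical model of the two search modes described in the table."""
    name_l, term_l = name.lower(), search_for.lower()
    # A dot, a wildcard ("*" or "?") or a backslash switches to whole-name matching.
    whole_name_mode = any(ch in search_for for ch in ".*?\\")
    if whole_name_mode:
        # Mode 2: the whole filename must match the expression.
        return fnmatchcase(name_l, term_l)
    # Mode 1: the term only has to occur somewhere in the filename.
    return term_l in name_l

print(search_for_matches("I_Mueller", "mueller"))  # plain term, substring mode
print(search_for_matches("I_Mueller", "m*ller"))   # wildcard, whole-name mode
```

Under this model a plain term behaves as if it were wrapped in `*...*`, while any wildcard typed by the user turns that implicit wrapping off.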
Re: Search-function, glob vs. substring
Hacker wrote: 2023-04-01, 13:24 UTC
Help wrote: w*.*|*.bak *.old finds files, which start with w and do not end with .bak or .old.
So m*ller finds files / folders which start with m and end with ller.
I'm afraid it's not that simple. And I don't see how the above example excluding .bak and .old files by means of the pipe-character would help to explain the phenomenon here.
In fact - and in all modesty - I consider myself to be quite a veteran when it comes to searching. So I'm quite familiar with the two major opposing search-concepts sometimes conflicting with each other when forced into the same entry-mask in a - from time to time - somewhat unfortunate manner.
First there is the (fully systematic) glob-search and secondly there is also the (a bit more "quick and dirty") substring search. In your help-example above a "dot" is contained within the search-mask-entry. And I'm quite certain of the presence of a "dot" being the primary character to force the whole thing into a glob-search.
Now with the full glob-syntax there are no problems, it returns all results as expected. It is the substring search that delivers questionable results. And the phenomenon boils down to 2 questions:
1. I've been under the - presumably wrong - impression that TC meanwhile would offer a feature commonly known as "multiple substring search" with the "*"-wildcard separating those substrings without making the whole thing a glob-search. So is there currently a "multiple substring search"-feature in TC or not?
2. And if there is no such feature and "*" - or any wildcard for that matter - forces this into a glob-search - then why on earth would "m*ller" return [M_Mueller] BUT NOT e.g. [I_Mueller]?
Re: Search-function, glob vs. substring
Dalai wrote: 2023-04-01, 13:31 UTC
As soon as there's a wildcard in the search term, TC doesn't "interpret" the search term more freely anymore. In your case, "m*ller" searches for files starting with m and ending with "ller" with anything in between.
But in the first case of only one single search-term (which results in a substring search) this substring is found anywhere within an expression/name, the name doesn't have to begin with 'foobar' in your example, abcfoobarxyz will be found as well.
Are you sure EACH wildcard (and not only a "dot") forces a glob-search?
And if so - why then is [M_Mueller] found at all - BUT NOT [I_Mueller], which has an "M" in front of "ller", too, as the "*"-wildcard AFAIK stands for 1 or more (n)-characters in between? And in the [I_Mueller]-case there is the "M" and then we have 2 characters (ue) before "ller"?
Re: Search-function, glob vs. substring
Fla$her wrote: 2023-04-01, 13:43 UTC
2georgeb
I don't see a problem here. In the logic of the mask, I_Mueller should not be found by m*ller, because the search in names in the presence of wildcards is conducted along the entire length of the name, and not in any part of it. Do not confuse it with searching for text in files. * in front is required here.
You are probably right and the interpretation of "wildcard" is the answer to my problem. I was under the false impression that only a "dot" would force a glob-search while "*" could be used to concatenate multiple substrings, still keeping the whole thing a substring search. And once forced by "*" into a glob-search, [M_Mueller] is found as it starts with an "M", while [I_Mueller] has an "M" before "ller" as well - but there both occur somewhere in the middle of the name.
Re: Search-function, glob vs. substring
2georgeb
Wildcard "*" stands for one or more characters, for example M_Mueller or M_Voeller
Andrzej P. Wozniak
Polish subforum moderator
Re: Search-function, glob vs. substring
white wrote: 2023-04-01, 20:10 UTC
Confusion about Total Commander's Search for field is understandable. It's quite complex and the help text is not very clear. Here is my understanding of the feature.
There are 2 search modes:
Thanks a lot for that clarification. While I was well aware of the 2 search-modes (glob vs. substring) I was under the (wrong) impression that only a "dot", backslash or pipe would break the substring search while wildcards could meanwhile be used to concatenate multiple substrings while keeping the whole operation still a substring-search.
This is exactly why I've (together with others) suggested long ago to DISENTANGLE this "Search for"-mask into one entry-line for (then also multiple) substring-searches with wildcards and one dedicated input line for systematic glob-searches.
This proposal should also eliminate the IMHO somewhat unfortunate status-quo that directory-exclusions (or specific target-directory-entries for that matter) have to be currently specified within the search-for-field of the search-mask - as opposed to the search-in-field where they would obviously belong from a logical point of view.
IMHO this whole File-Find input mask, as it currently stands, is the result of some kind of "organic growth" of TC-features and over time increased capabilities as a whole and could well use some "streamlining" and "disentanglement" to finally do justice to the mature-premium-utility beyond v.10 which TC without a doubt epitomizes.
Re: Search-function, glob vs. substring
Usher wrote: 2023-04-02, 01:09 UTC
Wildcard "*" stands for one or more characters, for example M_Mueller or M_Voeller
Yes, but why then not for I_Mueller, which also contains an "M" in front, then two arbitrary characters (in this case "ue") and then "ller"? That was my initial irritation. But the answer is that, with the wildcard ("*") forcing this into a glob-search altogether, in [I_Mueller] the "M" is no longer the first character of the whole name.
But while we're at it and to be utterly precise - does the wildcard "*" stand for one or more characters OR RATHER zero or more characters? In other words - should "m*ller" also find [Mller] (without any vowel in between, whatever that means) - or not?
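For comparison, here is how one common glob implementation answers that question. The sketch below uses Python's `fnmatch`, where `*` matches any run of characters including an empty one and `?` matches exactly one character. Whether TC follows the same convention is not settled in this thread, so this only shows the usual glob behavior, not a confirmed statement about TC.

```python
from fnmatch import fnmatchcase

# In Python's fnmatch, "*" matches any run of characters, including none.
matches_empty_gap = fnmatchcase("mller", "m*ller")     # "*" matches ""
matches_long_gap = fnmatchcase("m_mueller", "m*ller")  # "*" matches "_mue"

# "?" by contrast must consume exactly one character.
question_needs_one = fnmatchcase("mller", "m?ller")

print(matches_empty_gap, matches_long_gap, question_needs_one)
```

Under this convention "m*ller" would also find [Mller]; a pattern such as "m?*ller" would be needed to require at least one character in between.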
Re: Search-function, glob vs. substring
georgeb wrote: 2023-04-02, 00:49 UTC
And if so - why then [M_Mueller] is found at all - BUT NOT [I_Mueller] which has an "M" in front of "ller" [...]
Although the latter contains "m" followed by "ller" it doesn't start with m, as explained above. TC doesn't implicitly add wildcards by itself when the user already did.
Regards
Dalai
Re: Search-function, glob vs. substring
Dalai wrote: 2023-04-02, 02:25 UTC
Although the latter contains "m" followed by "ller" it doesn't start with m, as explained above. TC doesn't implicitly add wildcards by itself when the user already did.
Thanks, @Dalai for pointing this out again. This is now understood. With the help of all these qualified answers I've meanwhile been able to figure this out myself. So no "erratic behavior" perceived on my side any longer.
Yet still IMHO this incident once more emphasizes the value of the aforementioned proposal of earlier times by others and myself to DISENTANGLE the search-input-mask into 2 dedicated lines, one for exact glob-search and one for enhanced (multiple) substring-search, the latter allowing wildcards ("*","?") to concatenate multiple substrings without breaking the whole substring-search altogether and forcing it into a glob-search instead.
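To make the proposal concrete, here is one way such a "multiple substring search" could behave. This is purely a sketch of the semantics asked for above, not an existing TC feature: the function `multi_substring_match` is hypothetical, only `*` is treated as a separator (`?` is not handled), and the pieces are simply turned into an unanchored, case-insensitive regular expression.

```python
import re

def multi_substring_match(name: str, term: str) -> bool:
    """Hypothetical 'multiple substring' semantics: the pieces between '*'
    must occur in this order, anywhere in the name."""
    # Escape each piece so characters like "." or "(" are taken literally.
    pieces = [re.escape(p) for p in term.split("*") if p]
    pattern = ".*".join(pieces)
    # re.search is unanchored, so neither end of the name has to match.
    return re.search(pattern, name, flags=re.IGNORECASE) is not None

for folder in ["M_Mueller", "I_Mueller", "B_Miller", "I_Schmidt"]:
    print(folder, multi_substring_match(folder, "m*ller"))
```

With these semantics "m*ller" behaves like the glob "*m*ller*", which is what the first post expected from a substring search.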
Re: Search-function, glob vs. substring
georgeb wrote: 2023-04-02, 02:49 UTC
... proposal of earlier times by others and myself to DISENTANGLE the search-input-mask into 2 dedicated lines each, one for exact glob-search and one for enhanced (multiple) substring-search then allowing wildcards ("*","?") to concatenate multiple substrings without breaking the whole substring-search altogether and forcing it into a glob-search instead.
I also think it would be a good idea to re-work the interface of the FileFind procedure in order to clarify/streamline where exactly to input what to yield either a "glob"- or a substring-search and, while we're at it, thereby also enable a "multiple substring search" by use of the usual wildcards.
Re: Search-function, glob vs. substring
Let me also express my vehement support for uncoupling ("disentangle") the search-mask in the FileFind-tool, "Search for"-section, into two dedicated separate lines, one for substring-search and one for glob-search each, thereby finally enabling a genuine, simple "multiple substring search" without having to resort to a workaround via "Everything" or RegEx.
Should there be concerns about changing a longstanding (mis-?)-behavior in TC then splitting the search-mask into separate lines for substring and glob could be made available as an advanced option only via configuration or via a simple checkbox.
But for all the "power-searchers" out there the advantages of such a step would quickly become obvious. For instance wildcards or the entry of multiple search-terms would then become possible in a more intuitive (in particular for average users) substring-search as well without instantly forcing a more formalistic glob-search by the mere entry of any of those.