Quick search (ignoring accent in filename)

pulbitz
Junior Member
Posts: 52
Joined: 2009-06-05, 12:19 UTC

Post by *pulbitz »

When using quick search, Latin characters should be matched regardless of their accents, e.g. “foo” should match “foo” as well as “föö” or “fóo”.


Please add this feature.
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel

Post by *Samuel »

This can be added with a custom "tcmatch.dll" like the one I wrote: QuickSearch eXtended. (I don't think Christian will implement it in TC itself. For this and other cases he added QuickSearch plugin support.)

Perhaps I will add it. Does anyone have an idea how to determine which chars match each other and which do not? Is there a function to determine the base character of an accented character without creating huge tables?
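On the question of finding the base character without huge tables: Unicode canonical decomposition (NFD) splits most accented Latin letters into a base letter plus combining marks, so dropping the marks recovers the base. Below is a minimal Python sketch of that idea, not code from the plugin; the function names `strip_accents` and `accent_insensitive_match` are made up for illustration.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose to NFD and drop combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns a non-zero class for combining marks such as U+0308.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_match(query: str, filename: str) -> bool:
    """Substring match after stripping accents and case from both sides."""
    return strip_accents(query).casefold() in strip_accents(filename).casefold()


if __name__ == "__main__":
    for name in ("foo.txt", "föö.txt", "fóo.txt", "bar.txt"):
        print(name, accent_insensitive_match("foo", name))
```

Letters that have no canonical decomposition, such as ø or æ, pass through unchanged, so a small table would still be needed for those.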

Edit: Until I or someone else implements something, you may try the similarity search of my plugin. It allows searching for similar words using the Levenshtein algorithm.
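For readers unfamiliar with it, the Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. The following is a generic textbook sketch in Python, not the plugin's implementation; `levenshtein`, `is_similar` and the `max_distance` threshold are names assumed here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_similar(query: str, name: str, max_distance: int = 1) -> bool:
    """Hypothetical threshold check: accept names within max_distance edits."""
    return levenshtein(query, name) <= max_distance


if __name__ == "__main__":
    print(levenshtein("Koln", "Köln"))
    print(is_similar("Koln", "Köln"))
```

With this measure an accented letter counts as one substitution, so "Koln" and "Köln" are one edit apart, but so are "Koln" and "Kohn", which is why it is only a stopgap for accent-insensitive search.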
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

Samuel, let users create such tables :) You could add ini-file support where users list the chars they want to be treated as equal. E.g.,

Code: Select all

[Similar]
o=o
a=a
e=eё
So when a user needs a new character, he adds it to this table.
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel

Post by *Samuel »

I also came to this solution. But one line will be simpler:

Code: Select all

Similar=äáàâa|öóòôo|üúùûu
Edit: I want to provide a good preset for this string. Could we gather all accented chars in this thread?
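To make the one-line format concrete, here is a minimal Python sketch of how such a string could be turned into a lookup table. It assumes, based only on the example above, that groups are separated by `|` and that the last character of each group is the base letter; the helper names `build_similar_map` and `fold` are hypothetical.

```python
from typing import Dict


def build_similar_map(spec: str) -> Dict[str, str]:
    """Map every char of each '|'-separated group to the group's last char."""
    mapping: Dict[str, str] = {}
    for group in spec.split("|"):
        if not group:
            continue  # tolerate empty groups such as a trailing '|'
        base = group[-1]  # assumption: the base letter comes last
        for ch in group:
            mapping[ch] = base
    return mapping


def fold(text: str, mapping: Dict[str, str]) -> str:
    """Replace each char by its base letter; unknown chars are kept."""
    return "".join(mapping.get(ch, ch) for ch in text)


if __name__ == "__main__":
    similar = build_similar_map("äáàâa|öóòôo|üúùûu")
    print(fold("föö", similar), fold("fóo", similar))
```

Folding both the typed text and the filename with the same map makes the comparison symmetric.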
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

I think it will be easier to open CharMap and copy all Unicode characters from it... But there are too many characters with diacritics.
ado
Senior Member
Posts: 445
Joined: 2003-02-18, 13:22 UTC
Location: Slovakia, Pezinok

Post by *ado »

Actually it is not. Do not forget that the majority of Unicode consists of characters that are not Latin-based. There are just 26+26 ASCII letters, and even if I overshoot, I do not expect more than 10 chars with diacritics per basic letter. Also, you do not want this functionality for Japanese or Chinese char sets. So the whole table would be, let's say, up to 500 chars big.

ado
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

I was crazy enough to create this:

Code: Select all

MIME-Version: 1.0
Content-Type: application/octet-stream; name="Diacritics.7z"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="Diacritics.7z"

N3q8ryccAAPrdG27QAcAAAAAAABeAAAAAAAAAAa7qqUAf7+D3wBFr6RkLbWQ0ur1hvXp7cn8st3x
noojWOHU8KLNovRFKgWXyUl9aP23n3OCjqKoBxQvUdgffSq0AOHO23ZfgbcGzqsHaKPOAlYc8IMG
D/9aqJfiAlYwQn6aeQcjIyAAWyBeHI54esPKJ8/ZaCOVjPXOYzGH2yQDBJeDiX9OhL9hODDWs1z5
48jVTFvnctD5ShDH3AfmJocscI2BM3HnhkqQ5jFilCHoEeEXUDcSv8uoNoq4iL+wBphpAYh4BOIa
Z8m7HTtGIdexOWHRJcXwOzCWYe42zU/tebcZi2Oe3CKNqXixUmXLMjbz0jVaCicbLkyo0bcGCLuU
UKZ+j7lTDiFWRA7V0svy+sQAUt9TECxVOHKMGKG+Ty08HbIWxQEis/TdP3qEIm1P1XO0WqebM6Es
hQ1tOZLe3tuRSCfP2WjPeGCOAGOZvvGnO/DdmZLhPVRGVzd8rU4BgUL0KHiTOl9nLqxOJW2cko2D
08FJ9dynukopRA0tJnpBEx8XX62k13tMjVtAyEZx80Wt3/3eqHexxcyvmSzVkSzxSoJPUkYyiDHi
wC32z8ufvAOGe/vbrmYy9Il98U3YRYiG6Goddhx0gU5luIVw3LSuo9+/JzT3TxFRb8qCc42t2D1m
9vIK8MoAlUIFhxLTgDiZeON5iXpR05CGaWe4IR44wlCS+KnlCJwE3xf0auHOCf9lCUBGEI7I+j85
TRgIUDt6anrJimtZKYRez2xpc0ztDzznpEt7ZDufylOWLwRouHJ4q8iru39aS1Aow1oxuEnUdrVD
Uu84vAEI2/MX50X8yaotZf2siDMTyJghiuDA/StUJNVCblry861sq1fFErDtl16U9BH1y8GouDrP
W+QOpILNeFEUiZhLcvDoRQpcI+4lC1uuZGk6x1mTNfXUAcIuNrrs72JxwVmAD5hHtQcv5AesBO80
OY6gq+xuQpWmZWTwgUErC6xcLxHOTuKQ9W891TvuFcR3p+zoWq0i3ziOUzKwQ8ebrXc+3a14OSEq
4oa3mjB/rLnJOt44cqEUFVgUOg9evO8KhcxqIhOfzuT7aGu7hRfpcq1KbndrKkpXMO0fOXHEqGrj
ApxWBxmCZbkQru2Sww9iu52YbXBwGU9aSsLV4KPH5XOahBAtJWYWT10WZmQTHuwiZWWLjpklbi7E
DgCruL0ibkJKnrTSPGD43+IqQ80wAQp91Rxw8ZSLCIg+qIIpztWqwkiMgTheLC0ig+dkM9nkBVMk
jot4BxuZvnLMpgIb5KMLXk9C5dgkcC5bttvwV7Mi+RzAXwzdnTwESUc4GxOJ7fytb0s5AfCcyYP7
Q3Ft45ubWQUflq+hJm5SeSNAUq1SSmoeIRKeqGsKD5sBTDNmV/KDFQO/EoxU3UAcbr59xg0W9pka
ziDgaJuE1p6iv444XquDXuKh5OsZfPYVYQhPdKgSNe+9LHxyDT45J+SJJ2Rf4PK04gl7aUPWAAb6
aETDr9WLv8Ydi+ERezeMm95UKP8nGlu3M7VQqnqwbrqIhm1MVJfuWYyO5SAzmTCgu0PU7OSXAQrD
k2/8TwkexqKziNH2896l4JkfD0eFIPKABd3SB+3d+hyS/WcqOxQfmzcWaJto0gz/TOxdcdURkRri
Mpai9xif2S3V/1AOvFWuIlP9c3a3E8yTirU79ey6ICZbz+PkWJZuSADstrqhZ+iRB+XFey9zAw3i
Z6wCDRYFnghh3DHkYKd6KRkcUrJMqIrbOJGLM9IHY07Pd7mQeNTsy+RWEq6bBXgKPVA2KSWPkjAW
CsXwQedCDl1WEz757vGNms+R0DdnhKcXcNmYrk0J3cgr4K81wudsDy+yGaK/DwXuVrV8WXJHZvwK
rkqVmM1Mvby7kB9dYxqBCHGgPqs0cQceGneaFOaKRjlAeARVDSuQ7aRsE1sO1AmK8RrVDL6dtlDa
5hEP/N7FMNTRu1TqM+Ew4ce24w0iHq53chP4S1Aq/Z2jxm1WxsGLGB4YizYcLdq/UFidw+11Kk0X
zRTxgv1VQrnhVEj6o6+LLLWSrgeAIGohWc6NZrjk6Gw68QW1w6am7CvoBgLbSDIHZNPEmKIyECFr
1OvXNY4sdqIJ4D30dkCpTHRrChWRJ4aG3JIv5f0dE5HQn+ln1QV0PseeOSmIsVr4+ZnLuLJYvHIV
1N2d8eLXPkIOhwzf7368pGgxMEE559OEoEDtqd1o4IT0Vg0kotT8nKqFBctUD6kq109vO5tWVtfs
+LvkrvMoGHmbsXE6hD89Ee41tC6GhsUfEZ+IPtU4Jg1F9mOvyoXvwN7Tqp/jCAvfSmi8q18/OFgg
Pz08V3Nmvvu1V0jVEKkQ74OZkUqXId5VvES0GV5uJNMzQ9xDww3KsNRiDKQmIqVDTO969GM5EJo7
5TS6aQjODpmMbTAWLjUxRIHIOqkgQf9uzNmmYa37p/JteJNCru1Qj7QChQqwaQaaH1ItNy/3ja6a
HsrzPkuGAAEEBgABCYdAAAcLAQABIwMBAQVdAAABAAyMfAAICgEvlB1HAAAFAREfAEQAaQBhAGMA
cgBpAHQAaQBjAHMALgB0AHgAdAAAABQKAQBxLIvxZknKARUGAQAgKAAAAAA=

He-he... :shock:
Maybe I made a mistake somewhere, but I think this will be a great starting point.
Maybe there are a lot of extra characters here, but if a char name contains a Latin letter with some addition (e.g. Latin Capital Letter E With Circumflex And Hook Above :roll:), I added it.
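A list built on this naming rule can also be generated rather than copied by hand. The Python sketch below is not how the attached file was produced; it only illustrates the rule described above, assuming the pattern "LATIN SMALL/CAPITAL LETTER X WITH ..." in the official Unicode character names. The function name `build_diacritics_table` is made up.

```python
import re
import sys
import unicodedata
from collections import defaultdict
from typing import Dict, List

# Assumed naming pattern: a single base letter followed by "WITH <addition>".
_NAME = re.compile(r"^LATIN (SMALL|CAPITAL) LETTER ([A-Z]) WITH ")


def build_diacritics_table() -> Dict[str, List[str]]:
    """Group non-ASCII chars by the base letter named in their Unicode name."""
    table: Dict[str, List[str]] = defaultdict(list)
    for code in range(0x80, sys.maxunicode + 1):
        name = unicodedata.name(chr(code), "")  # "" for unnamed code points
        match = _NAME.match(name)
        if match:
            letter = match.group(2)
            base = letter if match.group(1) == "CAPITAL" else letter.lower()
            table[base].append(chr(code))
    return dict(table)


if __name__ == "__main__":
    table = build_diacritics_table()
    for base in ("E", "e", "o"):
        print(base, "".join(table.get(base, [])))
```

The exact result depends on the Unicode version bundled with the Python interpreter. Letters whose names do not follow the pattern, such as æ (LATIN SMALL LETTER AE), are not picked up by this rule, and whether an entry should count as a variant of its base letter in a given language is a separate question.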
petermad
Power Member
Posts: 16034
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Post by *petermad »

2MVV
I think you should remove the letters Æ, æ, Ø, ø, Å and å from your lists.

When used in Danish/Norwegian and partly Swedish, they are not diacritics or ligatures. They are all distinct letters (vowels).

In case of not having access to any extended character set they can be written as:
Æ = Ae
Ø = Oe
Å = Aa
æ = ae
ø = oe
å = aa

Under Danish/Norwegian locale they are sorted as the last 3 characters in the alphabet - not as a and o.

In Danish/Norwegian Aa and aa are considered to be one letter (may depend on the context) and are sorted as Å and å - as the last character.

In Swedish Å and å are sorted as the third last character. In Swedish Ä, ä, Ö and ö are sorted as the last 2 characters - not as a and o.

http://en.wikipedia.org/wiki/Danish_and_Norwegian_alphabet
http://en.wikipedia.org/wiki/%C3%86
http://en.wikipedia.org/wiki/%C3%98
http://en.wikipedia.org/wiki/%C3%85

http://en.wikipedia.org/wiki/Typographic_ligature#Digraphs
http://en.wikipedia.org/wiki/Diacritic
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

I added such letters so that the user can enter just a or e to find, e.g., the ae letter :)
The sorting trouble is not our problem. We need to find non-standard characters when the user types a standard letter into the search field, so some extra characters won't hurt. This is the reason why I added such combinations (I also added "Small Capital" letters to both the small and the capital lists).
Anyway, my list has no license, so you may do anything you want with it. :wink:
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel

Post by *Samuel »

Thanks for your work, MVV, and for your information, petermad. I decided not to add a diacritics table. It seems that there can be no standard for it.

The new version of QuickSearch eXtended now supports replacing chars with strings. So it's possible to replace diacritics like this:

Code: Select all

[replace]
chars1=ÀÁÂÃÄÅĀĂĄǍǞǠǺȀȂȦȺӐӒᴀᴬḀẠẢẤẦẨẪẬẮẰẲẴẶ|A
chars2=àáâãäåāăąǎǟǡǻȁȃȧɐӑӓᵃᵄᶏḁẚạảấầẩẫậắằẳẵặₐⱥ|a
chars3=ƁƂɃʙᴃᴮᴯḂḄḆ|B
chars4=ƀƃɓᵇᵬᶀḃḅḇ|b
chars5=æǣǽӕᴂᵆ|ae
Hope it suits your needs pulbitz.
pulbitz
Junior Member
Posts: 52
Joined: 2009-06-05, 12:19 UTC

Post by *pulbitz »

Perfect.
Thank you very much, I really appreciate it.
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel

Post by *Samuel »

You're welcome.
leopoldus
Senior Member
Posts: 221
Joined: 2004-11-21, 09:47 UTC

Post by *leopoldus »

Samuel
I use your plugin to search filenames this way, ignoring accents and other diacritics (that is, I enter "Koln" in order to find both "Koln" and "Köln").

But is there any option to use the inverse substitution, that is, when I enter the search request "Köln" (with umlaut), to get both versions "Köln" and "Koln" in response?

Thanks in advance!
Last edited by leopoldus on 2012-08-17, 05:37 UTC, edited 1 time in total.
Samuel
Power Member
Posts: 1930
Joined: 2003-08-29, 15:44 UTC
Location: Germany, Brandenburg an der Havel

Post by *Samuel »

leopoldus wrote:Samuel
But is there any option to use the inverse substitution, that is when I enter the search request "Köln" (with umlaut) I'll get in response the both versions "Köln" and "Koln"?
It's already possible. I replace the chars in both the filename and the search string.

Code: Select all

[replace]
chars1=ö|o
chars2=ä|a
chars3=ü|u
chars4=ß|ss
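Why replacing on both sides makes the search symmetric can be seen in a small Python sketch. This is not the plugin's code; it only mimics the behaviour described above, assuming each `charsN` value has the form `<chars>|<replacement>`. The names `load_rules`, `normalize` and `matches` are hypothetical.

```python
import configparser
from typing import Dict

INI_TEXT = """
[replace]
chars1=ö|o
chars2=ä|a
chars3=ü|u
chars4=ß|ss
"""


def load_rules(ini_text: str) -> Dict[str, str]:
    """Read the [replace] section into a char -> replacement string map."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    rules: Dict[str, str] = {}
    for _key, value in parser.items("replace"):
        # Assumption: everything before the last '|' is the char list.
        chars, _sep, replacement = value.rpartition("|")
        for ch in chars:
            rules[ch] = replacement
    return rules


def normalize(text: str, rules: Dict[str, str]) -> str:
    """Apply the replacements char by char; other chars are kept."""
    return "".join(rules.get(ch, ch) for ch in text)


def matches(query: str, filename: str, rules: Dict[str, str]) -> bool:
    """Normalize BOTH sides, then do a plain substring test."""
    return normalize(query, rules) in normalize(filename, rules)


if __name__ == "__main__":
    rules = load_rules(INI_TEXT)
    print(matches("Köln", "Koln.txt", rules), matches("Koln", "Köln.txt", rules))
```

Because the query "Köln" and the filename "Koln.txt" are both reduced to the same unaccented form before comparison, typing either spelling finds both files.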
leopoldus
Senior Member
Posts: 221
Joined: 2004-11-21, 09:47 UTC

Post by *leopoldus »

Samuel wrote:
But is there any option to use the inverse substitution, that is when I enter the search request "Köln" (with umlaut) I'll get in response the both versions "Köln" and "Koln"?
It's already possible. I replace the chars in the filename and in the search string.
Sorry, then I missed something. However, no substitution works for me by default. I suppose I have to install some char substitution tables, enable this feature in the plugin settings, or something else?