QuickSearch eXtended

Samuel · Post by *Samuel » 2011-10-09, 06:36 UTC

lonki wrote:The simple search function can work in 8.0b4 x64 with QuickSearch eXtended 2.1.1 except for Chinese pinyin, even if I set:
use_pinyin=1

Hm, confirmed. I will try to find the reason.

Post by *ghisler(Author) » 2011-10-09, 13:47 UTC

2Samuel
Is this a problem with TC or with your dll?

Samuel · Post by *Samuel » 2011-10-09, 17:05 UTC

I couldn't figure that out yet.

As I use your code for PinYin, your own PinYin plugin could be affected too.

Samuel · Post by *Samuel » 2011-10-10, 09:37 UTC

Updated first post: Version 2.1.2 is available.

Version 2.1.2
 - Fixed: Pinyin search was broken in x64 version.

The reason was that tcmatch looked for "tcmatch64.tbl" instead of "tcmatch.tbl".
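One plausible way such a bug can arise is deriving the table name from the DLL's own file name. This is an assumption; the thread does not show the actual code. A minimal Python sketch, where `table_path_buggy` and `table_path_fixed` are hypothetical helpers and the install path is made up:

```python
from pathlib import PureWindowsPath

def table_path_buggy(dll_path: str) -> str:
    # Swaps only the extension, so "tcmatch64.dll" yields "tcmatch64.tbl".
    return str(PureWindowsPath(dll_path).with_suffix(".tbl"))

def table_path_fixed(dll_path: str) -> str:
    # Always uses the shared table name located next to the DLL.
    return str(PureWindowsPath(dll_path).with_name("tcmatch.tbl"))

dll = r"C:\totalcmd\tcmatch64.dll"  # hypothetical install location
print(table_path_buggy(dll))
print(table_path_fixed(dll))
```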

ghisler(Author) wrote:2Samuel
Is this a problem with TC or with your dll?

My dll was the reason.

lonki wrote:If I remove tcmatch.exe and tcmatch.dll and keep tcmatch64.dll and tcmatch64.exe, the icons in the bottom quick search bar are not displayed properly.

I saw that too late; perhaps I will make this possible in a future update. Currently tcmatch.dll is also needed in the x64 version.

lonki · Post by *lonki » 2011-10-10, 13:47 UTC

thanks for the quick update, it's perfect~

Samuel · Post by *Samuel » 2011-10-15, 21:43 UTC

I worked on a new version that should improve the experience with the Korean language:
QuickSearch eXtended Korean (x32) beta 1

Please install the regular version first and overwrite the "tcmatch.dll" with the packed file. You need "use_pinyin=1" in your "tcmatch.ini" to make it work.
(I plan to release the merged final [including x64 / readme and stuff] in around 2 weeks.)

sheppaul mentioned the following improvement:

sheppaul wrote:While doing a quick search, TCMD does not recognize the initial character of a Korean syllable, which is a consonant.
Therefore, the quick search is not activated until one syllable is completed.

For example, see the following file tables.
When the consonant 'ㅂ' is input, a cursor bar should be placed upon the file, '법 정신.hwp', instantly.

Currently, '법' (one syllable) must be input to activate quick search.

I researched the Korean language mostly on this useful site.

In the above beta these 3 rules are implemented:
1) a lead consonant in the search string should match all combinations of this lead consonant with any vowels.
2) a lead consonant in the search string should match all combinations of this lead consonant with any vowels and trail consonants.
3) a lead consonant combined with a vowel in the search string should match all combinations with any trail consonants.

A handy feature would come with this:
You would only need to type the lead consonants of the syllables in the search string (ㅍㅇ would match 평양). I am not sure, but this could be faster to type.
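Rules 1 to 3 can be sketched with the standard Unicode arithmetic for precomposed Hangul syllables. This is a minimal Python sketch, not the plugin's actual code: `decompose`, `char_matches` and `matches` are hypothetical names, and only the normal (conjoining) lead Jamo are handled.

```python
S_BASE = 0xAC00   # first precomposed Hangul syllable
L_BASE = 0x1100   # first conjoining lead consonant (normal Jamo)
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT

def decompose(ch):
    """Return (lead, vowel, trail) indices of a precomposed syllable, else None."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return None
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, trail = divmod(rest, T_COUNT)
    return lead, vowel, trail

def char_matches(typed, actual):
    """Apply rules 1-3 to one typed character and one file name character."""
    if typed == actual:
        return True
    target = decompose(actual)
    if target is None:
        return False
    lead_index = ord(typed) - L_BASE
    if 0 <= lead_index < L_COUNT:
        # Rules 1 and 2: a bare lead consonant matches any vowel and any trail.
        return lead_index == target[0]
    source = decompose(typed)
    if source is not None and source[2] == 0:
        # Rule 3: lead + vowel without a trail matches any trail consonant.
        return source[:2] == target[:2]
    return False

def matches(search, filename):
    """True if the search string matches somewhere in the file name."""
    n = len(search)
    return any(
        all(char_matches(s, f) for s, f in zip(search, filename[i:i + n]))
        for i in range(len(filename) - n + 1)
    )
```

With this sketch, `matches("\u1111\u110b", "평양")` (two normal lead Jamo) and `matches("펴야", "평양")` are both intended to succeed, while an unrelated lead consonant is rejected.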

I also considered to implement these 3 rules. (perhaps I will in beta 2)
4) a vowel in the search string should match all combinations of this vowel with any lead consonants.
5) a vowel in the search string should match all combinations of this vowel with any lead consonants and trail consonants.
6) a trail consonant in the search string should match all combinations with any lead consonants and vowels.

(7) According to my source there are different characters for lead consonants, vowels and trail consonants (like Jamo ᄀ [U+1100, decimal 4352] and Compatibility Jamo ㄱ [U+3131, decimal 12593]). So far I have only implemented the normal Jamo. Does anyone think the Compatibility Jamo are needed too?
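If the Compatibility Jamo turn out to be needed, the lead consonants could be folded onto the normal Jamo before matching. A minimal Python sketch, assuming only the 19 lead consonants matter; `COMPAT_LEADS` and `fold_compat_leads` are hypothetical names, not part of the plugin:

```python
# Compatibility Jamo code points of the 19 lead consonants, listed in the
# same order as the normal (conjoining) lead Jamo U+1100..U+1112.
COMPAT_LEADS = [
    0x3131, 0x3132, 0x3134, 0x3137, 0x3138, 0x3139, 0x3141, 0x3142,
    0x3143, 0x3145, 0x3146, 0x3147, 0x3148, 0x3149, 0x314A, 0x314B,
    0x314C, 0x314D, 0x314E,
]
COMPAT_TO_LEAD = {chr(cp): chr(0x1100 + i) for i, cp in enumerate(COMPAT_LEADS)}

def fold_compat_leads(search):
    """Replace Compatibility Jamo lead consonants with normal lead Jamo."""
    return "".join(COMPAT_TO_LEAD.get(ch, ch) for ch in search)

# U+3131 folds to U+1100; other characters pass through unchanged.
print([hex(ord(c)) for c in fold_compat_leads("\u3131a")])
```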

I need feedback about 4, 5, 6 and 7 (and whether it works).

infimum · Post by *infimum » 2011-10-15, 22:29 UTC

One language is missing from your support of "East Asia," which also uses Chinese characters

Samuel · Post by *Samuel » 2011-10-15, 22:38 UTC

infimum wrote:One language is missing from your support of "East Asia," which also uses Chinese characters

I also thought about Japanese when implementing Korean. There seems to be something like PinYin too.

If someone from Japan has a proposal or wants to test something, feel free to contact me. (I will not implement anything if no one is using or testing it.)

infimum · Post by *infimum » 2011-10-15, 23:24 UTC

Samuel wrote:There seems to be something like PinYin too.

Migemo is the most widely used.

Click the "video" link in this Firefox add-on description to feel how it works.
https://addons.mozilla.org/en-US/firefox/addon/xulmigemo/

Here are some goods.
http://0xcc.net/migemo/#download
http://code.google.com/p/cmigemo/

I am a willing tester

Samuel · Post by *Samuel » 2011-10-16, 00:20 UTC

infimum wrote:
Samuel wrote:There seems to be something like PinYin too.
Migemo is the most widely used.

Click the "video" link in this Firefox add-on description to feel how it works.
https://addons.mozilla.org/en-US/firefox/addon/xulmigemo/

Here are some goods.
http://0xcc.net/migemo/#download
http://code.google.com/p/cmigemo/

I am a willing tester

The migemo project looks like something more powerful than just character conversion. (It seems to contain RegEx, similarity search and some kind of dictionary support [similar to things I already did here].) Implementing migemo as a standalone TC tcmatch.dll looks very promising. (I don't know if I can do it, because there is no good English documentation for it.)

However, I only need a conversion (preferably an algorithm or a table) from one or more basic English characters to normal Japanese characters (or something like this). Then all the features of my search could be used.

Also a good English page about how migemo works would be nice.

infimum · Post by *infimum » 2011-10-16, 01:08 UTC

Samuel wrote:However, I only need a conversion (preferably an algorithm or a table) from one or more basic English characters to normal Japanese characters (or something like this). Then all the features of my search could be used.

Download the package from the Google project page above. The "dict" folder has what you want. The files in the folder are all text files. "roma2hira.dat" is used for a conversion from Roman letters to hiragana.

"hira2kana.dat" is from hiragana to katakana.

"han2zen.dat" is from half-width characters to full-width characters/numbers/symbols.

Samuel · Post by *Samuel » 2011-10-16, 12:45 UTC

infimum wrote:
Samuel wrote:However, I only need a conversion (preferably an algorithm or a table) from one or more basic English characters to normal Japanese characters (or something like this). Then all the features of my search could be used.
Download the package from the Google project page above. The "dict" folder has what you want. The files in the folder are all text files. "roma2hira.dat" is used for a conversion from Roman letters to hiragana.

"hira2kana.dat" is from hiragana to katakana.

"han2zen.dat" is from half-width characters to full-width characters/numbers/symbols.

I found the lists.

This will be hard (to make it work with any search mode):

xya	ゃ
xyi	ぃ
xyu	ゅ
xye	ぇ
xyo	ょ

kya	きゃ
kyi	きぃ
kyu	きゅ
kye	きぇ
kyo	きょ

Currently only one English character is substituted with one foreign character.

So, as far as I understand:
1) I substitute the English strings in the search text with the corresponding hiragana characters from "roma2hira.dat". (like "kyu" with "きゅ")
2) I substitute the English strings in the search text with the corresponding katakana characters from "hira2kata.dat" with the help of "roma2hira.dat". (like "kyu" with "キュ")
2) I substitute the English strings in the search text with the corresponding half-width kana characters from "han2zen.dat" with the help of "roma2hira.dat" and "hira2kata.dat". (like "kyu" with "ｷｭ")

Could you confirm my step by step instructions?
Is there anything else needed?
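The substitution steps above can be sketched as a greedy longest-match replacement, which handles multi-character keys such as "kyu". This is a minimal Python sketch under stated assumptions: the tables are tiny hypothetical excerpts (the real .dat files are much larger and are not parsed here), the hiragana to katakana step uses the fixed Unicode offset instead of a table, and all function names are made up for illustration.

```python
# Tiny hypothetical excerpt in the spirit of "roma2hira.dat".
ROMA2HIRA = {
    "kya": "きゃ", "kyi": "きぃ", "kyu": "きゅ", "kye": "きぇ", "kyo": "きょ",
    "ka": "か", "ki": "き", "ku": "く", "ke": "け", "ko": "こ",
}
# Tiny hypothetical full-width to half-width katakana table.
KATA2HALF = {"\u30ad": "\uff77", "\u30e5": "\uff6d"}
MAX_KEY = max(len(key) for key in ROMA2HIRA)

def roma_to_hira(text):
    """Replace Roman letter groups with hiragana, longest key first."""
    out, i = [], 0
    while i < len(text):
        for n in range(min(MAX_KEY, len(text) - i), 0, -1):
            kana = ROMA2HIRA.get(text[i:i + n])
            if kana is not None:
                out.append(kana)
                i += n
                break
        else:
            out.append(text[i])  # no table entry: keep the character as typed
            i += 1
    return "".join(out)

def hira_to_kata(text):
    """Shift hiragana (U+3041..U+3096) to katakana by the 0x60 offset."""
    return "".join(
        chr(ord(c) + 0x60) if 0x3041 <= ord(c) <= 0x3096 else c for c in text
    )

def kata_to_half(text):
    """Map full-width katakana to half-width where the table has an entry."""
    return "".join(KATA2HALF.get(c, c) for c in text)

def search_variants(text):
    """Return the hiragana, katakana and half-width forms of a search text."""
    hira = roma_to_hira(text)
    kata = hira_to_kata(hira)
    return [hira, kata, kata_to_half(kata)]

print(search_variants("kyu"))
```

A file name would then count as a match if any of the three variants (or the text as typed) matches it.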

infimum · Post by *infimum » 2011-10-16, 14:04 UTC

1) Correct.
2) Correct.
2(?)) Correct.

Samuel wrote:Is there anything else needed?

Yes. "migemo-dict" is the largest file. It lists "indirect" conversions. They are mostly from hiragana to Chinese characters. In your example, Chinese characters which can be rendered as "きゅ" in hiragana would be picked up when "kyu" is typed. (In reality, there are no such Chinese characters, but it's just an example.)

I said "mostly" because there are some conversions from Roman letters/numbers/symbols to Chinese/Japanese characters in "migemo-dict". For example, the German word "Volk" is rendered as "フォルク" in Japanese, as seen in the name of the famous car company
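The indirect lookup described above could be sketched as follows. The dictionary entries are illustrative stand-ins (the フォルク entry comes from the example above; the きょう readings are ordinary Japanese words, not taken from the real file), and the in-memory layout is an assumption, not the actual "migemo-dict" format. `INDIRECT`, `candidates` and `to_regex` are hypothetical names.

```python
import re

# Illustrative stand-in for "migemo-dict": typed reading -> other spellings.
INDIRECT = {
    "きょう": ["今日", "京"],
    "volk": ["フォルク"],
}

def candidates(reading):
    """All strings a typed reading should match: itself plus dictionary words."""
    return [reading] + INDIRECT.get(reading, [])

def to_regex(reading):
    """Build one alternation so that any candidate can hit the file name."""
    return re.compile("|".join(re.escape(c) for c in candidates(reading)))

print(bool(to_regex("volk").search("フォルク.txt")))
```

Combining all candidates into a single pattern keeps the per-file check to one search call, at the cost of building the pattern whenever the search text changes.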

sheppaul · Post by *sheppaul » 2011-10-16, 14:45 UTC

I've tried the modified version of QuickSearch eXtended for Korean.

1. Version 2.1.2 was installed and tcmatch.dll was overwritten with the modified one.
2. TCMD 7.56a is used for testing and PinYin option is enabled.
3. Ctrl-S is enabled before testing. The option of TCMD used for quick search: Letters - with search dialog.

Unfortunately, there seems to be no difference between the original tcmatch.dll and the modified one.
It works, but not instantly, and fast searching with the lead consonants of the syllables in the search string (ㅍㅇ would match 평양) does not seem to work. I'm not sure, but it could be a problem with the character codes used for file names. That is, the character codes of "평양" might not be Unicode. Otherwise, there must be a bug. (I'm using Win7 x64 Korean Edition. Are file names in Win7 created as Unicode?)

When starting a search with English characters, the result is filtered instantly. QuickSearch simply doesn't work with a Korean lead consonant. ㅍ should match 평양, but the ㅍ input waits for a vowel.

First of all, ㅍ input should open the search dialog. With the first keystroke no search dialog appears, so there is no instant search. A small IME box intercepts the first keystroke and waits for a vowel (probably to complete the syllable).

Thank you for your support.

Samuel · Post by *Samuel » 2011-10-16, 17:00 UTC

Japanese:

infimum wrote:2(?)) Correct.

I meant (3) of course.

infimum wrote:
Samuel wrote:Is there anything else needed?
Yes. "migemo-dict" is the largest file. It lists "indirect" conversions. They are mostly from hiragana to Chinese characters. In your example, Chinese characters which can be rendered as "きゅ" in hiragana would be picked up when "kyu" is typed. (In reality, there are no such Chinese characters, but it's just an example.)

I said "mostly" because there are some conversions from Roman letters/numbers/symbols to Chinese/Japanese characters in "migemo-dict". For example, the German word "Volk" is rendered as "フォルク" in Japanese, as seen in the name of the famous car company

I can't find "migemo-dict". Do you mean "zen2han.dat"? If not, where do I find it? Could you give an example of how to use "migemo-dict"?

Korean:

sheppaul wrote:Unfortunately, there seems to be no difference between the original tcmatch.dll and the modified one.
It works, but not instantly, and fast searching with the lead consonants of the syllables in the search string (ㅍㅇ would match 평양) does not seem to work. I'm not sure, but it could be a problem with the character codes used for file names. That is, the character codes of "평양" might not be Unicode. Otherwise, there must be a bug. (I'm using Win7 x64 Korean Edition. Are file names in Win7 created as Unicode?)

When starting a search with English characters, the result is filtered instantly. QuickSearch simply doesn't work with a Korean lead consonant. ㅍ should match 평양, but the ㅍ input waits for a vowel.

It looks like the Compatibility Jamo are needed. (See my previous post.)
Could you check whether "ᄑᄋ" (instead of "ㅍㅇ") filters the file "평양" correctly?

Could you also check whether "펴야" filters files like "평양"?

sheppaul wrote:First of all, ㅍ input should open the search dialog. With the first keystroke no search dialog appears, so there is no instant search. A small IME box intercepts the first keystroke and waits for a vowel (probably to complete the syllable).

This is something my plugin cannot handle. TC calls the plugin with the search string and the file name; the plugin has no control over opening the quick search window.
You could open it manually by pressing Ctrl+S, or file a bug report.