xPDFSearch 1.11 - Content plugin to search text in PDF files

Lefteous · Post by *Lefteous » 2014-09-26, 10:36 UTC

2ghisler(Author)
Ft_fulltextw should use the same encoding as the other Unicode functions in TC's plugin system.

Ovg · Post by *Ovg » 2014-09-26, 11:17 UTC

2iana

iana wrote: You should set the regional settings to your language; that might be the issue.

Already done about three years ago

iana · Post by *iana » 2014-09-26, 12:14 UTC

Ovg wrote: Already done about three years ago

What doesn't work?
I tested a simple PDF and all the functions I tried work. Here are a few screenshots:
http://i.imgur.com/R2RV3kp.png
http://i.imgur.com/kNfRFbz.png

Both custom fields and custom search support Cyrillic. Unicode is not needed for Cyrillic, although the default font needs to have Cyrillic glyphs. I'm on Windows 7, but as I upgraded from XP my default font for most of TC is Tahoma, which has Cyrillic glyphs. It might be an issue with your font.

Edit:
I just had a thought: if the PDF was generated with some old transliterated fonts (7-bit fonts that had only Cyrillic characters), then there is no way for xpdf to display them properly. That's a font limitation. I remember someone sending me a PDF with such fonts that weren't embedded, and the PDF looked strange. The only way around that is to recreate the PDF with proper 8-bit code-page or Unicode fonts.

milo1012 · Post by *milo1012 » 2014-09-26, 14:43 UTC

iana wrote:Unicode is not needed for Cyrillic...

Quite a vague statement, isn't it?
Unicode is required! If your system isn't set to Cyrillic you can't search for these characters,
especially if you're on some remote workstation where you're not allowed to switch system settings.
I have tons of technical documents where information/text is stored in CJK characters.
There is no way I can search them with xPDFSearch the way it is now.
I don't know what the special case with Ovg's system is,
but independently of that, it is time to finally have a Unicode variant of ft_fulltext.

iana wrote:I just had a thought, if the pdf was generated with some old transliterated fonts (7bit fonts that had only cyr characters) then there is no way for xpdf to display them properly that's a font limitation

This has nothing to do with PDF-embedded or system fonts.
The PDF is simply decoded with Xpdf (no fonts are required because nothing is displayed or rendered),
and the decoded data is transferred to TC (in portions) and searched there.
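
As a rough illustration of searching text that arrives in portions, here is a minimal Python sketch. It is not TC's or xPDFSearch's actual code, and how TC really handles portion boundaries is not stated in this thread. The function name found_in_portions and the sample strings are made up for the example. It keeps the last len(needle) - 1 characters seen so far, so a match split across two portions is still found, and no font is involved at any point.

```python
from typing import Iterable


def found_in_portions(portions: Iterable[str], needle: str) -> bool:
    """Return True if needle occurs in the concatenation of all portions."""
    if not needle:
        return True
    keep = len(needle) - 1  # how much of the previous text can start a match
    tail = ""               # carried-over end of the text seen so far
    for portion in portions:
        window = tail + portion
        if needle in window:
            return True
        # Keep only the characters that could still begin a split match.
        tail = window[-keep:] if keep else ""
    return False


# Hypothetical portions, as if a decoder handed over the text piece by piece.
portions = ["The quick bro", "wn fox jumps", " over the lazy dog"]
print(found_in_portions(portions, "brown fox"))  # True (split across portions)
print(found_in_portions(portions, "purple"))     # False
```

The carry-over keeps memory use bounded by the portion size plus the length of the search term, which is why a streamed search does not need the whole document in memory.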

iana · Post by *iana » 2014-09-26, 15:11 UTC

I am for full Unicode support, but that does not change the fact that Unicode is not needed for Cyrillic support. There are around 2*35 (70+) Cyrillic characters, and most 8-bit fonts with 256 symbols include not just Latin but also Greek and Cyrillic. I was replying to Ovg, as he said xpdf didn't work with Cyrillic; it does. Those old transliterated fonts were popular in the early '90s (they replace the Latin symbols with Cyrillic ones). Most people have stopped using them, but there are old documents that are, in general, badly generated PDFs. xpdf will display, and always has displayed, those characters as Latin even if you have the fonts installed or embedded.

milo1012 wrote:This has nothing to do with PDF-embedded or system font.
The PDF is simply decoded with Xpdf - no fonts required because it's not displayed or rendered -
and the decoded data is transferred to TC (in portions), and is being searched.

But if you have that font set as the default TC font, xpdf will display it correctly; that's why I said it's a font issue. For example, this is a popular font in my country (it has no Latin glyphs; each Cyrillic glyph is encoded at the lower code of its Latin cousin):
http://www.fonts2u.com/mac-c-times.font
A lot of documents were generated using it. There is no way TC via xpdf would display the content of such a document properly; you would need to set that font as the default TC font or regenerate the PDF. As the font used for metadata cannot be changed, most of the information for those old documents would be in Latin.

milo1012 · Post by *milo1012 » 2014-09-26, 15:41 UTC

iana wrote:but that does not change the fact that unicode is not needed for Cyrillic support...

Again, this is just wrong, or at least vague, depending on your system.

iana wrote:there are around 2*35 (70+) Cyrillic characters, most 8-bit fonts with 256 symbols do include not just Latin but Greek and Cyrillic support

What link is there between fonts and character recoding?
We're talking about a text search here; nothing is displayed for that purpose.
You're entering a character in TC's text box for xPDFSearch. If you enter Cyrillic characters, TC recodes them to the system's ANSI page.
My system page is 1252, so there are no Cyrillic characters there, and I'll get a replacement character (question mark).
Now, when TC searches the raw characters that are streamed from xPDFSearch, it is extremely unlikely that you'll get a match that way.
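
The recoding step can be reproduced outside TC. Below is a minimal Python sketch under stated assumptions: Python's cp1252 and cp1251 codecs stand in for the Windows ANSI pages, errors="replace" stands in for whatever substitution TC performs, and the helper to_ansi and the sample document bytes are hypothetical. It does not model what xPDFSearch actually emits; it only shows that the search term is already destroyed before any comparison happens.

```python
def to_ansi(text: str, codepage: str) -> bytes:
    """Recode text to an 8-bit code page; unmappable characters become b'?'."""
    return text.encode(codepage, errors="replace")


query = "текст"  # Cyrillic search term typed into the search box

western = to_ansi(query, "cp1252")   # Western European page: no Cyrillic
cyrillic = to_ansi(query, "cp1251")  # Cyrillic page: one byte per letter

# Assumption for the demo: the document text is available as cp1251 bytes.
document = to_ansi("пример текст в документе", "cp1251")

print(western)               # b'?????'
print(western in document)   # False
print(cyrillic in document)  # True
```

On a 1252 system every Cyrillic query of the same length collapses to the same run of question marks, so even an accidental match would say nothing about the document's content.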

iana wrote:that's why I said it's a font issue

Xpdf, just like most other programs, relies on the system ANSI page.
So it's not a font issue but a system-setting issue.
If this page can't map the TC input to the xPDFSearch output, there is no match, just as I said above (1252 et al. have no Cyrillic).
It just doesn't matter whether xpdf correctly recodes the characters; we can't match them the way it is now.

Ovg · Post by *Ovg » 2014-09-26, 15:54 UTC

2iana

iana wrote: What doesn't work?

Try searching in Text or Document Start, not Author or Title.
Searching for Author and Title works for me too.

iana · Post by *iana » 2014-09-26, 18:21 UTC

Yes, sorry, Cyrillic content can't be shown.

iana · Post by *iana » 2014-09-27, 22:11 UTC

Is a mupdf version/fork possible?
The SumatraPDF guys have a command-line app (available if you build it yourself) that uses mupdf and dumps not only PDF info but EPUB/MOBI/CBZ/CBR... info too.

https://code.google.com/p/sumatrapdf/source/browse/trunk/src/EngineDump.cpp

http://mupdf.com/docs/overview

I believe mupdf is more actively developed than xpdf, and mupdf has/is a native Windows library.

Lefteous · Post by *Lefteous » 2014-09-29, 06:38 UTC

A plugin based on other code would be a new plugin. Anyway I can take a look at it.

Iowa · Post by *Iowa » 2014-11-05, 08:23 UTC

Hello!
I have a problem installing the xPDF plugin: I get a message that the plugin needs a DLL that is not installed. I run Windows 7 64-bit with Total Commander 8.51a. Under Windows XP it works fine!
What can I do?

Lefteous · Post by *Lefteous » 2014-11-05, 10:07 UTC

2Iowa
Could you please post the exact error message which hopefully displays the name of the missing library?

Iowa · Post by *Iowa » 2014-11-05, 11:53 UTC

Sorry, there is no other information than that:

Error loading the plugin file.
The plugin probably needs DLLs which are missing on your system.

Horst.Epp · Post by *Horst.Epp » 2014-11-05, 13:12 UTC

Iowa wrote:Sorry, there is no other information than that:

Error loading the plugin file.
The plugin probably needs DLLs which are missing on your system.

The Fileinfo plugin shows you the DLL dependencies.
It also shows which ones are missing.

Iowa · Post by *Iowa » 2014-11-13, 07:48 UTC

Hi,

It seems to be kernel32.dll. Fileinfo shows a yellow exclamation mark without an hourglass next to it.
When I expand the tree further, no other critical dependencies are listed.

Regards, Iowa