xPDFSearch 1.11 - Content plugin to search text in PDF files

Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

2ghisler(Author)
ft_fulltextw should use the same encoding as the other Unicode functions in TC's plugin system.
Ovg
Power Member
Posts: 756
Joined: 2014-01-06, 16:26 UTC

Post by *Ovg »

2iana
iana wrote: You should set regional settings to your language, that might be the issue.
Already done about three years ago :wink: :lol:
It's impossible to lead us astray for we don't care even to choose the way.
#259941, TC 11.01 x64, Windows 7 SP1 x64
iana
Senior Member
Posts: 345
Joined: 2010-07-27, 22:00 UTC

Post by *iana »

Ovg wrote: Already done about three years ago :wink: :lol:
What doesn't work?
I tested a simple PDF and all the functions I tried work. Here are a few screenshots:
http://i.imgur.com/R2RV3kp.png
http://i.imgur.com/kNfRFbz.png

Both custom fields and custom search support Cyrillic. Unicode is not needed for Cyrillic, although the default font needs to have Cyrillic glyphs. I'm using Windows 7, but as I upgraded from XP, my default font for most of TC is Tahoma, which has Cyrillic glyphs. It might be an issue with your font.

edit.
I just had a thought: if the PDF was generated with some old transliterated fonts (7-bit fonts that had only Cyrillic characters), then there is no way for xpdf to display them properly. That's a font limitation. I remember someone sending me a PDF with such fonts that weren't embedded, and the PDF looked strange. The only way around that is to recreate the PDF with proper 8-bit or Unicode fonts.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

iana wrote:Unicode is not needed for Cyrillic...
Quite a vague statement, isn't it?
Unicode is required! If your system isn't set to Cyrillic you can't search for these characters,
especially if you're on some remote Workstation where you're not allowed to switch system settings.
I have tons of technical documents where information/text is stored in CJK characters.
There is no way I can search them with xPDFSearch the way it is now.
IDK what's special about Ovg's system,
but independently of that, it is time to finally have a Unicode variant of ft_fulltext.
iana wrote: I just had a thought: if the PDF was generated with some old transliterated fonts (7-bit fonts that had only Cyrillic characters), then there is no way for xpdf to display them properly. That's a font limitation.
This has nothing to do with PDF-embedded or system fonts.
The PDF is simply decoded with Xpdf (no fonts are required because nothing is displayed or rendered),
and the decoded data is transferred to TC in portions and searched there.
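
As a rough illustration of searching text that arrives in portions, here is a Python sketch. It is not how TC or xPDFSearch is actually implemented; the portions and the function name `stream_contains` are assumptions made for the example. The point is that the searcher has to keep a short tail of the previous portion, otherwise a match split across two portions is missed:

```python
from typing import Iterable

def stream_contains(chunks: Iterable[str], needle: str) -> bool:
    """Return True if needle occurs in the concatenation of chunks."""
    if not needle:
        return True
    tail = ""  # last len(needle) - 1 characters seen so far
    for chunk in chunks:
        window = tail + chunk
        if needle in window:
            return True
        # Keep just enough text to catch a match that straddles the boundary.
        keep = len(needle) - 1
        tail = window[-keep:] if keep else ""
    return False

# Hypothetical portions as a plugin might hand them over.
portions = ["decoded te", "xt from a P", "DF file"]
print(stream_contains(portions, "text"))  # True, split across two portions
print(stream_contains(portions, "PDF"))   # True
print(stream_contains(portions, "font"))  # False
```

No font is involved anywhere in this path, only character data.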
TC plugins: PCREsearch and RegXtract
iana
Senior Member
Posts: 345
Joined: 2010-07-27, 22:00 UTC

Post by *iana »

I am for full Unicode support, but that does not change the fact that Unicode is not needed for Cyrillic support. There are around 2*35 (70+) Cyrillic characters, and most 8-bit fonts with 256 symbols include not just Latin but also Greek and Cyrillic. I was replying to Ovg, as he said xpdf didn't work with Cyrillic; it does. Those old transliterated fonts were popular in the early '90s (they replace the Latin symbols with Cyrillic ones). Most people have stopped using them, but there are old documents that are, in general, badly generated PDFs. xpdf will display, and has displayed, those characters as Latin even if you have the fonts installed or embedded.
milo1012 wrote: This has nothing to do with PDF-embedded or system font.
The PDF is simply decoded with Xpdf - no fonts required because it's not displayed or rendered -
and the decoded data is transferred to TC (in portions), and is being searched.
But if you have that font set as the default TC font, xpdf will display it correctly; that's why I said it's a font issue. For example, this is a popular font in my country (it has no Latin glyphs; the Cyrillic glyphs are encoded with lower IDs corresponding to their Latin cousins):
http://www.fonts2u.com/mac-c-times.font
A lot of documents were generated using it. There is no way TC via xpdf would display the content of such a document properly; you would need to set that font as the default TC font or regenerate the PDF. As the font used for metadata cannot be changed, most of the information content for those old documents would be Latin.
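
The effect of such a font can be sketched in a few lines of Python. The glyph table below is invented for illustration and does not reproduce the layout of any real font; it only assumes that Cyrillic shapes sit at Latin code points and that the PDF carries no other mapping back to Unicode:

```python
# Hypothetical glyph table of a "transliterated" font: Latin code points
# are drawn with Cyrillic shapes. A real font's layout may differ.
GLYPHS = {"a": "а", "b": "б", "v": "в", "g": "г", "d": "д"}

def rendered(stored: str) -> str:
    """What a viewer shows when it draws the stored codes with that font."""
    return "".join(GLYPHS.get(ch, ch) for ch in stored)

def extracted(stored: str) -> str:
    """What a text extractor returns: the stored codes, no font involved."""
    return stored

stored = "baba"                      # character codes stored in the document
print(rendered(stored))              # баба
print(extracted(stored))             # baba
print("баба" in extracted(stored))   # False: a Cyrillic search finds nothing
```

Under these assumptions the document looks Cyrillic on screen with the right font, while any text extracted from it is Latin.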
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

iana wrote:but that does not change the fact that unicode is not needed for Cyrillic support...
Again, this is just wrong, or at least vague, depending on your system.
iana wrote:there are around 2*35 (70+) Cyrillic characters, most 8-bit fonts with 256 symbols do include not just Latin but Greek and Cyrillic support
What link is there between fonts and character recoding?
We're talking about a text search here, no display at all for that purpose.
You're entering a character in TC's text box for xPDFSearch. If you enter Cyrillic characters, TC recodes them to the system's ANSI code page.
My system code page is 1252, so there are no Cyrillic characters there, and I'll get a replacement character (question mark).
Now, when TC searches the raw characters that are streamed from xPDFSearch, it's extremely unlikely that you'll get a match that way.
iana wrote:that's why I said it's a font issue
Xpdf, just like most other programs, relies on the system ANSI code page.
So it's not a font issue, but a system-setting issue.
If this code page can't map the TC input to the xPDFSearch output, there is no match, just like I said above (1252 et al. have no Cyrillic).
It just doesn't matter if xpdf correctly recodes the characters, we can't match them the way it is now.
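
The mismatch described above can be reproduced outside TC. This is a minimal Python sketch, assuming the search term is converted to the system ANSI code page with unmappable characters replaced by a question mark. The function name `to_ansi` and the sample strings are made up for illustration; this is not TC's code:

```python
def to_ansi(text: str, codepage: str) -> bytes:
    """Encode text to an 8-bit code page; unmappable characters become b'?'."""
    return text.encode(codepage, errors="replace")

term = "Привет"  # Cyrillic search term typed into the search box

# Western European system page: no Cyrillic, every letter is lost.
western = to_ansi(term, "cp1252")
# Cyrillic system page: each letter maps to exactly one byte.
cyrillic = to_ansi(term, "cp1251")

# Stand-in for text the plugin extracted, encoded with a Cyrillic page.
extracted = "Документ: Привет, мир".encode("cp1251")

print(western)                # b'??????'
print(western in extracted)   # False
print(cyrillic in extracted)  # True
```

With code page 1252 the search term is already destroyed before any comparison takes place, no matter how well the PDF text was decoded.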
Ovg
Power Member
Posts: 756
Joined: 2014-01-06, 16:26 UTC

Post by *Ovg »

2iana
iana wrote: What doesn't work?
Try searching in Text or Document Start, not Author or Title.
Searching by author and title works for me too.
iana
Senior Member
Posts: 345
Joined: 2010-07-27, 22:00 UTC

Post by *iana »

Yes, sorry, Cyrillic content can't be shown.
iana
Senior Member
Posts: 345
Joined: 2010-07-27, 22:00 UTC

Post by *iana »

Is a mupdf version/fork possible?
The SumatraPDF guys have a command-line app (available if you build it yourself) that uses mupdf and dumps info not only for PDF but also for epub/mobi/cbz/cbr...

https://code.google.com/p/sumatrapdf/source/browse/trunk/src/EngineDump.cpp

http://mupdf.com/docs/overview

I believe mupdf is more actively developed than xpdf, and mupdf has (or is) a native Windows library.
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

A plugin based on other code would be a new plugin. Anyway, I can take a look at it.
Iowa
Junior Member
Posts: 18
Joined: 2007-06-29, 08:09 UTC

Post by *Iowa »

Hello!
I have a problem installing the xpdf plugin. I get a message that the plugin needs a DLL that is not installed. I run Windows 7 64-bit with Total Commander 8.51a. Under Windows XP it works fine!
What can I do?
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

2Iowa
Could you please post the exact error message which hopefully displays the name of the missing library?
Iowa
Junior Member
Posts: 18
Joined: 2007-06-29, 08:09 UTC

Post by *Iowa »

Sorry, there is no other information than that:

Error loading the plugin file.
The plugin probably requires DLLs that are missing on your system.
Horst.Epp
Power Member
Posts: 6450
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Post by *Horst.Epp »

Iowa wrote:Sorry, there is no other information than that:

Error loading the plugin file.
The plugin probably requires DLLs that are missing on your system.
The Fileinfo plugin shows you the DLL dependencies.
It also shows which ones are missing.
Iowa
Junior Member
Posts: 18
Joined: 2007-06-29, 08:09 UTC

Post by *Iowa »

Hi,

Fileinfo shows a yellow exclamation mark without an hourglass next to kernel32.dll.
When I expand the tree further, no other critical dependencies are listed.
So it seems to be kernel32.dll.

Regards, Iowa