xPDFSearch 1.11 - Content plugin to search text in PDF files
Moderators: Hacker, petermad, Stefan2, white
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
As announced, xPDFSearch is now a GitHub project. The idea is to improve source code management and collaboration. If you want to contribute, commit to your own remote feature branch and open a pull request.
https://github.com/lefteous-tc/xPDFSearch
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
Dear Lefteous,
Today I installed your plugin for a very tricky task:
There are tens of thousands of scanned PDFs. Many of them contain text, some of them don't. This depends on the various scanners that were used over the years.
I now have to filter out the non-text PDFs in order to OCR them. Can I do this with your plugin?
ys
HHK
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
2hhk
You should test another plugin: http://totalcmd.net/plugring/pdfOCR.html
Andrzej P. Wozniak
Polish subforum moderator
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
hhk wrote: 2020-01-21, 16:50 UTC
Dear Lefteous,
Today I installed your plugin for a very tricky task:
There are tens of thousands of scanned PDFs. Many of them contain text, some of them don't. This depends on the various scanners that were used over the years.
I now have to filter out the non-text PDFs in order to OCR them. Can I do this with your plugin?
ys
HHK
What you can do with xPDFSearch is find the files that have almost no text (fewer than 10 characters in the following sample) and then build a list to send to your OCR software.
In the search dialog, search for PDF files and, on the plugin tab, add:
Code:
xpdfsearch text !regexp .{10,}
Once you have the files to process with OCR, you can feed them to the listbox. From the listbox, you can also save the list to a dedicated folder of virtual-panel or to a file. Once that is done, you can process the files one by one using a button/user command that calls your OCR engine, or all at once using TCBL.
If your OCR process needs time and/or manual validation, one-by-one processing is the best choice for you. virtual-panel can help you keep track of files that have not been processed yet.
The PdfOCR wcx is fine for extracting text from images only, but it will not help to rebuild a quality PDF with layout, text indentation, formatting and so on. I personally use it to extract specific information from PDFs that do not support cut/paste.
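To make the filter above more concrete, here is a minimal Python sketch of what the negated expression `!regexp .{10,}` selects, assuming the plugin's text field is matched like an ordinary regular expression. The function name `needs_ocr` and the sample file names and strings are hypothetical; the real matching is done by TC and the plugin, not by this code.

```python
import re

# Same pattern as in the search above: a run of at least 10 characters.
# Note: in Python "." does not match a line break; whether the regexp
# engine used by TC behaves the same way is an assumption here.
HAS_TEXT = re.compile(r".{10,}")

def needs_ocr(extracted_text: str) -> bool:
    """Return True when no run of 10+ characters is found (the '!regexp' case)."""
    return HAS_TEXT.search(extracted_text) is None

# Hypothetical extracted text per file, for illustration only.
samples = {
    "scan_without_text.pdf": "",
    "scan_with_noise.pdf": "a1\n..\nxx",
    "searchable.pdf": "This document already contains a text layer.",
}

ocr_candidates = [name for name, text in samples.items() if needs_ocr(text)]
print(ocr_candidates)
```

The threshold of 10 is only the value from the sample; a file with a few stray characters from a stamp or a page number would still be listed as an OCR candidate.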
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
nsp wrote: 2020-01-22, 07:09 UTC
The PdfOCR wcx if fine to extract text only from image but will not help to rebuild a quality pdf with schema text indentation, format
It's a complete misunderstanding. I mean the WDX, the content plugin. Read the linked webpage, please:
pdfOCR 0.9 wdx wrote:
• Purpose:
pdfOCR is wdx plugin that discovers how many pages of PDF file in current directory needs character recognition (OCR), i.e. how many pages in PDF file have no searchable text in their layout.
(...)
• Possible usage:
- discover pdf documents which need to be OCR-ed for the first time
- discover PDF documents which are password protected and consequently not available for OCR processing
- discover PDF documents that was not properly OCR processed because of low resolution or similar causes
- discover PDF documents not properly formatted.
See also the linked image: http://wincmd.ru/files/9924358/prezentacija_mala.jpg
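The per-page idea described in the quoted purpose (how many pages have no searchable text) can be sketched in a few lines of Python. This is only an illustration of the counting logic, not how pdfOCR is implemented; the function name `pages_needing_ocr`, the `min_chars` parameter and the sample page texts are hypothetical, and the text of each page is assumed to have been extracted already by some other tool.

```python
from typing import List

def pages_needing_ocr(page_texts: List[str], min_chars: int = 1) -> int:
    """Count pages whose extracted text has fewer than min_chars non-space characters."""
    return sum(
        1 for text in page_texts
        if len("".join(text.split())) < min_chars
    )

# Hypothetical per-page text of one PDF: pages 2 and 3 carry no text layer.
page_texts = ["Invoice 2019-04", "", "   \n", "Total: 120 EUR"]
print(pages_needing_ocr(page_texts), "of", len(page_texts), "pages need OCR")
```

A count of zero would mean the document is fully searchable, while a count equal to the page count would mean it has never been OCR-ed.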
Andrzej P. Wozniak
Polish subforum moderator
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
I've just installed the latest TC 9.50 (x64) and tried to install xPDFSearch plugin downloaded from the official TC plugins page (the actual link to the plugin file).
Big issue
This is a sample PDF containing the text "PXContext", which is not found with the xPDFSearch plugin in use.
Small issue (perhaps this is the reason for the "big issue" described above)
If I open the plugin's .zip file inside TC (i.e. in the file panel), TC offers to install the plugin and the plugin is installed.
If I register the plugin via TC's "Configuration => Options..." menu, "Plugins => Content Plugins (.WDX)" section, an error is shown:
Image: https://i.imgur.com/OcqUExd.png, although the plugin is claimed to be x32+x64-compatible.
Could you please check whether there is a problem with the plugin or whether I configured/used it incorrectly?
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
burstx wrote: 2020-02-06, 09:51 UTC
I've just installed the latest TC 9.50 (x64) and tried to install xPDFSearch plugin downloaded from the official TC plugins page (the actual link to the plugin file).
Big issue
This is a sample PDF containing the text "PXContext", which is not found with the xPDFSearch plugin in use.
Small issue (perhaps this is the reason for the "big issue" described above)
If I open the plugin's .zip file inside TC (i.e. in the file panel), TC offers to install the plugin and the plugin is installed.
If I register the plugin via TC's "Configuration => Options..." menu, "Plugins => Content Plugins (.WDX)" section, an error is shown:
Image: https://i.imgur.com/OcqUExd.png, although the plugin is claimed to be x32+x64-compatible.
Could you please check whether there is a problem with the plugin or whether I configured/used it incorrectly?
Sorry, the big issue is my fault. I didn't RTFM. But the small issue remains.
ghisler (Author), Site Admin
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
Which file did you pick in the "Content Plugins (.WDX)" section?
Author of Total Commander
https://www.ghisler.com
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
nsp wrote: 2020-01-22, 07:09 UTC
hhk wrote: 2020-01-21, 16:50 UTC
Dear Leftous,
What you can do with xpdfsearch is to find the one that have almost no text (less than 10 characters in the following sample) and then have a list to send to your ocr software.
In Search box, search for pdf files and in plugin tab add
Code:
xpdfsearch text !regexp .{10,}
If you know which producer / application created the image-only PDF, you can also search for it using dedicated properties (PDF Producer / Application).
Once you get the file to process by OCR, you can feed to listbox. From listbox, you can also save the list to a dedicated folder of virtual-panel or in a file. Once done, you can process all files one by one using a button/user command that call your OCR engine or all at once using TCBL.
If your OCR process need times and/or manual validation, one by one process is the best choice for you. virtual-panel can help you to track non processed files ....
The PdfOCR wcx if fine to extract text only from image but will not help to rebuild a quality pdf with schema text indentation, format ...I personally use it to extract dedicated information from pdf which does not support cut/paste
I can find no information on how to install this plugin. I have unzipped it to a secondary folder, but there is still no info. There is a batch file and an exe file, with no info either. Please explain the install process and how to work the program. Thanks, it looks great.
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
Also, what is "WDX"?
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
WDX is the content plugin type for TC.
TC supports four types of plugins:
Packer plugins (WCX)
File System plugins (WFX)
Lister plugins (WLX)
Content plugins (WDX)
Help wrote:
Configuration - Plugins
Change settings for all supported plugin types.
Download new plugins from ghisler.com
Connects to the page where you can download plugins which were tested by us.
Packer plugins: Allows you to configure packer plugins. Usage: Files - Pack.
File system plugins: Allows you to configure file system plugins. They allow to access file systems or similar devices or systems, e.g. a PocketPC, a Linux partition, or a remote server. File system plugins are used via the Network Neighborhood.
Lister plugins: Allows you to configure Lister plugins. Usage: F3 on a supported file.
Content plugins: Allows you to configure content plugins. Usage: Show - custom columns, multi-rename tool, search function.
FS-Plugins: Allows you the installation of file system plugins. You can find them on www.ghisler.com in the addons section.
License #524 (1994)
Danish Total Commander Translator
TC 11.51 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1391a
TC 3.60b4 on Android 6, 13, 14
TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
I cannot use xPDFSearch to find Greek words within PDF documents. Is there a solution?
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
Dear Sir/Madam
I am conducting research on a book which I have downloaded in both Word and PDF formats. My research requires me to extract passages from the book that contain two groups (strings) of words within the same relevant text.
The following are 2 examples:
Example 1
First group (string): not only
Second group (string): but also
Relevant text 1:
He is not only intelligent but also funny.
Relevant text 2:
Mr X is not only an actor but also a philanthropist.
Example 2
First group: scarcely
Second group: when
Relevant text 1:
I had scarcely walked in the door when I got an urgent call and had to run right back out again.
Relevant text 2:
Scarcely had the teacher seen the student when he started studying.
My question is: how can I extract the relevant text containing the desired strings of words, which normally consist of two groups as demonstrated in the two examples above? Preferably, I would like to receive instructions on how to do so from both a Word document and a PDF document.
I would like to thank you in advance.
Regards
Preston Chow
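One possible way to approach this kind of two-group search, sketched in Python and not specific to xPDFSearch: it assumes the book's text has already been exported to a plain string (from the Word or the PDF version) by some other tool. The function name `sentences_with_both` and the sample text are hypothetical, and the sentence splitting is deliberately naive (an abbreviation such as "Mr." would split a sentence too early).

```python
import re
from typing import List

def sentences_with_both(text: str, first: str, second: str) -> List[str]:
    """Return sentences in which `first` occurs and `second` occurs later."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Both groups must appear as whole words, in this order, ignoring case.
    pattern = re.compile(
        r"\b" + re.escape(first) + r"\b.*?\b" + re.escape(second) + r"\b",
        re.IGNORECASE | re.DOTALL,
    )
    return [s for s in sentences if pattern.search(s)]

# Hypothetical sample text built from the examples in the post.
book = (
    "He is not only intelligent but also funny. "
    "She arrived late. "
    "Scarcely had the teacher seen the student when he started studying."
)

print(sentences_with_both(book, "not only", "but also"))
print(sentences_with_both(book, "scarcely", "when"))
```

Matching is case-insensitive so that "Scarcely" at the start of a sentence is found as well as "scarcely" in the middle of one.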
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
Moderator message from: white » 2023-01-10, 23:38 UTC
Lefteous wrote: 2023-01-10, 22:45 UTC
I would propose to split this thread (by moderators) at the point where you forked the plugin.
Done.
The thread about zeeko's fork is here: xPDFSearch 1.38 - Content plugin to search text in PDF files
Re: xPDFSearch 1.11 - Content plugin to search text in PDF files
2white
Thank you!