WDX plugin pdfOCR - Show details of PDF files
The pdfOCR WDX plugin shows the number of pages in a PDF file that need OCR processing. With it you can immediately spot which PDF files are not available for text search, either by you or by an indexing system. That is the purpose of the needOCR column.
The Password column shows "YES" if a PDF file is protected with a password or has some rights restricted. That tells you which PDF files need the password removed before normal use, and whether a file is protected before you try to open it for OCR processing.
Finally, the Pages column shows the total number of pages, so you can compare the needOCR count with the total and decide whether OCR processing is worth it.
Download: http://www.totalcmd.net/plugring/pdfOCR.html
Well, also thx from me!
But to get things clear:
Does it just count pages that don't have text in them?
So even the shortest snippet, like one letter or word on a page, makes it count as non-OCR?
I just think it would have been better to call such fields something like "non-Picture" or "text-contained pages",
because not every page that doesn't contain text is supposed to need OCR ("This page [is] intentionally left blank" ...).
2nd question: what PDF engine is used? It looks like the CPDF Command Line Tools to me (krk.exe), am I right?
Any chance to link it statically (since the source is available)?
I also suggest running your analysis procedure in the background (return ft_delayed), because it gets really slow when I use custom columns.
TC plugins: PCREsearch and RegXtract
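For readers unfamiliar with the ft_delayed suggestion above, here is a minimal C++ sketch of the idea, not the plugin's actual code. In a real content plugin the exported function is ContentGetValue and the constants come from the SDK header contentplug.h; the values below are assumed to mirror that header, and getValueSketch and slowCountPagesWithoutFonts are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Assumed to mirror contentplug.h from the TC content plugin SDK.
constexpr int ft_delayed = 0;           // "ask me again from a background thread"
constexpr int ft_numeric_32 = 1;        // field value is a 32-bit integer
constexpr int ft_fileerror = -2;        // file could not be read
constexpr int CONTENT_DELAYIFSLOW = 1;  // flag: call comes from the foreground

// Hypothetical stand-in for the slow PDF analysis (placeholder result only).
bool slowCountPagesWithoutFonts(const std::string& fileName, std::int32_t& pages) {
    if (fileName.empty()) return false;
    pages = 3;  // placeholder value, not a real measurement
    return true;
}

// Sketch of the body a ContentGetValue implementation could use.
int getValueSketch(const char* fileName, int fieldIndex, void* fieldValue,
                   int maxlen, int flags) {
    (void)fieldIndex;  // a real plugin would switch on the requested field
    (void)maxlen;
    assert(fileName != nullptr && fieldValue != nullptr);

    // Foreground call: do not block the file panel, let TC retry in background.
    if (flags & CONTENT_DELAYIFSLOW) return ft_delayed;

    // Background call (flag not set): now the slow work is acceptable.
    std::int32_t pages = 0;
    if (!slowCountPagesWithoutFonts(fileName, pages)) return ft_fileerror;
    std::memcpy(fieldValue, &pages, sizeof pages);
    return ft_numeric_32;
}
```

The point of the design is only that the expensive analysis never runs while the flag is set, so the file list stays responsive and the column is filled in later.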
milo1012 wrote: Well, also thx from me! But to get things clear: It just counts pages that don't have text in them? So even shortest snippets, like one letter/word in a page, makes them count as non-OCR?
Thank you, thank you, thank you!
The program counts the number of pages in which no font is detected, so "no-font pages" might be a better name. I chose the column names for users who usually don't care much about what happens behind the scenes but about what functionality they get, that is, whether they need to do OCR processing. Anyway, the user can easily rename any column as he prefers.
I have no secrets from you: yes, I use cpdf temporarily in this beta version. I needed it to prepare a large PDF library that I have collected over years (decades, really), and I was a bit surprised how well it served in my case. That job is done now and I am glad.
milo1012 wrote: Any chance to link it statically (cause the source is available)?
Good idea.
milo1012 wrote: I also suggest to call your analyzing procedure in background (return ft_delayed), because it gets really slow when I use custom columns.
To be honest, this is my very first plugin ever, and my first C++ program in maybe 15 years; I still have to figure out not only C++ but also how to work with the TC plugin interface. By the way, the plugin can be accelerated considerably even without any of the above, but I have to find some time to make the next version. The plugin has been downloaded about 30 times per day, and yet these are the first reactions to come in. We shall see how many there will be in the future.
Thanks for the suggestions, they are awesome!
Love the idea of the plug in
Unfortunately it is extremely slow, and caused TC to be unresponsive.
Re: Love the idea of the plug in
xkxtnt wrote: Unfortunately it is extremely slow, and caused TC to be unresponsive.
Really sorry for that, I know the problem. The plugin needs to be improved, but I have no time for that. I recommend that somebody make the effort to write a good, improved plugin for a similar purpose.
Re: WDX plugin pdfOCR - Show details of PDF files
slavne wrote: ↑2014-12-10, 20:38 UTC WDX plugin pdfOCR is intended to show the number of pages in a pdf file that need OCR processing. [...] Download: http://www.totalcmd.net/plugring/pdfOCR.html
The download link is dead. Is there an alternative download link?
Re: WDX plugin pdfOCR - Show details of PDF files
2mgroen
Try again with HTTPS link: https://www.totalcmd.net/plugring/pdfOCR.html
If you still have problems, wait a day, restart your system to refresh DNS cache and try once again.
Andrzej P. Wozniak
Polish subforum moderator
Re: WDX plugin pdfOCR - Show details of PDF files
2Usher
totalcmd.net currently points to the wrong IP addresses, on several important (if not all) DNS servers, Quad9, 1.1.1.1 and Google among them. No system reboot, access via HTTPS or DNS cache flush is going to help with this. The only options are to wait and/or to add the correct IP address to the hosts file as pointed out by Flint.
Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
Re: WDX plugin pdfOCR - Show details of PDF files
If you can edit the hosts file, you can regain access to totalcmd.net and wincmd.ru (where the plugins are actually hosted) in your web browser by following the advice in the post https://ghisler.ch/board/viewtopic.php?p=397562#p397562 (add a record with the same IP for wincmd.ru to the hosts file too).
Or, temporarily (while the totalcmd.net and wincmd.ru domains are not accessible), you can use their "preview" domains: on totalcmd.net's preview domain, open the plugin page (for this plugin it will be http://xhmhk.hosts.cx/plugring/pdfOCR.html), then copy the download link and change the "xhmhk" part of the domain name to wincmd.ru's "preview" one, "ob9gr".
This way, the download link for the pdfOCR plugin will be http://ob9gr.hosts.cx/download.php?id=pdfOCR. Or, if you can copy the direct link to the file, which is shown in a tooltip over the "Download" link on the plugin page, you can change that link the same way; for this plugin it will be http://ob9gr.hosts.cx/files/9924358/wdx_pdfOCR_0.9.rar.
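The link rewrite described above is a plain substitution of the first host label. A small C++ sketch, assuming only the two preview names given in this post; swapPreviewHost is a hypothetical helper, not part of any tool:

```cpp
#include <string>

// Replace the leading host label `from` (e.g. "xhmhk") with `to` (e.g. "ob9gr").
// Returns the URL unchanged if the label is not found right after "://".
std::string swapPreviewHost(const std::string& url,
                            const std::string& from,
                            const std::string& to) {
    const std::string needle = "://" + from + ".";
    const std::size_t pos = url.find(needle);
    if (pos == std::string::npos) return url;
    std::string out = url;
    out.replace(pos + 3, from.size(), to);  // skip the "://" part
    return out;
}
```

Called with the copied download link, "xhmhk" and "ob9gr", it yields the ob9gr.hosts.cx form of the same link.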
Donate for Ukraine to help stop Russian invasion!
Ukraine's National Bank special bank account:
UA843000010000000047330992708
Re: WDX plugin pdfOCR - Show details of PDF files
I waited a couple of days and then downloaded the file wdx_pdfOCR_0.9.rar.
But what now?
I double-clicked the rar file, TC asked me to install the plugin, and I did.
Then I restarted TC
and moved to a folder which contains pdf files,
but no columns like "Pages", "Need OCR" etc. are displayed.
I use TC 9.51 64bit.
Any tips/info on how to proceed?
Re: WDX plugin pdfOCR - Show details of PDF files
2mgroen
Add the custom columns you need: https://www.ghisler.ch/wiki/index.php?title=Custom_columns
They don't magically appear.
Regards
Dalai
Re: WDX plugin pdfOCR - Show details of PDF files
Dalai wrote: ↑2021-03-13, 21:49 UTC 2Usher totalcmd.net currently points to the wrong IP addresses, on several important (if not all) DNS servers, Quad9, 1.1.1.1 and Google among them. [...]
How is it possible that totalcmd.net points to the wrong IP address?
Re: WDX plugin pdfOCR - Show details of PDF files
Dalai wrote: ↑2021-03-21, 13:11 UTC 2mgroen Add the custom columns you need: https://www.ghisler.ch/wiki/index.php?title=Custom_columns They don't magically appear.
Wtf? This page is displayed in 2 languages? All of a sudden English is switched for German?