WDX plugin pdfOCR - Show details of PDF files


slavne
Junior Member
Posts: 10
Joined: 2008-09-18, 13:30 UTC
Location: Serbia

WDX plugin pdfOCR - Show details of PDF files

Post by *slavne »

The WDX plugin pdfOCR shows the number of pages in a PDF file that need OCR processing. With its help, you can immediately spot which PDF files cannot be searched for text, either by you or by an indexing system. That is the purpose of the needOCR column.

Next, there is the Password column, which shows "YES" if a PDF file is protected with a password. A PDF can also have some rights restricted; in both cases the Password column will state "yes". This tells you whether a PDF file needs its password removed before it can be put to normal use. It is also good to know whether a file is protected before you try to open it for OCR processing.

Finally, the Pages column shows the total number of pages, so you can compare the "needOCR" pages with the total and decide whether OCR processing is worthwhile.
Download: http://www.totalcmd.net/plugring/pdfOCR.html
meepzorp
New Member
Posts: 1
Joined: 2014-12-20, 04:12 UTC

Post by *meepzorp »

love it. Thanks!

Post by *slavne »

meepzorp wrote: love it. Thanks!
You are welcome, especially as the first user to say thanks :D
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

Well, also thx from me!

But to get things clear:
It just counts pages that don't have text in them?
So even shortest snippets, like one letter/word in a page, makes them count as non-OCR?

I just think it would have been better to call such fields sth. like "non-Picture" or "text-contained pages",
because not every page that doesn't contain text is supposed to need OCR (This page [is] intentionally left blank ...)

2nd question: what pdf engine is used? Seems like CPDF Command Line Tools to me (krk.exe), am I right?
Any chance to link it statically (cause the source is available)?

I also suggest running your analysis procedure in the background (return ft_delayed), because it gets really slow when I use custom columns.
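
As a rough illustration of the ft_delayed suggestion, here is a minimal Python model of the control flow. A real WDX plugin is a native DLL, so this is not plugin code. It assumes the content-plugin convention that TC first asks for a value with a "delay if slow" flag and, if the plugin answers ft_delayed, asks again from a background thread without that flag. The constant values are quoted from memory of the plugin header and should be checked against it; `count_pages_slow` is a hypothetical stand-in for the expensive PDF analysis.

```python
# Python model of the "delay slow fields" flow; not actual plugin code.
# Constant values are assumed to mirror the WDX content plugin header.
FT_DELAYED = 0            # "ask me again from a background thread"
FT_NUMERIC_32 = 1         # field type: 32-bit integer
CONTENT_DELAYIFSLOW = 1   # flag: caller does not want to wait


def content_get_value(path, flags, analyze):
    """Return (result_code, value) for one field of one file.

    `analyze` is a hypothetical callable doing the slow PDF analysis.
    """
    if flags & CONTENT_DELAYIFSLOW:
        # Foreground call: do not touch the file, answer immediately.
        return FT_DELAYED, None
    # Background call: the slow work is acceptable here.
    return FT_NUMERIC_32, analyze(path)


def count_pages_slow(path):
    # Placeholder for the expensive analysis; returns a fixed number.
    return 12


if __name__ == "__main__":
    print(content_get_value("a.pdf", CONTENT_DELAYIFSLOW, count_pages_slow))
    print(content_get_value("a.pdf", 0, count_pages_slow))
```

The point of the pattern is that the file list stays responsive: the first call returns at once, and the column is filled in later when the background call finishes.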
TC plugins: PCREsearch and RegXtract

Post by *slavne »

milo1012 wrote:Well, also thx from me! But to get things clear:
It just counts pages that don't have text in them?
So even shortest snippets, like one letter/word in a page, makes them count as non-OCR?
Thank you, thank you, thank you!

The program counts the number of pages in which no font is detected, so the column might be better called "no-font pages". I chose the column names with users in mind who usually don't care much about what goes on behind the scenes but rather about the functionality they get, that is, whether OCR processing is needed. Anyway, the user can easily rename any column as he prefers.
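
To make the "no font detected" idea concrete, here is a small Python sketch. It is not the plugin's code. It assumes that some PDF library has already produced, for each page, the list of font names used on that page; the function names and the sample data are made up for illustration.

```python
def count_no_font_pages(fonts_per_page):
    """Count pages whose font list is empty (candidates for OCR)."""
    return sum(1 for fonts in fonts_per_page if not fonts)


def ocr_share(fonts_per_page):
    """Fraction of pages without fonts; 0.0 for an empty document."""
    total = len(fonts_per_page)
    if total == 0:
        return 0.0
    return count_no_font_pages(fonts_per_page) / total


if __name__ == "__main__":
    # Hypothetical 4-page file: pages 2 and 3 are scanned images.
    sample = [["Helvetica"], [], [], ["Times-Roman", "Symbol"]]
    print(count_no_font_pages(sample), "of", len(sample), "pages have no font")
    print("share needing OCR:", ocr_share(sample))
```

With this rule, a page holding a single letter still has a font and is not counted, while an intentionally blank page is counted, which is the caveat raised in the question.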

I have no secrets from you: yes, I use cpdf temporarily for this beta version. I needed it to prepare a large PDF library that I have collected over years (decades, rather...), and I was a bit surprised how well it served in my case. That job is done now and I am glad.
milo1012 wrote: Any chance to link it statically (cause the source is available)?
Good idea.
milo1012 wrote: I also suggest to call your analyzing procedure in background (return ft_delayed), because it gets really slow when I use custom columns.
To be honest, this is my very first plugin ever, and my first C++ program in maybe 15 years; I still have to figure out not only C++ but, even more, how to work with the TC plugin interface. By the way, the plugin can be accelerated considerably even without any of the above, but I have to find some time to make the next version. People have been downloading this plugin about 30 times per day, and yet these reactions are the first to come. We shall see how many there will be in the future.

Thanks for the suggestions, they are awesome!
xkxtnt
Junior Member
Posts: 5
Joined: 2016-01-19, 19:21 UTC

Love the idea of the plug in

Post by *xkxtnt »

Unfortunately it is extremely slow, and caused TC to be unresponsive.

Re: Love the idea of the plug in

Post by *slavne »

xkxtnt wrote: Unfortunately it is extremely slow, and caused TC to be unresponsive.
Really sorry about that, I know the problem. The plugin needs to be improved, but I have no time for that. I would encourage somebody to make the effort of writing a good, improved plugin for a similar purpose.
mgroen
Junior Member
Posts: 45
Joined: 2018-08-28, 12:04 UTC

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *mgroen »

slavne wrote: 2014-12-10, 20:38 UTC WDX plugin pdfOCR is intended to show the number of pages in a pdf file that need OCR processing. [...]
Download: http://www.totalcmd.net/plugring/pdfOCR.html
The download link is dead. Is there an alternative download link?
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *Usher »

2mgroen
Try again with HTTPS link: https://www.totalcmd.net/plugring/pdfOCR.html
If you still have problems, wait a day, restart your system to refresh the DNS cache and try once again.
Andrzej P. Wozniak
Polish subforum moderator
Dalai
Power Member
Posts: 9364
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *Dalai »

2Usher
totalcmd.net currently points to the wrong IP addresses, on several important (if not all) DNS servers, Quad9, 1.1.1.1 and Google among them. No system reboot, access via HTTPS or DNS cache flush is going to help with this. The only options are to wait and/or to add the correct IP address to the hosts file as pointed out by Flint.

Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
DrShark
Power Member
Posts: 1872
Joined: 2006-11-03, 22:26 UTC
Location: Kyiv, 68/262

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *DrShark »

mgroen wrote: 2021-03-13, 16:49 UTC download link is dead. Any alternative download link?
If you can edit the hosts file, you can regain access to totalcmd.net and wincmd.ru (where the plugins are actually hosted) in your web browser by following the advice in the post https://ghisler.ch/board/viewtopic.php?p=397562#p397562 (add a record with the same IP for wincmd.ru to the hosts file too).

Or, temporarily (while the totalcmd.net and wincmd.ru domains are not accessible), you can use their "preview" domains: on totalcmd.net's preview domain, open the plugin page (for this plugin it will be http://xhmhk.hosts.cx/plugring/pdfOCR.html), then copy the download link and change the "xhmhk" part of the domain name to wincmd.ru's "preview" one, "ob9gr".
This way, the download link for the pdfOCR plugin will be http://ob9gr.hosts.cx/download.php?id=pdfOCR; or, if you can copy the direct link to the file, which is shown in a tooltip over the "Download" link on the plugin page, you can change that link the same way, so for this plugin it will be: http://ob9gr.hosts.cx/files/9924358/wdx_pdfOCR_0.9.rar.
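
The host swap described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the two preview hosts quoted in this post are still valid, which is unlikely to hold for long since they were temporary; the function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit

# Preview hosts quoted in this post; they were temporary.
TOTALCMD_PREVIEW = "xhmhk.hosts.cx"
WINCMD_PREVIEW = "ob9gr.hosts.cx"


def to_wincmd_preview(url):
    """Swap the totalcmd.net preview host for the wincmd.ru one."""
    parts = urlsplit(url)
    if parts.netloc != TOTALCMD_PREVIEW:
        return url  # leave unrelated links untouched
    return urlunsplit(parts._replace(netloc=WINCMD_PREVIEW))


if __name__ == "__main__":
    print(to_wincmd_preview("http://xhmhk.hosts.cx/download.php?id=pdfOCR"))
```

Only the host part changes; the path and the query string of the copied link are kept as they are.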
Donate for Ukraine to help stop Russian invasion!
Ukraine's National Bank special bank account:
UA843000010000000047330992708

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *mgroen »

I waited a couple of days and downloaded the file wdx_pdfOCR_0.9.rar.

But what now?
I double-clicked the rar file, TC asked me to install the plugin, and I did.

Then I restarted TC and moved to a folder that contains PDF files, but no columns such as "Pages" or "needOCR" are displayed.

I use TC 9.51 64-bit.

Any tips/info on how to proceed?

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *Dalai »

2mgroen
Add the custom columns you need: https://www.ghisler.ch/wiki/index.php?title=Custom_columns
They don't magically appear.

Regards
Dalai

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *mgroen »

Dalai wrote: 2021-03-13, 21:49 UTC 2Usher
totalcmd.net currently points to the wrong IP addresses, on several important (if not all) DNS servers, Quad9, 1.1.1.1 and Google among them. No system reboot, access via HTTPS or DNS cache flush is going to help with this. The only options are to wait and/or to add the correct IP address to the hosts file as pointed out by Flint.

Regards
Dalai
How is it possible that totalcmd.net points to the wrong IP address?

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *mgroen »

Dalai wrote: 2021-03-21, 13:11 UTC 2mgroen
Add the custom columns you need: https://www.ghisler.ch/wiki/index.php?title=Custom_columns
They don't magically appear.

Regards
Dalai
Wtf? This page is displayed in two languages? All of a sudden it switches from English to German?