WDX plugin pdfOCR - Show details of PDF files


Moderators: Hacker, petermad, Stefan2, white

petermad
Power Member
Posts: 15997
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

[OT] WDX plugin pdfOCR - Show details of PDF files

Post by *petermad »

2Usher
but they might be disabled globally when the board software was updated
My profile still has it enabled after the board software updates.
License #524 (1994)
Danish Total Commander Translator
TC 11.51 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1391a
TC 3.60b4 on Android 6, 13, 14
TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar
Usher
Power Member
Posts: 1726
Joined: 2011-03-11, 10:11 UTC

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *Usher »

2white
You are right, the problem with a per-user ban may still exist. There is an option in the user profile to define friends and foes, but:
phpBB wrote:Private messages from foes are still permitted.
Andrzej P. Wozniak
Polish subforum moderator
RalfTC
Junior Member
Posts: 23
Joined: 2016-10-27, 06:57 UTC
Location: Lüneburg

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *RalfTC »

Ah, just found the settings options in my profile 😀 Thx to all! 👋
PM is underway.
Usher
Power Member
Posts: 1726
Joined: 2011-03-11, 10:11 UTC

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *Usher »

2RalfTC
I've got it, waiting for the package ;-)
Usher
Power Member
Posts: 1726
Joined: 2011-03-11, 10:11 UTC

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *Usher »

2RalfTC
Well, your "document" is just a copy of a web page printed to PDF using NitroPDF software (either as a browser plugin or as a virtual printer). It contains a default text header (page title, empty, page URL) and a default text footer (page No of Total, empty, print timestamp). For some unknown reason, the web page itself is saved as a picture with a screenshot in it. I haven't tested NitroPDF, so maybe that's another default setting.

As you can see, it's not an original invoice printed to PDF from a billing program. It's not even a good screenshot: the right side of the invoice is cut off. However, this PDF contains both text and a picture on a single page, so the pdfOCR plugin correctly shows needOCR=0, since needOCR is the number of pages containing only pictures.
gammabubble
Junior Member
Posts: 3
Joined: 2024-05-28, 16:23 UTC

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *gammabubble »

The plugin was working perfectly on my Windows 11 system but suddenly broke. No major changes have been made to the system. It now shows all PDF files with need OCR as -71 and total pages as -4.

I completely uninstalled and reinstalled Total Commander, and removed and reinstalled the plugin several times, but it still shows the same negative values.

Is there anything else anyone can suggest? Thanks.
tuska
Power Member
Posts: 4046
Joined: 2007-05-21, 12:17 UTC

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *tuska »

gammabubble wrote: 2024-05-28, 16:27 UTC Is there anything else anyone can suggest?
Well, ...

cpd.bat
::verz 1: ne radi sa unicode imenima
; ::verz 1: does not work with unicode names
pdfOCR 0.9 wrote:Limitations:
- Unicode file names – in this version they are not supported, so please use only ANSI names.
  If non ANSI names are used the numbers of pages will be negative or very high number.
- Speed – plugin is relatively slow, so when you activate this plugin in a panel of Total Commander
  please be patient until the analyzing is finished and you get your cursor ready again.
pdfOCR 0.9 wrote:Bugs:
negative page numbers or very high page numbers: that usually happen if pdf is not properly formatted.
In that case the following procedure is suggested to try:
1) open the pdf file in any pdf reader that can read pdf and re-save the pdf file
2) rename the offending pdf file temporarily with active plugin to force it to reread it.
⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
Example
2019-11-01.pdf   renamed to: ...

Name/Ext.
✅_2019-11-01.pdf

[=pdftrebaocr.totalPages]          -> totalPages    -3
[=xpdfsearch.Number of Pages] -> totalPages: 116   xPDFSearch 1.41 - Content plugin to search text in PDF files

UNICODE character ✅ results in totalPages -3 in plugin pdfOCR 0.9.

After renaming the file, I also noticed that, along with the negative value in "totalPages",
the content of the "needOCR" column changed from 1 to 0.
A TC restart (cm_exit 9) brought the same result after applying the 'Custom Columns view'.

Furthermore, when this PDF file was renamed back in the 'Custom Columns view', the values of the next(!) file (the one below it) changed:
its "totalPages" column went from 148 to -3 and its "needOCR" column
from 0 to 1, although that PDF file had NOT been renamed and NO UNICODE character was present in its file name!

Only a TC restart with renewed application of the 'Custom Columns view' caused the data to be corrected.

This example confirms that the plugin "pdfOCR 0.9" does NOT work with UNICODE file names!
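
To find the files that trigger this limitation, one could scan a folder for PDF names that cannot be represented in an ANSI code page. The following is only a rough sketch in Python, not part of the plugin: the helper names `is_ansi_name` and `find_non_ansi_pdfs` are made up for illustration, and it assumes that "ANSI" means Windows-1252 (`cp1252`). The code page the plugin actually sees depends on the Windows system locale, so the `codepage` argument may need to be changed.

```python
from pathlib import Path


def is_ansi_name(name: str, codepage: str = "cp1252") -> bool:
    """Return True if `name` can be encoded in the given ANSI code page.

    Assumption: cp1252 stands in for the system ANSI code page.
    """
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


def find_non_ansi_pdfs(folder: str, codepage: str = "cp1252") -> list[str]:
    """List the PDF file names in `folder` that contain non-ANSI characters."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and not is_ansi_name(p.name, codepage)
    )


if __name__ == "__main__":
    import sys

    # Scan the folder given on the command line, or the current folder.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for pdf_name in find_non_ansi_pdfs(target):
        print(pdf_name)
```

With the example above, "2019-11-01.pdf" would pass the check, while "✅_2019-11-01.pdf" would be reported as a name to rename before using the plugin.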

In a directory with 80 PDF files, the values 0, 1, 2, 3, 4, 6 were displayed in the "needOCR" column.
I was able to search those PDF files with a value >0 for text without any problems (random tests carried out).
SumatraPDF v.3.5.2 64-Bit

⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺
Plugin: pdfOCR 0.9 | pdftrebaocr
totalPages[=pdftrebaocr.totalPages]
needOCR[=pdftrebaocr.needOCR]
password[=pdftrebaocr.password]

⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺⸺

My conclusion:
In its current version, this plugin is not suitable for me.
gammabubble
Junior Member
Posts: 3
Joined: 2024-05-28, 16:23 UTC

Re: WDX plugin pdfOCR - Show details of PDF files

Post by *gammabubble »

Thanks @Tuska for the feedback.

If the pdfOCR plugin crashes for some reason, Total Commander won't launch it even after you close and re-open Total Commander. I had to completely reboot the computer and then relaunch Total Commander, which got pdfOCR working again. This worked a few times, but the issue has now returned, with a needOCR value of -71 and total pages of -4 on all the PDFs. Rebooting the computer is not helping this time.

I wonder whether cpd.bat or another required file is somehow being blocked by Windows Defender SmartScreen, which would cause the issues here.