xPDFSearch 1.45 - Content plugin to search text in PDF files

nsp
Power Member
Posts: 1912
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

xPDFSearch 1.41 - Cannot search in content

Post by *nsp »

I was using version 1.11 and everything was fine. I have just switched to xPDFSearch 1.41, and now searching for text in PDF files fails.

On TC 9 it works in 32-bit but not in 64-bit.
On TC 11.03 neither version works.

I search in the background using the plugin condition xpdfsearch.Text contains "myString". The plugin seems to crash: the search window closes without any notice.
Version 1.11 worked correctly in both 32-bit and 64-bit.

I will raise a GitHub issue.
AntonyD
Power Member
Posts: 1554
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: xPDFSearch 1.41 - Content plugin to search text in PDF files

Post by *AntonyD »

Hmm, I can't confirm this.
TC and plugin, 64-bit: I search for one specific word using the plugin's full Text field
and get the expected result, one PDF file out of all the others.
The same goes for 32-bit.

Maybe some more specific conditions are needed to reproduce the crash during search?
Is the word actually not as simple as you indicated?
Does the name of the file where you expect the match contain unusual (non-Latin) characters?
#146217 personal license
nsp
Power Member
Posts: 1912
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Re: xPDFSearch 1.41 - Content plugin to search text in PDF files

Post by *nsp »

The word is as simple as "increments".
As I said, I created an issue on GitHub with some additional info. The plugin folder is C:\tools\totalcmd\addon\xpdfsearch.
Internally, xPDFSearch seems to create a thread; it fails and then TC crashes!
tuska
Power Member
Posts: 4046
Joined: 2007-05-21, 12:17 UTC

Re: xPDFSearch 1.41 - Content plugin to search text in PDF files

Post by *tuska »

2nsp
I can find text content in PDF files with TC 11.03 x64 and x86 with plugin "wdx_xpdfsearch_1.41.zip":
Search in separate process... Alt+Shift+F7 > "Plugins" tab > ☑️ Search in plugins > xpdfsearch | Text | contains | increments

My installation folder is:
C:\totalcmd\Plugins\wdx\xPDFSearch\
i.e.
%COMMANDER_PATH%\Plugins\wdx\xPDFSearch\


Windows 11 Pro (x64) Version 23H2 (Build 22631.3672) | xPDFSearch 1.41
AntonyD
Power Member
Posts: 1554
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: xPDFSearch 1.41 - Content plugin to search text in PDF files

Post by *AntonyD »

2nsp
the word is as simple as "increments" ..
But what about the PDF file in which you expect to find this "simple" word?
Have you tried opening it and searching with the built-in search of your favorite PDF viewer or editor?
Have you tried searching for another word in a completely different PDF, folder, or path?
zeeko
Junior Member
Posts: 56
Joined: 2009-02-21, 19:57 UTC

Re: xPDFSearch 1.41 - Content plugin to search text in PDF files

Post by *zeeko »

I've found a bug in GString::del() that might cause a crash during text search.
A new version will be released soon...
zeeko
Junior Member
Posts: 56
Joined: 2009-02-21, 19:57 UTC

xPDFSearch 1.42 - Content plugin to search text in PDF files

Post by *zeeko »

Version 1.42
ADDED
New fields: Number Of Fontless Pages, Number Of Pages With Images
Options in content plugin ini file:
[xPDFSearch] PageContentsLengthMin

CHANGED
Bugfix for crash in text search

https://github.com/tgotic/xPDFSearch/releases/download/v1.42/wdx_xpdfsearch_1.42.zip
nsp
Power Member
Posts: 1912
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Re: xPDFSearch 1.42 - Content plugin to search text in PDF files

Post by *nsp »

Digging a bit further showed that certain PDF files seem to break the search.

/// 1.42 fixes the error! :clap:
AntonyD
Power Member
Posts: 1554
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: xPDFSearch 1.42 - Content plugin to search text in PDF files

Post by *AntonyD »

2zeeko
From the readme.MD help file:
"Number Of Number Of Pages With Images"
Is the doubled "Number Of" a typo?

Russian translation of the two new fields:

Code:

Number Of Fontless Pages=Число страниц без шрифта
Number Of Pages With Images=Число страниц с изображениями
P.S.
I would also like to understand this growing number of recognized objects or fields (what is the correct name for them?).
Why does the list keep growing? Is it really impossible to take, all at once, every field from the PDF specification that can be extracted from
a PDF file to answer a user's query? Or do you design them yourself, based on the internal capabilities
of the chosen PDF rendering engine?
For example, where did these two new fields about fontless pages and pages with images come from?
petermad
Power Member
Posts: 15997
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Re: xPDFSearch 1.42 - Content plugin to search text in PDF files

Post by *petermad »

Updated Danish translation for version 1.42: https://tcmd.madsenworld.dk/xPDFSearch_1.42_dan.zip
License #524 (1994)
Danish Total Commander Translator
TC 11.51 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1391a
TC 3.60b4 on Android 6, 13, 14
TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar
zeeko
Junior Member
Posts: 56
Joined: 2009-02-21, 19:57 UTC

Re: xPDFSearch 1.42 - Content plugin to search text in PDF files

Post by *zeeko »

2petermad
Thank you for your contribution.

2AntonyD
Thank you for your contribution.
The fields are designed by me, based on the internal capabilities of the xpdf engine.
Yes, this plugin has a lot of fields, because a PDF document has many properties that might be of interest.

The two new fields, "Number Of Fontless Pages" and "Number Of Pages With Images", are inspired by the pdfOCR plugin (https://ghisler.ch/board/viewtopic.php?t=41504).
"Number Of Fontless Pages" reports the number of pages in a PDF that have no Font resource. Such pages usually have no searchable text and might be candidates for OCR.
"Number Of Pages With Images" reports the number of pages in a PDF that contain Image objects. PDFs with scanned content usually have images on all pages.
If you want to know whether a PDF needs OCR, you can create a new custom column:

Code:

[=xpdfsearch.Number Of Pages]|[=xpdfsearch.Number Of Fontless Pages]|[=xpdfsearch.Number Of Pages With Images]
and quickly check which documents are OCR candidates.
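
For illustration only, here is a small Python sketch of how the three numbers in such a column could be read. It is not part of the plugin. It assumes the column renders as three integers separated by "|", and both `parse_column` and `ocr_hint` (including its thresholds and labels) are hypothetical names and choices made up for this example.

```python
def parse_column(value: str) -> tuple[int, int, int]:
    """Split a 'pages|fontless|with_images' column value into three integers.

    Assumes all three parts are present and numeric; anything else
    raises ValueError.
    """
    pages, fontless, with_images = (int(part) for part in value.split("|"))
    return pages, fontless, with_images


def ocr_hint(pages: int, fontless: int, with_images: int) -> str:
    """Rough reading of the counts (a heuristic chosen for this sketch)."""
    if pages > 0 and fontless == pages and with_images == pages:
        # No page has a Font resource and every page carries an image.
        return "likely scanned"
    if fontless > 0:
        # Some pages have no Font resource and may lack searchable text.
        return "partly fontless"
    return "has fonts"


if __name__ == "__main__":
    for column in ("12|12|12", "12|3|5", "12|0|4"):
        print(column, "->", ocr_hint(*parse_column(column)))
```

A document reported as "likely scanned" is the kind described above: no Font resource on any page and an image on every page.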

There is also a new parameter, PageContentsLengthMin, in xPDFSearch.ini. When counting fontless pages, if the length of a PDF page's Contents stream is less than the value specified in PageContentsLengthMin, the page is considered empty and is not counted. The same procedure is used for "Number Of Pages With Images".
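
The counting rule described above can be sketched roughly as follows. This is a Python illustration, not the plugin's actual code: the `Page` record, its field names and the `count_pages` function are assumptions invented for the example, and the threshold values used are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Page:
    contents_length: int  # length of the page's Contents stream
    has_font: bool        # page has a Font resource
    has_image: bool       # page has at least one Image object


def count_pages(pages: list[Page], page_contents_length_min: int) -> tuple[int, int]:
    """Return (fontless_pages, pages_with_images).

    Pages whose Contents stream is shorter than the threshold are
    treated as empty and skipped for both counters.
    """
    fontless = 0
    with_images = 0
    for page in pages:
        if page.contents_length < page_contents_length_min:
            continue  # considered empty, not counted
        if not page.has_font:
            fontless += 1
        if page.has_image:
            with_images += 1
    return fontless, with_images


if __name__ == "__main__":
    sample = [
        Page(contents_length=5, has_font=False, has_image=False),   # near-empty page
        Page(contents_length=500, has_font=False, has_image=True),  # scanned page
        Page(contents_length=500, has_font=True, has_image=False),  # text page
    ]
    print(count_pages(sample, page_contents_length_min=32))
```

With a threshold of 0 the near-empty page would be counted as fontless as well, which is the case the parameter is meant to filter out.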

Since pdfOCR has some problems with Unicode file names, the xPDFSearch plugin can be used as an alternative to the pdfOCR plugin.
zeeko
Junior Member
Posts: 56
Joined: 2009-02-21, 19:57 UTC

xPDFSearch 1.43 - Content plugin to search text in PDF files

Post by *zeeko »

v1.43

ADDED
  • Support for password-protected PDF files: detection and attribute extraction only. Files are not decrypted and text cannot be extracted.
  • New fields: Protected
  • Options in content plugin ini file:
    [xPDFSearch] AttrProtected
CHANGED
  • Bugfix for conversion from PDF Time to FILETIME
https://github.com/tgotic/xPDFSearch/releases/download/v1.43/wdx_xpdfsearch_1.43.zip
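
As background for the PDF Time to FILETIME item: a PDF date is a string of the form D:YYYYMMDDHHmmSSOHH'mm', while a Windows FILETIME counts 100-nanosecond intervals since 1601-01-01 00:00:00 UTC. The Python sketch below only illustrates that conversion and is not the plugin's code. `pdf_date_to_filetime` is a hypothetical helper, and treating a date without a time zone as UTC is an assumption of this example.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' where everything after the year is optional
# and O is '+', '-' or 'Z'.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)

# FILETIME epoch: 1601-01-01 00:00:00 UTC
_FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def pdf_date_to_filetime(text: str) -> int:
    """Convert a PDF date string to a FILETIME value (100 ns ticks)."""
    match = _PDF_DATE.match(text.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {text!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = match.groups()

    # Missing fields default to 01 for month/day and 00 for the time parts.
    local = datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=timezone.utc,
    )

    # '+' means local time is ahead of UTC, '-' means behind.
    offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
    if sign == "-":
        offset = -offset
    elif sign != "+":
        offset = timedelta(0)  # 'Z' or no zone: assumed to be UTC

    delta = (local - offset) - _FILETIME_EPOCH
    return (delta.days * 86400 + delta.seconds) * 10_000_000


if __name__ == "__main__":
    print(pdf_date_to_filetime("D:19700101000000Z"))
```

The time zone offset is the easiest part to get wrong: the same instant written as D:19700101010000+01'00' must give the same FILETIME as D:19700101000000Z.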
AntonyD
Power Member
Posts: 1554
Joined: 2006-11-04, 15:30 UTC
Location: Russian Federation

Re: xPDFSearch 1.43 - Content plugin to search text in PDF files

Post by *AntonyD »

2zeeko
I would like to provide a translation for the single new line [Rus]:

Code:

Protected=Защищён
petermad
Power Member
Posts: 15997
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Re: xPDFSearch 1.43 - Content plugin to search text in PDF files

Post by *petermad »

Updated Danish translation for version 1.43: https://tcmd.madsenworld.dk/xPDFSearch_1.43_dan.zip
popthezid
Junior Member
Posts: 14
Joined: 2024-11-23, 03:44 UTC

Re: xPDFSearch 1.43 - Content plugin to search text in PDF files

Post by *popthezid »

2zeeko
Chinese language translation:

Code:

;Translation: 十年
[Chn]
Title=标题
Subject=主题
Keywords=关键词
Author=作者
Application=PDF生成程序
PDF Producer=PDF生成组件
Document Start=前1000个字符
First Row=首行
Extensions=扩展
Number of Pages=页数
Number Of Fontless Pages=页数(无字体)
Number Of Pages With Images=页数(包含图片)
PDF Version=PDF版本
Page Width=首页宽度
Page Height=首页高度
Copying Allowed=允许复制内容
Printing Allowed=允许打印
Adding Comments Allowed=允许添加注释
Changing Allowed=允许修改
Encrypted=已加密
Protected=密码保护
Tagged=有标签
Linearized=可显示第一页
Incremental=可增量修改
Signature Field=有签名栏
Outlined=有大纲
Embedded Files=有内嵌文件
Created=创建日期
Modified=修改日期
Metadata Date=元数据日期
ID=ID
PDF Attributes=PDF属性
Conformance=特定格式
Created Raw=创建日期Raw
Modified Raw=修改日期Raw
Metadata Date Raw=元数据日期Raw
Outlines=大纲
Text=文本
mm|cm|in|pt=毫米|厘米|英寸|点
lic#206911
Windows10 22H2 Home
Total Commander 11.50 32bit / 3.50d