Like Skif_off said, the goal here is to sort files "on-the-fly" in a custom column view, i.e. without actually renaming them in MRT.
That's what regexp_wdx does, as far as I understand.
It actually might come in handy in certain situations.
The filter will be optional of course (and you need to download it yourself, like in uLister, due to license restrictions); you can still use xdoc2txt as an alternative filter.
The filter is compatible at least down to Windows 2000 (32-bit) and has used the very same runtime libs since at least 2010 (when I first heard about them). I don't expect this to change in future versions, so there's no need to worry about compatibility issues for now (and there's also no big need for constant updates, as new office file formats are rare nowadays; my four-year-old DLL download still works for my set of office files).
I ran into some problems when using xdoc2txt without installing the Visual C++ 2008 Redistributable (xdoc2txt did not work). The problem turned out to be in Microsoft.VC90.CRT.manifest, which I replaced with:
Code:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- Copyright (c) Microsoft Corporation. All rights reserved. -->
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
    <noInheritable/>
    <assemblyIdentity type="win32" name="Microsoft.VC90.CRT" version="9.0.21022.8" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b" />
    <file name="msvcr90.dll" />
    <file name="msvcp90.dll" />
</assembly>
Skif_off wrote: xdoc2txt >= 2.12 requires Visual C++ 2010 Redistributable, and without installing it needs three files:
Yes, I already saw that when I released PCREsearch 2.1, that's why I kept xdoc2txt 2.11 back then.
The cause of this is that the author of xdoc2txt switched to Visual Studio 2010 with version 2.12, but forgot to update the embedded manifest for the 2010 runtime DLLs.
So you end up with the xdoc2txt.exe actually needing the 2010 runtime DLLs, while the manifest says it still needs the 2008 runtime DLLs.
And so of course it won't work portable that way, as Windows doesn't find the fitting runtime DLLs, unless you install the 2010 DLL retail package.
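To see which runtime an executable's embedded manifest declares, you can scan the file for the assemblyIdentity element. This is only a minimal sketch in Python, assuming the manifest sits in the file as plain ASCII/UTF-8 text (which is what makes the hex-editor approach possible). The function name `declared_crt` and the sample bytes are made up for illustration and are not taken from xdoc2txt itself:

```python
import re

# Matches the name and version attributes of a CRT assemblyIdentity element.
# Assumes "name" comes before "version", as in the manifests quoted in this thread.
_CRT_RE = re.compile(
    rb'<assemblyIdentity[^>]*?name="(Microsoft\.VC\d+\.CRT)"[^>]*?version="([\d.]+)"'
)

def declared_crt(data: bytes):
    """Return (name, version) of the first CRT dependency found, or None."""
    m = _CRT_RE.search(data)
    if m is None:
        return None
    return m.group(1).decode("ascii"), m.group(2).decode("ascii")

# Synthetic stand-in for an exe: some binary bytes followed by a manifest fragment.
sample = b"\x00\x01MZ..." + (
    b'<dependency><dependentAssembly>'
    b'<assemblyIdentity type="win32" name="Microsoft.VC90.CRT" '
    b'version="9.0.21022.8" processorArchitecture="x86" '
    b'publicKeyToken="1fc8b3b9a1e18e3b"></assemblyIdentity>'
    b'</dependentAssembly></dependency>'
)

print(declared_crt(sample))
```

For a real file you would pass `open(path, "rb").read()` instead of `sample`. If the reported name is Microsoft.VC90.CRT while the program actually imports the 2010 runtime, you have the mismatch described above.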
I will try to figure out how you can patch the manifest to use the 2010 DLLs portable, but to be honest: there is no real reason to update xdoc2txt from version 2.11, as there were only a few minor changes.
see changelog in translation, e.g.
Turns out that it's actually quite easy.
Just download the Visual C++ 2010 Redistributable Package (x86)
from it and copy it to the same dir as xdoc2txt.
Now use some hex editor and modify the manifest embedded in xdoc2txt. You can find it at the very end.
Code:
<dependency>
    <dependentAssembly>
        <assemblyIdentity type="win32" name="Microsoft.VC90.CRT" version="9.0.21022.8" processorArchitecture="x86" publicKeyToken="1fc8b3b9a1e18e3b"></assemblyIdentity>
    </dependentAssembly>
</dependency>
Just make sure not to add or remove bytes from the exe file (the file size must stay the same)
xdoc2txt 2.12 and newer should now work portably, starting from Windows XP.
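If you would rather script the edit than use a hex editor, the "file size must stay the same" rule can be enforced in code. The following is a rough sketch in Python, not the exact patch: the replacement values for the manifest are not given here, so the example works on made-up bytes, and `patch_same_length` is a hypothetical helper name. It also assumes that padding with spaces is harmless at the spot you patch (true between XML attributes, not inside an attribute value):

```python
def patch_same_length(data: bytes, old: bytes, new: bytes, pad: bytes = b" ") -> bytes:
    """Replace the single occurrence of `old` with `new`, keeping len(data) unchanged."""
    if len(new) > len(old):
        raise ValueError("replacement is longer than the original text")
    if data.count(old) != 1:
        raise ValueError("expected exactly one occurrence of the original text")
    # Pad the shorter replacement so the total byte count stays the same.
    padded = new + pad * (len(old) - len(new))
    patched = data.replace(old, padded)
    assert len(patched) == len(data)
    return patched

# Made-up stand-in for a binary with an embedded XML fragment.
blob = b"\x00\x01HEADER" + b'<tag name="OLD-VALUE-123"/>' + b"\x00TAIL"
fixed = patch_same_length(blob, b'name="OLD-VALUE-123"', b'name="NEW"')

print(len(blob) == len(fixed))
print(fixed)
```

Work on a copy of the exe, and keep the original around in case the patched file no longer starts.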
milo1012 wrote: there is no real reason to update xdoc2txt from version 2.11, as there were only a few minor changes.
And what about this?
2.14 Fixed an issue where the abnormal termination in part of the PDF
The author even fixed the old 1.xx. (I have not seen such an error and I have not updated, but I rarely use it.)
Of course you should update every now and then, no doubt about that.
All I wanted to say is that you don't need to install every new version immediately, especially when you need to patch new versions (like in this case to be portable), plus we don't know the exact details of changelogs.
I had my share of experience with programs that were rock stable, but became flawed in newer versions, despite saying that things were fixed.
And that wrong embedded manifest in xdoc2txt shows exactly that: it's a new error source, despite fixing other things.
Concerning that fix you quoted:
I just did a quick comparison with xdoc2txt 2.16 VS 2.11, and didn't experience any difference, neither in output nor in stability.
I used a set of ~200 PDF files, which consists of all sorts of types (E-Books, technical documents, PDFs with input forms, with multimedia elements, security restrictions, etc...).
Stability means: a few PDFs in my collection crash xdoc2txt for unknown reasons, but they still crash in the newest 2.16, so the fix didn't affect the collection at all.
Anyway, the author should fix that manifest, otherwise I won't support the newer versions for the plug-in.
- new major feature: optional Oracle Outside In Technology Content Access filters
- works for nearly all file formats for: Word Processors, Spreadsheets, Presentation programs, XML based data, Database files; plus will search some embedded files
- you may now choose between xdoc2txt and the OiT filters as a text filter for specific file extensions
- when installed and working, will provide an additional powerful Unicode capable fulltext search for TC 9.0 and above (on top of the text filter capability for the normal plug-in operation)
- option to exclude certain file formats for the fulltext search and to filter unknown files for text
- path for the filter DLL files is freely configurable and has a separate configuration for the 32-bit and x64 filters
- the filters need to be downloaded separately from the Oracle site (you need to register, which is free, though you might find a way to prevent it - hint: b**me*ot)
- needs an additional runtime package in order to work (Visual C++ Redistributable)
- can share nearly all files with the uLister plug-in Viewer package (when using the same versions)
- new major feature: search in the filename only
- you can use the same type of fields as for content search: boolean, count, string assembly, average length
- may be useful to quickly preview purified filenames in TC's custom columns, or checking for names containing specific characters, e.g. to check for non-ASCII filenames and similar
- will always use the name including the extension
- can be used to check for otherwise identical files (same size and possibly date) that differ only in the case of their filenames but are still treated and seen as identical by TC and Windows
- compare files in TC's 'Synchronize dirs' can now work with the OiT filter (still doesn't work with xdoc2txt), to compare e.g. the content of two office file versions with a custom RegEx
- when comparing two files with the same encoding case-insensitively in TC's 'Synchronize dirs', the results are now allowed to differ in length, to take Unicode normalization into account
- fixed a possible out-of-bounds memory access when comparing files in TC's 'Synchronize dirs' case sensitive
- a few code optimizations
- updated to PCRE 8.39
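To give an idea of what the filename-only checks mentioned in the list are good for, here is a small illustration in Python. It is only a sketch of the idea, not how the plug-in works internally: Python's `re` module is not PCRE, `casefold()` only approximates how Windows compares names, and the helper names and the file list are invented for the example:

```python
import re
from collections import defaultdict

# Any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")

def has_non_ascii(name: str) -> bool:
    """Boolean-style check: does the filename contain a non-ASCII character?"""
    return NON_ASCII.search(name) is not None

def case_only_variants(names):
    """Group names that are equal when case is ignored but spelled differently."""
    groups = defaultdict(set)
    for n in names:
        groups[n.casefold()].add(n)
    return [sorted(g) for g in groups.values() if len(g) > 1]

names = ["Report.docx", "report.DOCX", "résumé.pdf", "notes.txt"]

print([n for n in names if has_non_ascii(n)])
print(case_only_variants(names))
```

In the plug-in the first check would correspond to a Boolean field with the filename-only option and an expression such as `[^\x00-\x7F]`.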
"In TC 9.0 and above you will now have an additional field Oracle Outside In fulltext search in the search dialog (Alt+F7)"
I don't see such a field in the plugins search dialog for pcresearch.
I have installed the content filters in the same dir as the ULister plugin files
and the path to it added in the ini file.
The required Run-time libs are installed because the Ulister plugin uses them without any problems.
Environment is TC 9.0b3 x64 under Windows 10
I agree that the UI (the config tool) is not exactly optimal, but that's due to the constant adding of features in the past, and the plugin was originally planned without any UI whatsoever. I've been planning to make it more concise and accessible for a long time now, but as with my other plugins: I can't say when I'll have the time for it.
Then I opened the Plugins tab and (as this image shows: https://tinyurl.com/yzc86tko) went ahead and:
1. Clicked the "Search in plugins" box;
2. Selected the "AND" radio button which I do not understand; and
3. Selected "pcresearch" in the Plugin column.
Next, I formulated this regular expression search in RegexBuddy, using the PCRE 8.39 UTF-32 flavor regular expressions.
This image (https://tinyurl.com/yzc86tko) shows that the regular expression matches the highlighted files located in the directory:
Lastly, I selected the General tab in TC, and then I clicked the "Start search" button.
But as this image (https://tinyurl.com/yzc86tko) shows, TC's "Search results" window says "[No files found]".
So, my question is what do I need to do differently, so that I can search for files in that directory (E:\Apps\UtilitiesByMarc) using the regular expression below?
((.+?)?(?=\.vbs) ) ( .vbs)$
I think you have a few extra spaces in the expression.
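To show why the spaces matter, here is a quick check in Python. Note the assumptions: Python's `re` is not PCRE (though this particular expression behaves the same in both as far as I know), the two filenames are made up, and whether the expression was tested in a free-spacing mode in RegexBuddy is only a guess on my side:

```python
import re

pattern_as_posted = r"((.+?)?(?=\.vbs) ) ( .vbs)$"
cleaned = r"((.+?)?(?=\.vbs))(.vbs)$"  # same expression with the spaces removed

names = ["backup.vbs", "notes.txt"]

# Default mode: spaces are literal characters. A space would have to sit
# exactly where the lookahead demands ".vbs", so nothing can ever match.
literal = [n for n in names if re.search(pattern_as_posted, n)]

# Free-spacing (verbose) mode: whitespace in the pattern is ignored.
free_spacing = [n for n in names if re.search(pattern_as_posted, n, re.VERBOSE)]

# Spaces removed by hand, default mode.
without_spaces = [n for n in names if re.search(cleaned, n)]

print(literal)
print(free_spacing)
print(without_spaces)
```

So an expression that matches in a tester with free-spacing turned on can still find nothing when the spaces are taken literally.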
Anyway, you're probably referring to this post:
While PCREsearch may be used for searching in filenames only (it is primarily intended for file content), it can be quite clumsy, as for every configuration change you have to restart TC or use the TC command
In any case, for PCREsearch to work, do the following:
In the location where the plug-in is installed, there is a tool PCREsearchConfig.exe. Start it.
- Now from the very left field list choose a field which you want to override, or increase the counter on the top left for adding additional field(s) (fields available to TC are identified by the prefixed "-->")
- Mark your chosen field in the left list
- Enter your RegEx in the "Regular Expression" box
- In the "Field type" area, choose "Boolean"
- In the "Field flags / options" area check "Search in filename only"
- Finally name your field, so that you can later identify it in TC
- Hit "Apply" and close the config tool
- Now restart TC (or type cm_UnloadPlugins in TC's command line)
In TC's search dialog go to "Plugins" and choose:
pcresearch -> <your field name> -> = -> Yes
Start the search. TC should now find all files that match your expression.
I've found that PCREsearch doesn't read locked files... It basically reads a buffer and does a string search, so there's no danger in opening those files in read-only mode... or at least give the user an option to read the opened file.
I assume you mean that these "locked" files are opened with FILE_SHARE_READ|FILE_SHARE_WRITE|FILE_SHARE_DELETE and similar combinations.
In PCREsearch I open files with
Code:
But sure, I could add another field flag/option to open such files anyway (with the risk of inconsistent results) to the next plug-in version, maybe depending on if the file fits completely into the first read buffer.