new: The Thousand Types plugin


moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

new: The Thousand Types plugin

Post by *moisescastellano »

Have you ever had to check what's inside a lot of documents such as PDFs or .doc files,
spending a lot of time waiting for Acrobat Reader, MS Word or whatever program to open, just to close it and continue the process?

Have you ever wondered what a particular file contained and wanted to take a look at its contents, not having the associated application to open it?

This plugin allows TC to show a very quick text preview of almost every file format. It comes in two flavors: packer and lister plugin.

The Thousand Types packer plugin
Packer plugin allows Total Commander to very quickly "enter" (ctrl + pgDown) docs as if they were archives or folders.
In that "simulated folder" you can see at a glance:
- a plain-text preview (or whole contents) of the document, that you can then view (F3) or extract (F5)
- document metadata: author, version, creator tool...
- a preview of the first lines of the contents, shown as file names, so that you don't even have to open any file in the folder

This info is shown as file names, so that you can have a very quick preview, and then if needed extract to a file or view the complete document's metadata or contents.
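
To illustrate the idea of showing the first lines as file names, here is a minimal Java sketch. It is not the plugin's actual code: the class name, the `NNN. text.line` naming scheme and the length limit are assumptions for illustration. It only shows that characters not allowed in Windows file names have to be replaced before a line of text can be listed as a name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the plugin: turns the first non-empty
// lines of extracted text into names that are safe to list as files.
final class PreviewNames {
    // Characters Windows does not allow in file names.
    private static final String FORBIDDEN = "[\\\\/:*?\"<>|]";

    static List<String> previewNames(String text, int maxLines, int maxLen) {
        List<String> names = new ArrayList<>();
        for (String line : text.split("\\R")) {
            // Replace forbidden characters, then collapse runs of whitespace.
            String clean = line.replaceAll(FORBIDDEN, " ").replaceAll("\\s+", " ").trim();
            if (clean.isEmpty()) {
                continue; // blank lines carry no preview information
            }
            if (clean.length() > maxLen) {
                clean = clean.substring(0, maxLen).trim();
            }
            // Zero-padded counter keeps the entries sorted in document order.
            names.add(String.format(Locale.ROOT, "%03d. %s.line", names.size() + 1, clean));
            if (names.size() == maxLines) {
                break;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "Annual Report 2021\n\nRevenue: up 12% / costs: down\nThird line";
        previewNames(text, 2, 40).forEach(System.out::println);
    }
}
```

With the sample text, the two entries would be `001. Annual Report 2021.line` and `002. Revenue up 12% costs down.line`.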

Screenshot

The Thousand Types lister plugin (coming soon)
The lister plugin shows (F3) the text of the document in the integrated Total Commander Lister, even if you don't have the corresponding application.

Note: The first time you open a doc with a Java plugin, it will take a couple of seconds as the JVM loads into memory;
after that, the preview is as quick as entering a folder on the local file system.

FAQ
A thousand formats is an exaggeration, right?

No, it is for real.
The plugin is based on [Apache Tika](https://tika.apache.org/), a toolkit that detects and extracts metadata and text from over a thousand different file types.
Tika also has translation capabilities, which will be incorporated in upcoming versions of the plugin.

Configure it based on your preferences

**(Configuration coming soon)**
You can [easily configure](how-to-configure.md) the plugin to show or hide every element in the "simulated folder", and to control how it is presented.

This configuration can be done globally or per specific format.

Why can I only see PDF files associated to the plugin?

By default the plugin is associated only with PDF files, for two reasons:
- AFAIK, the Total Commander packer plugin **installation** process only lets it be associated with one extension
- All the Apache Tika parsers together are over 50 MB, so the plugin is distributed with just the PDF parsers and the Tika core libraries

Don't panic! TC lets you associate more extensions to the plugin and you can easily [download and configure all the Tika parsers](how-to-configure.md).
**(How to configure thousand types coming soon)**

Download and resources
- Download the [latest release in this project](https://github.com/moisescastellano/thousandTypes-tcplugin/blob/main/releases)
- [Plugin page at totalcmd.net](http://totalcmd.net/plugring/thousand_types.html)
- [Github page](https://moisescastellano.github.io/thousandTypes-tcplugin/)
- [Github project](https://github.com/moisescastellano/thousandTypes-tcplugin)
- This is a work in progress, you can help with [things to do](https://moisescastellano.github.io/thousand-preview-plugin/to-do)

Troubleshooting guide
https://moisescastellano.github.io/tcmd-java-plugin/troubleshooting
This interface and all derived plugins are written in Java, so you need to have installed a [Java Runtime Environment (JRE)](https://www.java.com/en/download/manual.jsp). The Java plugin interface and derived plugins were tested on **Oracle (Sun) JRE 1.8** (jre-8u311-windows-x64.exe).

In case you have any of the following issues, refer to the Troubleshooting guide
- In case you have more than one Java plugin installed
- Be sure you use the same (32/64) platform for JVM and TC
- In case you have both TCx64 and TCx32 installed
- Error *Java Runtime Environment is not installed on this Computer*
- Error *LoadLibrary Failed*
- Error *Starting Java Virtual Machine failed*
- Error *Class not found class='tcclassloader/PluginClassLoader'*
- Error *Initialization failed in class...*
- Error *Exception in class 'tcclassloader/PluginClassLoader'*
- Error *Access violation at address...*
- Error *Crash in plugin ... Access violation at address...*

For other issues you can open a project issue or contact me - see next paragraphs.

Issues and things to-do
This is a work in progress. **Help wanted!** - in particular with Visual C++ issues.
- Refer to [things to do](https://github.com/moisescastellano/thousandTypes-tcplugin/blob/main/to-do.md) for work in progress.
- Check also the [issues page](https://github.com/moisescastellano/thousand-preview-plugin/issues) for this plugin.
- Java Plugin Interface's [issues page](https://github.com/moisescastellano/tcmd-java-plugin/issues).

Contact
If you have any comment, suggestion or problem regarding this java plugin,
you can contact me at:
- This TC forum thread
- email: moises.castellano (at) gmail.com
- Github project issues page

Please detail the specific version of: Java plugin interface, Total Commander and JRE that you are using.
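
A small Java sketch for collecting the JRE details requested above, and for checking the 32/64-bit point from the troubleshooting list. It only uses standard system properties, except `sun.arch.data.model`, which is HotSpot-specific and may be missing on other JVMs (an assumption to keep in mind). The class name is made up for this example. Run it with the same `java` that the plugin uses.

```java
// Hypothetical helper: prints the JVM details worth including in a bug report.
final class JvmInfo {
    static String describe() {
        String version = System.getProperty("java.version");
        String vendor = System.getProperty("java.vendor");
        // On Windows this usually reflects the architecture of the running JVM
        // (for example "x86" for a 32-bit JVM, "amd64" for a 64-bit one).
        String arch = System.getProperty("os.arch");
        // HotSpot-specific ("32" or "64"); falls back to "unknown" elsewhere.
        String bits = System.getProperty("sun.arch.data.model", "unknown");
        return "Java " + version + " (" + vendor + "), arch=" + arch + ", bits=" + bits;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported bitness does not match the TC executable you are running (x32 vs x64), that mismatch is the first thing to look at in the Troubleshooting guide.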
Last edited by moisescastellano on 2022-01-29, 10:58 UTC, edited 1 time in total.
Horst.Epp
Power Member
Posts: 5102
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: new Plugin: The Thousand Types viewer

Post by *Horst.Epp »

what should be the benefit compared to the ULister plugin with the Oracle viewers ?
Ulister displays almost any file format in high quality.
Windows 11 Home x64 Version 21H2 (OS Build 22000.832)
TC 10.50 x64 / x86
Everything 1.5.0.1315a (x64)
moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

Re: new Plugin: The Thousand Types viewer

Post by *moisescastellano »

2Horst.Epp:

Thanks for your comment, TBH I did not know about uLister.
I tried it and the first thing I have to say is it seems a great and very useful plugin.

The most obvious advantage of uLister over ThousandTypes is that it displays documents with formatting and graphics, while ThousandTypes (at least for now, and in particular the packer plugin version) displays plain text.

However there are some use cases for ThousandTypes, which I present below.

Note: I'm not trying to denigrate uLister at all, I just think ThousandTypes has some advantages for particular cases. Also note that a user can install and use both uLister (F3) and the packer version of ThousandTypes (ctrl + pgDown) to have the best of both.

- The ThousandTypes plugin aims to provide a very quick preview of documents, for instance in case you are reviewing a lot of documents. By quick I mean almost instant: under a second per document for opening and closing (except the first document, where loading the JVM takes a couple of seconds).

With uLister, opening was quick at first, but after I opened a couple of PDFs that were not particularly big (around 10 MB), closing the lister window took more than 20 seconds! In the task manager, the TC process went from around 50 MB to over 400 MB.

- To use uLister you have to accept a license and register an Oracle account, then download Oracle's "Viewer Technology".
ThousandTypes is based on Apache Tika: open source and freely redistributable.

- uLister says it supports more than 500 file formats, but its list contains only around 200; not bad, but as a lot of them are graphic formats, the number of text document types should be even lower. The first type I tried after PDF was EPUB, which uLister does not support.

- ThousandTypes is (will be, in a coming version) configurable: e.g. you can choose which metadata you see, how you want the preview presented, etc.

- ThousandTypes can (will, in a coming version) translate documents to other languages on the fly, as it is a feature provided by Apache Tika.

Anyway, as I, being the author, may be biased, please let me know what you think (especially if you've tried both plugins).
nsp
Power Member
Posts: 1687
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Re: new Plugin: The Thousand Types viewer

Post by *nsp »

Horst.Epp wrote: 2022-01-25, 17:16 UTC what should be the benefit compared to the ULister plugin with the Oracle viewers ?
Ulister displays almost any file format in high quality.

I know the feature as i'm using the java libraries on servers for document analysis.

The word view is not appropriate as the wcx java plugin is not a viewer but a powerful extractor. it extract document/archive content and structure as fast as possible. it is based on Apache TIKA a content analysis toolkit.


uLister is a pure viewer and show document with all the needed formatting.

In fact they do not compete in the same area !
moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

Re: new Plugin: The Thousand Types viewer

Post by *moisescastellano »

In fact they do not compete in the same area !
I think you are right, and they are also compatible: as I said, a user can install and use both uLister (F3) and the packer version of ThousandTypes (ctrl + pgDown) to have the best of both plugins.

I agree viewer may not be an appropriate word (any suggestion? maybe previewer?) - I just used it in this post, because I was also planning to develop a lister (wlx) version of the plugin - after learning about uLister I am not sure it's worth the effort, as I find uLister superior for that function. I will focus on improving the packer (wcx) version.
nsp
Power Member
Posts: 1687
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Re: new Plugin: The Thousand Types viewer

Post by *nsp »

moisescastellano wrote: 2022-01-26, 11:11 UTC I agree viewer may not be an appropriate word (any suggestion? maybe previewer?) - I just used it in this post, because I was also planning to develop a lister (wlx) version of the plugin - after learning about uLister I am not sure it's worth the effort, as I find uLister superior for that function. I will focus on improving the packer (wcx) version.
Hi Moises,
extractor is for me the most TC compliant, you can also call it Tika wrapper or whatever...
On the wcx, you could also define some default translation language in a configure pane and make it compliant with many file types...

Tika is great as extracting text, getting metadata, detecting file types, detecting languages which make it also a very good candidate for a content plugin (wdx) specially because it support a lot of File Type.

If you want to make a lister, you could just create a tree with the structure and embedded resources like for the archive viewer (like inside ulister or old arcview) or the ctrl+pdown.. and redirect to normal TC lister to view. You can add if you want the ability to translate from a menu or a bar..
moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

Re: new Plugin: The Thousand Types viewer

Post by *moisescastellano »

nsp wrote: 2022-01-26, 13:34 UTC Hi Moises,
extractor is for me the most TC compliant, you can also call it Tika wrapper or whatever...
Thanks for the ideas, for now I just updated the post title to "The Thousand Types plugin", no qualifiers :wink:
nsp wrote: 2022-01-26, 13:34 UTC On the wcx, you could also define some default translation language in a configure pane and make it compliant with many file types...
I am making the plugin configurable, for now via a YAML file, in the next version. Translation capabilities will be added in a later version.
nsp wrote: 2022-01-26, 13:34 UTC Tika is great as extracting text, getting metadata, detecting file types, detecting languages which make it also a very good candidate for a content plugin (wdx) specially because it support a lot of File Type.
I was also planning to create a content plugin version, just to check the capabilities of the java interface.
nsp wrote: 2022-01-26, 13:34 UTC If you want to make a lister, you could just create a tree with the structure and embedded resources like for the archive viewer (like inside ulister or old arcview) or the ctrl+pdown.. and redirect to normal TC lister to view. You can add if you want the ability to translate from a menu or a bar..
Thanks for all the ideas!
nsp
Power Member
Posts: 1687
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Re: new: The Thousand Types plugin

Post by *nsp »

moisescastellano wrote: I am making the plugin configurable, for now via a YAML file, in the next version. Translation capabilities will be added in a later version.
I do not see any yaml file in current github repo ;)
In another Tika installation I have a tika-config.xml that enables inline image extraction for PDFParser.

Code: Select all

<parser class="org.apache.tika.parser.pdf.PDFParser">
    <params>
        <param name="extractInlineImages" type="bool">true</param>
    </params>
</parser>
But i do not see any image in the structure when using your plugin if i copy my tika-config file...
Adding README.MD in each extracted pdf is not useful IMO.
I know it is a work in progress but keep it in mind.
Also it would be great to be able to use full tika-app-2.x.y.jar instead of the multiple jar files as it will make update easier unless you add your personal document parser ;)
moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

Re: new: The Thousand Types plugin

Post by *moisescastellano »

I do not see any yaml file in current github repo ;)
I have just uploaded the "development" and "feature-1.1" branches to honor your interest :)
Almost everything regarding reading and using that yaml configuration is done, except for the (most boring part of) documenting how to modify it, but you can figure it out.
In another Tika installation I have a tika-config.xml that enables inline image extraction for PDFParser.

Code: Select all

<parser class="org.apache.tika.parser.pdf.PDFParser">
    <params>
        <param name="extractInlineImages" type="bool">true</param>
    </params>
</parser>
But i do not see any image in the structure when using your plugin if i copy my tika-config file...
I did not know about the image extraction capabilities of Tika. Good to know. I noted it down for a later version.
Adding README.MD in each extracted pdf is not useful IMO.
I know it is a work in progress but keep it in mind.
This is also configurable with the code in the development branch.
In fact I also received an email complaining about that help file (I will be posting that email) - so I think it will be removed by default, as it is apparently annoying.
Also it would be great to be able to use full tika-app-2.x.y.jar instead of the multiple jar files as it will make update easier unless you add your personal document parser ;)
Yes, the problem is that the jar is around 55 MB. As version 1.0 is not configurable, I just included the PDF parsers (15 MB). I don't like the idea of making a 50+ MB plugin zip, so I was planning to document instructions for downloading the Tika jar for people configuring other formats. And maybe in a later version make that download automatic via plugin code. Any ideas?
moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

Re: new: The Thousand Types plugin

Post by *moisescastellano »

This is an email I received a couple of days ago - I think it is of public interest (for this plugin thread) and the sender is OK with me posting it here. Find my responses below.
Hello!
Your plugin ThousandTypes very interesting.
I want to offer a few changes:

1. Plugin not support Russian language (filename and content «archive» PDF — *.line)
2. Files *.metadata:
Filename is Value of properties, extension is key with prefix «meta_». For example:
«2019-04-25T05-35-36Z.meta_created»
«AnyAuthor.meta_creator»
«2019-05-30T04-50-36Z.meta_modified»
«Adobe PDF Library 11-0.meta_producer»
Is easy to sort by extension for convenient view.
3. Plugin not support copy filename in «archive» PDF. It would be convenient to copy the file properties inside the PDF as a file name (Shift+F6).
4. Files *.line essentially not needed.If I want to view content PDF, then I using other plugin (slister+sumatraPDF). Enough one of file «filename.txt» with text content of PDF.
5. Files README.md not needed, because it already have in plugin directory.
6. May be use this plugin as WDX (column of properties), WLX (only properties)?
7. What other file types are supported, except PDF?
8. Is it possible to make less plugin size (15MB)?
Hi, thanks for your suggestions!

1. Yes, I know, I have to work on this: for the *.line files, I have heavily restricted the characters shown to avoid problems, but in coming versions I will loosen those restrictions. For the contents, I have to check how to parse the files in Unicode
2. In next version every file name will be configurable, so you will be able to specify, in your example:
itemName: "metadata\\%NUMBER%. %VALUE%.meta_%NAME%"
3. You can copy the file name to the command line with Ctrl+Enter (then I usually go with right arrow for selecting all, ctrl+C). You can also open the .metadata file (F3) and copy from lister. Shift+F6 is for renaming and that function makes no sense here.
4. The *.line files are just a facility to quickly know what is inside the documents, without needing to open the contents file. If you don't like them, in the next version you will be able to remove these files or move them all to a directory, via the configuration mentioned in point 2.
5. same as 4.: just a facility you will be able to remove via config. Maybe removed by default in next version, or moved to a help dir inside the archive.
6. WDX: I wanted to develop a WDX just for testing, if you find that useful I can do it with that functionality
WLX: I was planning to develop a lister version of the plugin, but then I learned about the uLister plugin and now I am not sure about that
7. Every file format supported by Apache Tika: https://tika.apache.org/2.2.1/formats.html
So essentially every standard format.
Now only the PDF parsers are included in the plugin distribution.
I have yet to document how to download and install all the Tika library parsers, so if you are not familiar with Tika you'd better wait for this doc. Also I am planning to automate the download/installation of these libraries
8. I will check if some library is not needed. But 15 MB is not so much these days, is it? Are you asking in order to reduce the size of the plugin's installable file? In that case I could do a version where the PDF libraries are downloaded on demand.
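
As a rough illustration of the naming template from point 2, here is a Java sketch of how such a pattern could be expanded. This is not the plugin's real implementation: the class, the method and the sanitizing rule are assumptions, and only the placeholders `%NUMBER%`, `%VALUE%` and `%NAME%` come from the example above.

```java
// Hypothetical sketch of expanding an itemName template for one metadata entry.
final class ItemNameTemplate {
    static String expand(String template, int number, String name, String value) {
        // Characters that are invalid in Windows file names become '-',
        // e.g. the ':' in a timestamp (assumed rule, matching the email's examples).
        String safeValue = value.replaceAll("[\\\\/:*?\"<>|]", "-");
        return template
                .replace("%NUMBER%", String.valueOf(number))
                .replace("%NAME%", name)
                // The value goes last so placeholders inside it are not expanded again.
                .replace("%VALUE%", safeValue);
    }

    public static void main(String[] args) {
        // In Java source "\\" is a single backslash, as in the quoted YAML string.
        String template = "metadata\\%NUMBER%. %VALUE%.meta_%NAME%";
        System.out.println(expand(template, 1, "created", "2019-04-25T05:35:36Z"));
    }
}
```

For the `created` example this gives `metadata\1. 2019-04-25T05-35-36Z.meta_created`, i.e. the value as file name and the key as extension, inside a `metadata` directory.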
moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

Re: new: The Thousand Types plugin

Post by *moisescastellano »

Hi, I have a couple of questions about the Packer plugin interface:

1.- Since the ThousandTypes plugin aims to "enter" (via ctrl+pgDown) various file formats as if they were archives,
Can I associate multiple extensions to a single plugin by default?
In the lister plugin interface there is a function "listGetDetectString" that allows returning several extensions,
but in the packer plugin interface AFAIK this is done via the "defaultextension=pdf" property within the pluginst.inf file, which only allows one extension.

2.- ThousandTypes plugin can parse and display MSOffice documents (eg docx, xlsx); however, since these files are in fact .zip files,
even if the user manually associates these extensions to the plugin, ctrl+pgDown will bring them in as zip files showing the internal structure,
instead of letting the plugin manage them and display the contents.
Is there a way to change this behavior, and allow the plugin to manage these files?
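
To show why a .docx looks like an archive, here is a small Java sketch that checks for the ZIP signature at the start of a file. It says nothing about how TC itself decides; it only demonstrates that a ZIP-based document starts with the same `PK` header as any .zip. The class and method names are made up, and the in-memory archive is just a stand-in for a real .docx.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: detects the ZIP local file header signature.
final class ZipSniffer {
    // A non-empty ZIP archive starts with the bytes 'P', 'K', 3, 4.
    static boolean looksLikeZip(byte[] head) {
        return head.length >= 4
                && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }

    // Builds a tiny in-memory ZIP with a docx-like entry name, for demonstration only.
    static byte[] tinyZip() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("word/document.xml"));
            zip.write("<w:document/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeZip(tinyZip()));                                       // true
        System.out.println(looksLikeZip("%PDF-1.7".getBytes(StandardCharsets.US_ASCII)));  // false
    }
}
```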
nsp
Power Member
Posts: 1687
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Re: new: The Thousand Types plugin

Post by *nsp »

moisescastellano wrote: 2022-02-06, 19:46 UTC Hi, I have a couple of questions about the Packer plugin interface:

1.- Since the ThousandTypes plugin aims to "enter" (via ctrl+pgDown) various file formats as if they were archives,
Can I associate multiple extensions to a single plugin by default?
In the lister plugin interface there is a function "listGetDetectString" that allows returning several extensions,
but in the packer plugin interface AFAIK this is done via the "defaultextension=pdf" property within the pluginst.inf file, which only allows one extension.
Yes you can associate multiple extension to a packer plugin (in pluginst.inf just separate extension by space for default). You will have multiple entries !.
About using only pg-down instead of enter, you have a flag next the extension 256. You can define a "fake" extension and still make it works for "archive file". Give a look to multiarc to see how it works... latest binaries Your plugin must accept the file and implement function CanYouHandleThisFile
moisescastellano wrote: 2022-02-06, 19:46 UTC 2.- ThousandTypes plugin can parse and display MSOffice documents (eg docx, xlsx); however, since these files are in fact .zip files,
even if the user manually associates these extensions to the plugin, ctrl+pgDown will bring them in as zip files showing the internal structure,
instead of letting the plugin manage them and display the contents.
Is there a way to change this behavior, and allow the plugin to manage these files?
You have an order the plugins and the first that detect/accept the file seems to be taken.
moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

Re: new: The Thousand Types plugin

Post by *moisescastellano »

Hi!
nsp wrote: 2022-02-07, 06:40 UTC Yes you can associate multiple extension to a packer plugin (in pluginst.inf just separate extension by space for default). You will have multiple entries !.
Great! I checked this way and it worked!
nsp wrote: 2022-02-07, 06:40 UTC About using only pg-down instead of enter, you have a flag next the extension 256. You can define a "fake" extension and still make it works for "archive file". Give a look to multiarc to see how it works... latest binaries Your plugin must accept the file and implement function CanYouHandleThisFile
I don't understand the first sentence. The plugin is already using pg-down: when the user "enters" the file via pg-down, TC opens the file as if it were an archive and shows the (tika-parsed) contents. There is no need for fake extensions: .pdf or .doc are just associated to the plugin; the user can still open the files with the usual application (Acrobat, Word, etc.) with the Enter key. I didn't need to implement the CanYouHandleThisFile function.
nsp wrote: 2022-02-07, 06:40 UTC You have an order the plugins and the first that detect/accept the file seems to be taken.
The problem is that extensions of MSOffice such as .docx are detected as .zip files by TC core app (not a plugin), and it shows the internal structure of this zip file, which is not interesting to the user. So even if ThousandTypes is the first plugin, it is not called by TC to show the contents of the .docx

Thanks!
nsp
Power Member
Posts: 1687
Joined: 2005-12-04, 08:39 UTC
Location: Lyon (FRANCE)

Re: new: The Thousand Types plugin

Post by *nsp »

moisescastellano wrote: 2022-02-07, 18:22 UTC Hi!
nsp wrote: 2022-02-07, 06:40 UTC Yes you can associate multiple extension to a packer plugin (in pluginst.inf just separate extension by space for default). You will have multiple entries !.


Great! I checked this way and it worked!
nsp wrote: 2022-02-07, 06:40 UTC About using only pg-down instead of enter, you have a flag next the extension 256. You can define a "fake" extension and still make it works for "archive file". Give a look to multiarc to see how it works... latest binaries Your plugin must accept the file and implement function CanYouHandleThisFile
I don't understand the first sentence. The plugin is already using pg-down: when the user "enters" the file via pg-down, TC opens the file as if it were an archive and shows the (tika-parsed) contents. There is no need for fake extensions: .pdf or .doc are just associated to the plugin; the user can still open the files with the usual application (Acrobat, Word, etc.) with the Enter key. I didn't need to implement the CanYouHandleThisFile function.
nsp wrote: 2022-02-07, 06:40 UTC You have an order the plugins and the first that detect/accept the file seems to be taken.
The problem is that extensions of MSOffice such as .docx are detected as .zip files by TC core app (not a plugin), and it shows the internal structure of this zip file, which is not interesting to the user. So even if ThousandTypes is the first plugin, it is not called by TC to show the contents of the .docx

Thanks!
To override internal ZIP, you have to use in %commander_ini% the property:

Code: Select all

PluginOverrideZip=1
[IIRC]Implementing CanYouHandleThisFile allows you to handle file when using [ctrl]+PgDown without registering the extensions.
@Ghisler(Author) should give some details about call flow.
moisescastellano
Junior Member
Posts: 36
Joined: 2021-12-05, 19:11 UTC

Re: new: The Thousand Types plugin

Post by *moisescastellano »

nsp wrote: 2022-02-08, 07:13 UTC To override internal ZIP, you have to use in %commander_ini% the property:

Code: Select all

PluginOverrideZip=1
Wow! - yes, it works, inserting that in the [Packer] section of the wincmd.ini file, as I found by googling PluginOverrideZip - although there is no documentation and just an ancient plugin appeared in that search. How did you know? Thanks!
nsp wrote: 2022-02-08, 07:13 UTC [IIRC]Implementing CanYouHandleThisFile allows you to handle file when using [ctrl]+PgDown without registering the extensions.
@Ghisler(Author) should give some details about call flow.
Oh, OK - for now it's fine to register some extensions by default and let the user register others if they want.
Thanks again - Best regards
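
For anyone who prefers to script the change discussed above, this is a minimal Java sketch that sets `PluginOverrideZip=1` in the `[Packer]` section of ini-style text. It works on a list of lines only; reading and writing the real wincmd.ini (and backing it up first) is left out on purpose. The class and method names are assumptions for this example, not something TC or the plugin provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: sets key=value inside a given [section] of ini-style lines.
final class IniEdit {
    static List<String> setKey(List<String> lines, String section, String key, String value) {
        List<String> out = new ArrayList<>();
        String header = "[" + section + "]";
        String entry = key + "=" + value;
        boolean inSection = false;
        boolean done = false;
        for (String line : lines) {
            String t = line.trim();
            if (t.startsWith("[") && t.endsWith("]")) {
                // Leaving the target section without having seen the key: append it here.
                if (inSection && !done) {
                    out.add(entry);
                    done = true;
                }
                inSection = t.equalsIgnoreCase(header);
            } else if (inSection && !done
                    && t.regionMatches(true, 0, key + "=", 0, key.length() + 1)) {
                // Key already present in the section: replace its value.
                out.add(entry);
                done = true;
                continue;
            }
            out.add(line);
        }
        if (!done) {
            // Section was the last one in the file, or did not exist at all.
            if (!inSection) {
                out.add(header);
            }
            out.add(entry);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ini = Arrays.asList("[Configuration]", "UseNewDefFont=1", "[Packer]", "ZipDirectory=");
        setKey(ini, "Packer", "PluginOverrideZip", "1").forEach(System.out::println);
    }
}
```

The sample section contents are invented; only the section name and the key come from this thread.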