Let's make writing Java plugins fun!

Discuss and announce Total Commander plugins, addons and other useful tools here, both their usage and their development.

Moderators: white, Hacker, petermad, Stefan2

meisl
Member
Posts: 171
Joined: 2013-12-17, 15:30 UTC

Post by *meisl »

Cool, thank you :D
So I guess it's getting time to clean up the mess I created and make it actually possible to play around with it.
I'll give notice here when done.

I'm quite unsure how to organize it atm. I wrote to Ken Händel but haven't received an answer yet.
My example plugin NtfsStreamsJ isn't really suited as a hands-on example, so I think I'll write another one and put it, together with my "API", in a new repo on github.
@Lefteous, as meisl says, even if other languages are better suited for this, lots of people might not know them (I was quite proficient in C/C++ years ago, not anymore) but do know Java, or want to use some Java library that isn't available in other languages, for example.
Just found "Writing plugins with .NET?" where
ZoSTeR wrote:[...]but I guess the concepts are just too different (data types etc) to write something without considerable effort.
So... :arrow: :idea:
EDIT: and there's decWDX Delphi library, "Delphi library to simplify WDX plugins writing", 1600+ downloads

Post by *meisl »

Regarding performance:
More important than "being close to the iron" are IMHO clever strategies/algorithms.

Particularly for TC plugins, there are certain execution patterns that occur so frequently and reliably that not exploiting their characteristics comes close to committing a sin!
However, there are quite a few of these patterns and they can be complex. Additionally, they differ by plugin-type and of course their being exploitable depends on the actual purpose of the plugin.
We certainly do not want to re-implement such strategies over and over for every plugin if, by abstraction, it could be avoided.

So this is another aspect which I envision to be improved by an extended API.
It's of course above and beyond the rather simple adaptation of making things "more Java/JVM-like" and as such should be well separated, and probably approached only after that adaptation.

----

To give an example:
Computing a checksum, say MD5, for each file, in a WDX (content plugin). Imagine further that this checksum is written to a special ADS, together with the lastModified DateTime of the file at the moment it was hashed.
This would be useful e.g. for lightning-fast duplicate finding, or for creating checksum files to verify a proper DVD burn.
Since checksumming is really expensive, strategies are called for - even if the actual hashing were implemented in assembler.
An obvious one is, for example, to compare the lastModified value in the ADS with the actual one and not to re-do the calculation if the cached value hasn't expired. And there's more.
Anyways, you should already see that there's plenty of what an API could offer to you if you were faced with this task:
- caching in general
- letting you specify when and what to cache. This is twofold: a) a yes/no criterion for when to (re-)calculate and b) a measure of the actual cost of a (re-)calculation (here, clearly: the file's size)
- managing when to tell TC "delay" (or even "only-on-demand"), based on how much work (see above, cost of re-calculation) is already "in the queue" (yes, your plugin has to be reentrant - which is more than just thread-safe) EDIT: although not obvious, this is indeed true, at least wrt instance fields. So the API should somehow encourage an appropriate style.
- I can even imagine automatic and dynamic heuristics, to a degree
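To make the first strategy concrete, here's a minimal sketch (my own illustration, not part of any existing API): a cache keyed by file name that stores the lastModified stamp alongside the MD5, and only re-hashes when the stamp has changed. For simplicity the cache is an in-memory map here; in the scenario above it would live in the file's ADS instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: skip the expensive re-calculation whenever the
// file's lastModified stamp hasn't changed since we last hashed it.
class Md5Cache {
    private record Entry(long lastModified, String md5) {}
    private final Map<String, Entry> cache = new HashMap<>();
    int hashesComputed = 0; // just to observe how often we actually work

    String md5(Path file) throws IOException {
        long mtime = Files.getLastModifiedTime(file).toMillis();
        Entry e = cache.get(file.toString());
        if (e != null && e.lastModified() == mtime) {
            return e.md5(); // not expired: reuse the cached value, no hashing at all
        }
        hashesComputed++;
        String sum = computeMd5(file);
        cache.put(file.toString(), new Entry(mtime, sum));
        return sum;
    }

    private static String computeMd5(Path file) throws IOException {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(Files.readAllBytes(file))) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IOException(e); // MD5 is guaranteed on every JRE, though
        }
    }
}
```

Note how the staleness test costs only one metadata read, while a cache hit saves reading the whole file.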

----

What d'ya think?
Last edited by meisl on 2014-01-13, 21:31 UTC, edited 1 time in total.
meisl
Member
Member
Posts: 171
Joined: 2013-12-17, 15:30 UTC

FT_DELAYED / FT_ONDEMAND / contentStopGetValue

Post by *meisl »

Next API improvement: support delaying expensive tasks simply by implementing `isDelayInOrder` in your subclass of ContentPlugin.Field like so:

Code: Select all

define(new Field.STRING("MD5") {
    public boolean isDelayInOrder(String fileName) throws IOException {
        File file = new File(fileName);
        // for folders the value is empty anyway, hence NOT expensive in that case;
        // only delay for regular files of at least 1 MB
        return !file.isDirectory() && file.length() >= 1024 * 1024;
    }
    public String getValue(String fileName) throws IOException {
        //....
    }
});
(see below for a comparison with what you would otherwise have to write yourself)

Similarly we could handle FT_ONDEMAND, which is used to indicate really, really expensive operations. But maybe there's an even niftier way of integrating the two: something like having the implementation say where its bottlenecks are (here: IO, particularly reading; for example SSD vs DVD) plus some hint on how to build heuristics (here: execution time depends linearly on how fast the file can be read from start to end).
This way the API could dynamically adapt and decide what to tell TC, and when. In order to benefit, the implementor would have to be more declarative about the actual algorithm implemented - but note: this might even result in less code!

Anyway, strongly related is `contentStopGetValue`, which TC invokes on another thread in order to (try to) abort a currently running expensive operation.
Obviously an implementor should NOT be bothered with synchronization details, as far as possible.
Now the good news: provided you're using `java.nio.channels` for IO, cancellation is supported automatically, without any further effort!
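A tiny self-contained demo of why the channels give you this for free (plain JDK, nothing plugin-specific; if I read the channel machinery right, a blocking read on a thread whose interrupt flag is set closes the channel and throws `ClosedByInterruptException`):

```java
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class InterruptibleReadDemo {
    /** Tries a blocking read with the interrupt flag already set; true if it was cancelled. */
    static boolean readIsCancelledWhenInterrupted() throws Exception {
        Path tmp = Files.createTempFile("demo", ".bin");
        Files.write(tmp, new byte[1024]);
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            Thread.currentThread().interrupt(); // simulate the stop request
            ch.read(ByteBuffer.allocate(1024)); // blocking read on an interruptible channel
            return false;                       // not reached: the read is aborted
        } catch (ClosedByInterruptException e) {
            return true;                        // channel closed itself, read cancelled
        } finally {
            Thread.interrupted();               // clear the flag again
            Files.delete(tmp);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cancelled: " + readIsCancelledWhenInterrupted());
    }
}
```

So the plugin's worker code never needs an explicit "should I stop?" check of its own as long as it does its IO through such a channel.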

----

Code from the current impl. of the super-class `ContentPlugin`, to illustrate what one is freed from doing by hand:

Code: Select all

private Thread workingThread;

public final int contentGetValue(String fileName,
                                 int fieldIndex,
                                 int unitIndex,
                                 FieldValue fieldValue,
                                 int maxlen,
                                 int flags) {
    if (fieldIndex >= fields.size()) {
        return FT_NOSUCHFIELD;
    }
    Field<?> field = fields.get(fieldIndex);
    try {
        if (field.isDelayInOrder(fileName)) {
            if ((flags & CONTENT_DELAYIFSLOW) != 0) {
                return FT_DELAYED;
            }
            synchronized(this) { // TODO: prepare for more than 2 concurrent threads, don't rely on particular calling order of TC
                workingThread = Thread.currentThread();
            }
        }
        Object value = field._getValue(fileName); // TODO: support units
        synchronized(this) {
            if (workingThread != null) {
                Thread.interrupted(); // clear this thread's interrupted status (static method!)
                workingThread = null;
            }
        }
        // TODO: check result vs maxlen
        fieldValue.setValue(field.type, value);
        return field.type;
    } catch (IOException e) {
        if (workingThread != null) {
            Thread.interrupted(); // clear this thread's interrupted status
            workingThread = null;
            return FT_FIELDEMPTY;
        }
        myLog.error(e);
        return FT_FILEERROR;
    }
}

public final void contentStopGetValue(String fileName) { // assumed to be called on (the only) other thread, and *only* once per contentGetValue having returned FT_DELAYED *and* then having been called again with (flags & CONTENT_DELAYIFSLOW) == 0
    synchronized(this) {
        workingThread.interrupt(); // TODO: prep for more than 2 concurrent threads, invalid state checking, etc
    }
}
Plz note that the above is ONLY to give an impression. It is definitely NOT anywhere near production quality.
It makes a lot of (silent) assumptions (which happen to be met under circumstances).
It is, in addition, NOT thoroughly tested. So it does work ... under circumstances :)

This disclaimer, however, should only add to my actual point: show how much potential there is for improving the implementor's XP.

Post by *meisl »

jmwap,
you mentioned that you have an idea for an example project.

Maybe you could shortly describe your idea, either here and/or (ideally) as a reply to this issue on github. (plz don't mind the current name of the repo, I'm planning to turn it into the real "tc_java 2.0" later).
That'd be really awesome! :)

Whether it's (allegedly) very niche or not I don't care. I'll gratefully take any hint about how to make it easy for ppl to get "hands-on" / start playing.

Thanks,
meisl
jmwap
Member
Posts: 121
Joined: 2008-03-23, 12:40 UTC

Post by *jmwap »

meisl wrote:Maybe you could shortly describe your idea, either here or / and (ideally) as a reply to this issue on github. [...] I'll gratefully take any hint about how to make it easy for ppl to get "hands-on" / start playing.
I have an idea of 'something' I could build if I wanted to build some plugin (even if the plugin would be overkill for what it does), meaning that I know what it would 'sort of' do. But I never thought about the details at all. Not sure if it would be a content plugin, lister plugin, etc.

I am really struggling with time right now, so don't get too optimistic about me trying this any time soon, sorry.

Post by *meisl »

Hey jmwap, thanks for your reply!

Sure, I understand that you don't have the time to build it, or even start trying. That's perfectly ok, even more so given the very early and premature stage my stuff is in.

All I'm asking for are some hints on what it could possibly be, some description of a use-case, however vague it might be.

With that I'm hoping to make it actually feasible - or even worthwhile - for someone to try...
You see, I'm desperately looking for feedback, and even a vague idea of what you have in mind would certainly help me :).

----

Ok, here's something I have implemented today, and maybe it helps to keep you interested:

Reading a file's contents from start to end should be something that's needed rather commonly in a content plugin.
The API now lets you iterate over a sequence of `ByteBuffer`s, like so:

Code: Select all

for (ByteBuffer buf: contents(fileName)) {
    md5.update(buf);
}
Apart from being, well, just as nice as it can get IMHO :),
- it automatically supports cancellation (see previous posts)
- it moves decisions about buffer sizes etc. to the API, hence allows for optimizations being done there (and of course makes client code more concise)
- and - that's really cool - when I changed it to use `AsynchronousFileChannel`, out came a ~15% speedup compared to TC's built-in "Create checksum"!
(using nothing but Java's standard `java.security.MessageDigest.getInstance("MD5")`)

Re the last point: the only trick employed so far was to kick off reading the next chunk asynchronously just before the current one is handed over to client code (the MD5 checksum here).
Even this alone is rather complex, given that all the contracts re cancellation (`stopGetValue` called on another thread), proper closing of the channel in every case, and proper error handling have to be met.
You certainly don't want to do this "by hand", and definitely don't want to do it more than once (ie not again for every particular plugin).
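For the curious, here's a rough standalone sketch of the double-buffered read-ahead idea (my own illustration; the names and details are NOT the actual tc_java code, and cancellation/error contracts are reduced to the bare minimum):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical sketch: an Iterable over a file's contents where the *next*
// chunk is always being read while the current one is handed to the client.
// Assumes strictly sequential iteration (as in a for-each loop), so the
// buffer handed out last time may be safely reused for the next read.
final class FileChunks implements Iterable<ByteBuffer> {
    private final Path path;
    private final int chunkSize;

    FileChunks(Path path, int chunkSize) {
        this.path = path;
        this.chunkSize = chunkSize;
    }

    @Override
    public Iterator<ByteBuffer> iterator() {
        final AsynchronousFileChannel ch;
        try {
            ch = AsynchronousFileChannel.open(path, StandardOpenOption.READ);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<ByteBuffer>() {
            private final ByteBuffer[] bufs =
                { ByteBuffer.allocate(chunkSize), ByteBuffer.allocate(chunkSize) };
            private int cur = 0;   // buffer currently being filled
            private long pos = 0;  // file position of the in-flight read
            private Future<Integer> inFlight = ch.read(bufs[0], 0); // kick off 1st read
            private ByteBuffer ready = null;
            private boolean done = false;

            @Override public boolean hasNext() {
                if (ready != null) return true;
                if (done) return false;
                try {
                    int n = inFlight.get();  // wait for the read-ahead to complete
                    if (n <= 0) {            // EOF
                        done = true;
                        ch.close();
                        return false;
                    }
                    ByteBuffer filled = bufs[cur];
                    filled.flip();
                    pos += n;
                    cur = 1 - cur;           // switch buffers ...
                    bufs[cur].clear();
                    inFlight = ch.read(bufs[cur], pos); // ... and read ahead
                    ready = filled;
                    return true;
                } catch (InterruptedException | ExecutionException | IOException e) {
                    done = true;
                    try { ch.close(); } catch (IOException ignored) {}
                    throw new UncheckedIOException(new IOException(e));
                }
            }

            @Override public ByteBuffer next() {
                if (!hasNext()) throw new NoSuchElementException();
                ByteBuffer b = ready;
                ready = null;
                return b;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".bin");
        Files.write(tmp, new byte[12345]);
        long total = 0;
        for (ByteBuffer buf : new FileChunks(tmp, 4096)) {
            total += buf.remaining(); // consume the chunk (e.g. md5.update(buf))
        }
        System.out.println("read " + total + " bytes");
        Files.delete(tmp);
    }
}
```

Even in this stripped-down form, all the bookkeeping lives behind the Iterable, while the consuming loop stays a one-liner.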

But just look at how simple the client-side code is!

There's even more potential for improving perf (recall that we're already above built-in TC code!), like dynamic heuristics for adapting to optimal buffer size (which is indeed crucial, and strongly depends on the medium you're reading from).
And none of these envisioned clever optimizations would require any change in client code whatsoever!

EDIT: to be clear, I'm talking of real code. Have a look on github, eg commits add ContentPlugin.contents() to iterate over chunks of data from a file or use AsynchronousFileChannel in ContentPlugin.contents(fileName).
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by *Lefteous »

2meisl
I'm not really deep into Java threading, so excuse me if my analysis is wrong.
I'm not really sure your implementation of the stop function works as it should. The function passes the plugin a filename for which the calculation should be stopped; this should be taken into account.
In addition it seems you actually interrupt a TC thread. If yes I think this should be avoided.

Post by *meisl »

You've really hit on something there, Lefteous. Thank you!

As I said, there are quite a few silent assumptions in there, and it definitely needs improvement.

The point you spotted, ie the fact that the actual arg to `contentStopGetValue` (the fileName) is in no way explicitly connected to the thread I'm calling `.interrupt()` on, is indeed about as bad as "hackery" gets...

It does, however, work because
- TC calls `contentStopGetValue` only after `contentGetValue` had returned FT_DELAYED and then again `contentGetValue` was called without being allowed to delay it,
- AND these three calls are all on the same fileName, in succession
- AND there won't be any other intermediate calls re different files in between
- AND TC performs these calls in no more than two threads.

Anyways, there's no question that this needs to be improved, even if TC's calling pattern is not very likely to change soon.
You might have noticed all the "TODO" and "assuming..." comments in the code I posted above...
Lefteous wrote:In addition it seems you actually interrupt a TC thread. If yes I think this should be avoided.
Yep, that's an interesting point which I don't fully understand myself.

First off, in Java, `interrupt`ing a thread is not so big a deal. It's basically nothing but atomically setting a boolean variable to true which is local to that thread.
It's totally up to the `interrupt`ed thread itself (well, except...*)
a) to test this variable every now and then (and the channels do)
b) how to react to a change (channels close themselves and throw an `AsynchronousCloseException` or `ClosedByInterruptException`)
c) whether to clear this flag or not (channels DON'T)

So, `interrupt`ing a Java thread is not at all like a `suspend` or even `destroy`; it's rather like a polite request, which may very well go unheard altogether.
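Points a) and c) are easy to see in a few lines (plain JDK, nothing plugin-specific):

```java
public class InterruptFlagDemo {
    public static void main(String[] args) {
        // interrupt() merely sets a per-thread boolean flag; nothing dies
        Thread.currentThread().interrupt();
        System.out.println(Thread.currentThread().isInterrupted()); // true: the flag is set
        System.out.println(Thread.interrupted());                   // true: reads AND clears the flag
        System.out.println(Thread.currentThread().isInterrupted()); // false: it was cleared above
    }
}
```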

Next, all exceptions that come from either an IO error or from an interruption are caught in my `contentGetValue`, and then either FT_FILEERROR or FT_FIELDEMPTY is returned, respectively.
That is to say: in these ("normal") cases, the thread just keeps on living happily.


Finally - and that's what I don't understand completely - even if some arbitrary uncaught exception is thrown, the only thing that happens is that tc_java shows a dialog stating that exception.
And TC seems to receive an FT_FILEERROR or FT_FIELDEMPTY, probably due to the magic of tc_java.
But apart from that, the thread lives on. This is for sure: I'm getting normal calls on the very same threads even after such events.
I haven't investigated so far how this actually works, but I will eventually.

So, at least for the moment it feels quite safe to just say "well, I'm simply never going to kill a TC thread - whatever I may do". Thanks to Ken, I suppose...

---

Thanks again, Lefteous, for your comment and the definitely attentive observations.
How to handle Java's annoying exception system, and how to make the API fit it as well as possible, is a topic of its own (the Iterable pattern with `contents(...)` is NOT optimal!).
And sorry if my answer was much longer than you might have wished. Just couldn't make it any shorter.

___
* "well, except...": not important here, but I'll try to explain if you want me to
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

meisl wrote:- AND these three calls are all on the same fileName, in succession
No, you're forgetting about multiple content fields.
TC can call contentGetValue for the same file a hundred times when you request the same amount of fields.
Let's say you delay half of them with FT_DELAYED due to different field complexity or similar.
So now you have a mixture of a "foreground" thread, which should be served ASAP, and the background thread.
If contentStopGetValue is now called for that file, the foreground thread has of course stopped by then, but the background thread seems to be in a "queue", which means that the remaining requests to contentGetValue are still fired, even after the obvious stop.

This gave me some headaches, especially when writing PCREsearch and testing on big files.
You may take a look at my source (though it's CPP).

Maybe it's a bug, since it doesn't always happen and seems to be related to the amount of files in the current view...
Anyway, you should consider multiple fields for the same file.

Post by *meisl »

Thanks, milo1012

indeed I wasn't aware of the "Queue", although when I read the guide on contentStopGetValue I had kind of a nagging feeling.

Isn't it just too unspecific to only pass the fileName? Isn't at least the field index missing, and maybe also the unit index, if any?


----- skip next 2 paragraphs if not interested in why my hack does happen to work -----

Or, assuming
wdx guide wrote:[ContentStopGetValue] will be called only while a call to ContentGetValue is active in a background thread.
and also that there are only two threads,
then this implies that contentStopGetValue is always called on the foreground thread.
But then there's only one candidate for which cancellation is requested: whichever task is underway on the background thread.
So that means that not even the fileName arg is needed.
Well, and that's what I implemented.

The only problem left with this is that possibly the lengthy op on file X has already finished AND the next lengthy op on file Y has just started (both on the bg thread) BEFORE the call to contentStopGetValue(X) on the fg thread arrives.
In this case I would (prematurely) stop the op for Y.
However, if then comes contentStopGetValue(Y) then either lengthy op for Z is stopped, if any - or, if there is no background work going on anymore, nothing happens.
As a net effect, there won't be any observable difference.

----- end skip -----


On the other hand, the problem with the "Queue" is that TC issues too few calls to contentStopGetValue, if I understand correctly.
Given that, one might be tempted to take contentStopGetValue(X) to mean "stop any work on X".
However, the queue exists only inside TC, and the plugin receives the surplus calls to contentGetValue only after contentStopGetValue.
IMHO this needs to be fixed in TC, by not issuing the surplus contentGetValue calls in the first place.

So what should a plugin do?
- Certainly NOT try to "anticipate" the queue. Not because it's hard, but because it'll most likely break once TC's behaviour is fixed!
- prepare for more than only 2 concurrent threads
- internally keep track of "work items", individual for every call to contentGetValue that it wishes to be interruptible
- then, when contentStopGetValue(X) arrives, find the set of "work items" associated with X and stop all of them or ignore if there are none

This seems to me the most reasonable thing to do, given the interface as it is.

I'm not sure I understand what you do in PCREsearch. The hash is on the file name only, not the field index, right?
Maybe you could explain your strategy?

Post by *meisl »

I have it like this now (shown in condensed form here):

Code: Select all

private Map<String, Set<WorkItem>> fileNames2workItems = new HashMap<>();

public final void contentStopGetValue(String fileName) {
    synchronized(fileNames2workItems) {
        Set<WorkItem> workItems = fileNames2workItems.get(fileName);
        if (workItems != null) {
            for (WorkItem workItem: workItems) {
                workItem.requestStop();
            }
        }
    }
}

class WorkItem {
    private Thread workingThread;
    /* ... */
    public void requestStop() {
        this.workingThread.interrupt();
    }
    public void cleanup() {
        if (this.workingThread != Thread.currentThread()) { // TODO: assert it
            throw new IllegalStateException(this.workingThread + " != " + Thread.currentThread());
        }
        Thread.interrupted(); // clear interrupted status
        removeWorkItem(this);
    }
}
with methods `addWorkItem` and `removeWorkItem` which you can imagine.
`contentGetValue` looks like so:

Code: Select all

public final int contentGetValue(...) {
    if (fieldIndex >= fields.size()) {
        return FT_NOSUCHFIELD;
    }
    Field<?> field = fields.get(fieldIndex);
    WorkItem workItem = null;
    try {
        if (field.isDelayInOrder(fileName)) {
            if ((flags & CONTENT_DELAYIFSLOW) != 0) {
                return FT_DELAYED;
            }
            workItem = addWorkItem(new WorkItem(...)); // constructor stores currentThread in field workingThread
        }
        Object value = field._getValue(fileName);
        if (workItem != null) {
            workItem.cleanup();
        }
        fieldValue.setValue(field.type, value);
        return field.type;
    } catch (AsynchronousCloseException e) {    // also catches ClosedByInterruptException
        if (workItem != null) {
            workItem.cleanup();
        }
        return FT_FIELDEMPTY;
    } catch (IOException e) {
        if (workItem != null) {
            workItem.cleanup();
        }
        return FT_FILEERROR;
    } catch (Throwable e) {
        if (workItem != null) {
            workItem.cleanup();
        }
        throw e;
    }
}
Could be more elegant; but would it meet your expectations, Lefteous and milo1012?

Post by *milo1012 »

meisl wrote:Isn't it just too unspecific to only pass the fileName, isn't at least the field index and maybe also the unit index, if any, missing?
Of course it would help to have them.
But if you check for the filename in a global, thread-safe pool (Semaphore- or Lock-like) every half second, like I did in PCREsearch,
that is already enough to signal a stop to all possible threads that use the corresponding file
(i.e. only the single background thread in the current implementation).
I think this should also be possible in Java.
meisl wrote:The hash is on the file name only, not the field index, right?
Maybe you could explain your strategy?
Right. I only use a hash to speed things up a little, but you could also use the complete name string.
I explained it above...you just check for a stop signal for your file from a synchronized pool...of course not too often, maybe once a second.
A field index is of no use here, and I don't get it from TC anyway.
There may be other strategies for this, but it works right now.
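For readers following along in Java: a minimal sketch of the polling strategy described above might look like this (my own illustration; PCREsearch itself is C++ and polls every half second, here the check interval is just a loop counter):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stop requests are recorded per file name in a thread-safe pool;
// the long-running worker polls the pool every now and then.
class StopPool {
    private final Set<String> stopRequests = ConcurrentHashMap.newKeySet();

    void contentStopGetValue(String fileName) {
        stopRequests.add(fileName); // signal: stop any work on this file
    }

    /** Simulated expensive computation; returns null if stopped early. */
    String getValue(String fileName, int steps) {
        for (int i = 0; i < steps; i++) {
            // ... one slice of the real work would go here ...
            if (i % 100 == 0 && stopRequests.remove(fileName)) {
                return null; // stop signal seen, abandon the work
            }
        }
        return "done:" + fileName;
    }
}
```

Note that `remove` both tests and consumes the signal in one atomic step, so a stale request can't cancel the next, unrelated computation on the same file.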



TC does this okay most of the time, but from what I can see there is a misbehavior in the queue abort in some situations with a single file.
From my observation:

(I use Process Explorer to see which file handles are opened and closed, but you could also use a log file for this)
I use a single big file (2GiB+) in a directory, with three fields requested, all of them take at least a minute to finish.
All fields return ft_delayed of course.
Now you can see the background thread for the first field.
I change to an empty directory (or one with only small files where the fields are returned immediately)
-> the first field is canceled correctly due to ContentStopGetValue
But: the second field starts to work because TC didn't cancel the queue.
If I switch to another empty dir -> the second is canceled but the third field is requested
Same thing again for the last fields: if I switch to another dir -> the third is canceled
...and so on if I use a fourth field or more

When using a dir or view with more than one file, where most or all fields are requested in the background due to ft_delayed,
but maybe some are small and return immediately, things seem to work correctly for some reason:
the queue seems to be canceled correctly.
But even there I get to situations where I still need more than one dir change to cancel all remaining fields.
I have no idea what triggers this sometimes.

As I said, I'm not sure if this is a bug, but the queue should be canceled from what I can see, no matter how many fields are requested.
meisl wrote:So what should a plugin do?
Nothing, because you can't do anything besides canceling the active request, like you already explained.

I think Mr. Ghisler needs to make a statement about how things "should" work here and if our observations are right.
There are only a few content plug-ins which actually use ContentStopGetValue, so it wouldn't hurt to change that behavior from what I can see.

Post by *meisl »

milo1012 wrote:But if you check for the filename [..] this is already enough to signal a stop to all possible threads that use the corresponding file
I see, so we agree on the interpretation "stop any work on that file" (as opposed to eg the most recently started, or the least recently...).
milo1012 wrote:...(i.e. only the single background thread in the current implementation)
...which is the reason why, in my previous hack, I could collapse the "pool" to just one variable (`workingThread`) and could afford to ignore even the fileName. Definitely not acceptable, though.
milo1012 wrote:...I think this should also be possible in Java.
Sure, it's what I (hope to) have implemented in the code posted above.
Maybe some words of explanation on this:
signalling the stop consists of `workItem.requestStop()`,
but you don't see the checking for this signal there, since it's done in the client code (the implementor of Field.getValue(..)). Even there it might not be explicit (which I consider a Good Thing(TM)), for example if you're using channels for IO, where it happens in the channel's code and yields an `AsynchronousCloseException`, if applicable.
milo1012 wrote:Right. I only use a hash to speed things up a little, but you could also use the complete name string.
I explained it above...
A field index is of no use here, and I don't get it from TC anyway.
Ok, understand. I just got a bit confused because you said I "should consider multiple fields for the same file".
I thought you meant some way of differentiating by field in the code for cancellation - and couldn't see any...
As a side-note: I think by storing only the hashes you're trading the (unlikely) possibility of collisions for just some bytes of mem. You don't save any work, in terms of hash computations, as compared to my hash table. Also, are you sure you could handle (cancellation of) more than 1 concurrent bg op on the same file correctly? But that's really only a side-note.

---

Regarding your observations:
I didn't think of only 1 big file in a directory, but incidentally found that the (more?) precise condition for the misbehaviour seems to be this:
1) more than one delayed column
2) on exactly one (!) file in view (!)
Note that if 2) is not met - ie there is more than 1 file with "pending" columns - then TC does adhere to the protocol when you trigger a cancellation, although in quite an awkward manner:
first it calls contentStopGetValue on the current expensive op (as expected)
but then re-requests all fields on all files with CONTENT_DELAYIFSLOW set (receiving several FT_DELAYED),
then, again, starts one in bg thread with CONTENT_DELAYIFSLOW off (forcing the computation)
and immediately afterwards calls contentStopGetValue for that file in fg thread.
It does not issue surplus contentGetValue with CONTENT_DELAYIFSLOW off in the bg thread (good)
- if only there's more than one file in view with delayed fields!

Btw: this even works with the same expensive column repeated in your custom column view.
TC seems NOT to be clever enough to request this value only once.
milo1012 wrote:But even there [more than 1 file, with some immediate fields] I get to situations where I still need more than one dir change to cancel all remaining fields.
At my wit's end... wasn't able to see it myself so far.
milo1012 wrote:I think Mr. Ghisler needs to make a statement about how things "should" work here and if our observations are right.
Definitely, PLEASE!
milo1012 wrote:There are only a few content plug-ins which actually use ContentStopGetValue, so it wouldn't hurt to change that behavior from what I can see.
Interesting. Didn't even dare to think, but now... it seems there are improvements that wouldn't even be "breaking changes".
I, for one, think it's rather important to work on this issue, because it does impact user experience in a non-negligible way.
So, @ghisler, you could be of great help here :)

Post by *milo1012 »

meisl wrote:I just got a bit confused because you said I "should consider multiple fields for the same file".
Because you said "AND these three calls are all on the same fileName, in succession", which is not the case with multiple fields for the same file -
we get ContentGetValue several times - so I thought you hadn't thought of fields (yet).
meisl wrote:As a side-note: I think by storing only the hashes you're trading the (unlikely) possibility of collisions for just some bytes of mem.
Do we really need to nitpick at these things?
I appreciate your comments, but I have experience and other reasons for this.
(later and former code use, usage in other projects)
Even if the hash fails, we're talking about a non-vital function which only aborts a read early.
So it wouldn't hurt if this happened once in a lifetime or so, given the negligible probability.
meisl wrote:At my wit's end... wasn't able to see it myself so far.
I am: three big files (2GiB+) in one dir, each with the mentioned three fields -> Queue remains like with a single big file.
I'm not sure if my code triggers this, but I wouldn't see how.

How did you test for the big file, with your own plugin?

Post by *meisl »

milo1012 wrote:Because you said "AND these three calls are all on the same fileName, in succession", which is not the case with...
You're right. To be precise, it is (would be) in fact more than is necessary for my previous hack to work, which I realized only later.
In any case, without it you would not have made me aware of the "Queue" issue, so thank you again :)
milo1012 wrote:Do we really need to nitpick at these things?
Not at all, just a side-note.
But what I'm really puzzled with is the other thing: would it handle (cancellation of) more than 1 concurrent bg op on the same file correctly?
This is not about using hashes. It's that per file you seem to keep only 1 entry (there or not), rather than a set of started threads to signal a "stop" to.
I'm thinking that you could achieve an equivalent effect by associating a counter, rather than a boolean, with each (hash of a) fileName.
It's really not meant as nitpicking; I'm honestly interested and wondering what I'm getting wrong.
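What I mean by "a counter rather than a boolean", as a hypothetical sketch (my names, not from any existing plugin): each stop request grants permission to stop exactly one of the ops currently running on that file.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Counter-based variant: contentStopGetValue(X) increments X's counter;
// each polling worker that sees a positive counter decrements it and stops.
// So N stop requests stop at most N concurrent ops on the same file.
class StopCounters {
    private final ConcurrentHashMap<String, AtomicInteger> pending = new ConcurrentHashMap<>();

    void requestStop(String fileName) {
        pending.computeIfAbsent(fileName, k -> new AtomicInteger()).incrementAndGet();
    }

    /** Called from a worker's polling loop; true means: this op should stop. */
    boolean shouldStop(String fileName) {
        AtomicInteger c = pending.get(fileName);
        if (c == null) return false;
        int v;
        do {
            v = c.get();
            if (v == 0) return false;    // no stop grant left for this file
        } while (!c.compareAndSet(v, v - 1)); // atomically consume one grant
        return true;
    }
}
```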
milo1012 wrote:I am: three big files (2GiB+) in one dir, each with the mentioned three fields -> Queue remains like with a single big file.
I'm not sure if my code triggers this, but I wouldn't see how.

How did you test for the big file, with your own plugin?
Hmm, it seems I'll have to experiment more.

I'm having ~200 files in the folder, 30% above 1GB and the rest a few KB each.
I'm calculating MD5 checksums for each and only for the big ones I return FT_DELAYED.
Additionally - and that might be it - once the checksum calculation has finished for a file I store it in an ADS (NTFS Alternate Data Stream) of that file,
together with the lastModified date, so when it's requested again, and up-to-date, I return it immediately.

Now I was trying to reproduce the flawed behaviour and the simplest way to get another expensive column was to just duplicate it in custom columns view.

At first this did NOT reproduce it but rather the awkward but still compliant re-request with immediate cancellation, as I described.

However, when writing the post I went back to re-check it - and oops - it had changed to the flawed behaviour!
Took me a while to realize that only one file had been left without up-to-date MD5 in the ADS...