Using NTFS MFT instead of FindFirst / FindNext - feasible?
Moderators: Hacker, petermad, Stefan2, white
Hi Christian and *.*,
Just as an exercise in imagination: would it bring any advantage (especially in speed) to use the NTFS MFT for displaying directory contents (i.e. files) on NTFS drives, instead of running FindFirst / FindNext each time TC enters a dir?
Using the NTFS MFT would also be great for Search but that is another topic.
TIA
Roman
Suppose you press Ctrl+F, select the FTP connection (with the saved password), but instead of clicking Connect, you drop dead.
1. It requires admin rights.
2. AFAIK it contains only files, without a human-readable folder structure.
3. We would need to reread the MFT in case of volume changes.
4. I think FindFirstFile does this itself.
It could be made as a search module that enumerates folder contents and returns file information to TC (so TC would use the given lists instead of enumerating itself, check items for plugin fields, etc.), but using it all the time is not a good idea.
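To make point 4 concrete: the per-directory enumeration TC performs corresponds to one FindFirstFile/FindNextFile loop per directory. A minimal cross-platform sketch of that pattern (CPython's os.scandir is implemented on top of FindFirstFileW/FindNextFileW on Windows); this is an illustration of the API usage, not TC's actual code:

```python
import os
import tempfile

def list_dir(path):
    """Enumerate one directory, the way a file manager does per folder.
    On Windows, os.scandir wraps FindFirstFileW/FindNextFileW, so each
    entry already carries size/date/attribute data from the directory."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            info = entry.stat(follow_symlinks=False)
            entries.append((entry.name,
                            entry.is_dir(follow_symlinks=False),
                            info.st_size))
    return sorted(entries)

# usage: create a throwaway dir with one file and one subdir, then list it
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "w") as f:
        f.write("hi")
    os.mkdir(os.path.join(d, "sub"))
    listing = list_dir(d)
```

The point MVV makes is that the OS already caches this metadata, so a hand-rolled MFT reader would mostly be duplicating this loop.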
MVV,
Well, Everything manages to do 2. and 3., at least while it is running (or its service). As for admin rights, is that a problem? TC can always fall back to FindFirst/Next.
Roman
MVV,
Well, you don't have to, but I would. That doesn't make it wrong. Also, an intermediary program could be used, just like for deleting files "as Admin".
Roman
2Hacker
I don't know if Everything is actually rereading the MFT all the time; I thought it doesn't. If Everything does a scan only once and then watches for changes, the speed gain comes mainly from watching for changes instead of rescanning the MFT. So if watching for changes doesn't slow down the system, this is the way to go; the initial read would be secondary.
Back to my point: assuming that reading the MFT for the first time takes 4 seconds, how long does it take to reread it? Is there a way to do only a partial MFT scan? Many questions.
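The rereading question actually has a concrete answer on NTFS: the volume's USN change journal (read via FSCTL_READ_USN_JOURNAL) records which files changed, so after one full scan an indexer only needs to replay journal entries instead of rescanning the whole MFT. A toy sketch of that cost model, with made-up event tuples standing in for real USN records (not Everything's actual code):

```python
# Conceptual sketch: why "scan once, then watch" wins. A full scan
# touches every record; applying change events only touches what changed.

class VolumeIndex:
    def __init__(self):
        self.files = {}  # path -> metadata (here just the size)

    def full_scan(self, snapshot):
        """Rebuild from scratch - O(total files), the 4-second case."""
        self.files = dict(snapshot)

    def apply_change(self, event):
        """Apply one change event - O(1), like one USN journal record."""
        kind, path, size = event
        if kind in ("create", "modify"):
            self.files[path] = size
        elif kind == "delete":
            self.files.pop(path, None)

idx = VolumeIndex()
idx.full_scan({"C/a.txt": 10, "C/b.txt": 20})   # the expensive initial read
idx.apply_change(("create", "C/c.txt", 5))      # cheap incremental updates
idx.apply_change(("delete", "C/a.txt", None))
```

So a "partial MFT scan" in practice means replaying the journal since the last known USN, which is exactly the incremental path sketched above.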
To be clear: I'm not at all an expert wrt Windows API nor NTFS.
Having said that, I'm just wondering: wouldn't scanning and interpreting the MFT just be re-implementing FindFirst and FindNext, more or less?
---
I mean the MFT structure is most probably designed to be space-efficient (I'm NOT saying that it's only optimized for space!).
Hence I figure it's necessary/reasonable to build higher-level structures from it - now in memory - that abstract from the low-level format and provide a higher-level API.
But then - isn't that just what the OS does?
---
@Hacker: not at all that I wouldn't like your idea, or thought it couldn't lead anywhere. But maybe you could expand a bit more on what exactly you think could be saved by "doing it by hand"?
(I'm really interested but unfortunately rather ignorant, so I need a bit of explanation.)
EDIT: and I don't understand what "Everything" is?
@meisl
For Everything see here and here.
I think another major problem would be distinguishing FAT (and the new exFAT), network, remote, and other file systems from NTFS volumes.
You'd have two search algorithms/approaches built into TC in that case... probably not the ideal way.
Also, storing the index in a secure place (memory / the TC dir...?) is another issue, and updating it dynamically will cause a lot of background load, which TC currently doesn't have at all.
But maybe we can finally get rid of these legacy treeinfo.wc files that way, at least for NTFS.
Second, the source isn't available, at least not as a ready-to-use library from what I can see (Everything is also closed source).
According to Wikipedia there are NTFS-Search and SwiftSearch, but they don't look mature to me and would need to be converted to Delphi.
Most other low-level access tools are quite concealed in terms of algorithms... everything would need to be implemented from scratch, which would be a lot of work.
Anyway, if it were available some day, I'd also be happy to have it as an optional search engine for TC.
Lefteous,
I agree about watching for changes and keeping an internal database being a big part of the speedup.
meisl,
Well, entering a dir of about 16,000 files on my computer / HDD takes 12 seconds. Searching within Everything is almost (imperceptibly) instantaneous. Thus I was wondering why not use whatever Everything does to speed up reading the file list of a dir.
milo1012,
milo1012 wrote: I think another major problem would be: Distinguish FAT (also the new exFAT), Network, remote and other file systems from NTFS volumes.
If it is not easy to determine if a volume is a local NTFS one then in the worst case let the user specify it (though I sincerely doubt it would be that difficult).
milo1012 wrote: You'd have two search algorithms/approaches built in to TC in that case...probably not the ideal way.
Well, one is already there; adding another is the suggestion. We have four internal packers, five hashes, four methods of zip encryption and four packer interfaces; I don't think two file list reading algorithms are overkill.
milo1012 wrote: Updating it dynamically will cause a lot of background load, which TC now doesn't have at all.
The load is not noticeable with Everything. Why should things be worse with TC? Will it be worse than waiting 12 seconds to enter a dir? And if someone prefers the old approach, they can certainly keep using it.
Of course, there is also the question of how best to implement it with e.g. colors by file type or the ignore list, but I think the basic concept is already there with the way it's currently done.
Roman
Ah, finally I'm getting your point: Everything seems to do so much better, so...
Well, now how to go about it? Is there at least some reasonable, official documentation of the MFT structure?
@milo1012: do you think that looking through the two SourceForge projects you linked to would at least give some clue about what Windows is allegedly doing "so awfully wrong"?
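On the documentation question: Microsoft doesn't officially document the on-disk MFT format, but it is thoroughly described by third parties (e.g. the Linux-NTFS project docs). As a taste of what "interpreting the MFT" would involve, here is a hedged sketch that parses the fixed header of one FILE record from a synthetic buffer; the offsets follow the commonly published layout, and nothing here touches a real disk:

```python
import struct

def parse_file_record_header(buf):
    """Parse the fixed header of one NTFS FILE (MFT) record.
    Covers only the header fields, not the attribute list."""
    if buf[0:4] != b"FILE":
        raise ValueError("not a FILE record")
    (usa_ofs, usa_count, lsn, seq_no, link_count,
     attrs_ofs, flags, bytes_used, bytes_alloc) = struct.unpack_from(
        "<HHQHHHHII", buf, 4)
    return {
        "link_count": link_count,
        "first_attr_offset": attrs_ofs,
        "in_use": bool(flags & 0x0001),       # flag bit 0: record in use
        "is_directory": bool(flags & 0x0002),  # flag bit 1: directory
        "bytes_used": bytes_used,
    }

# synthetic 1 KiB record for illustration (not read from a real volume)
rec = bytearray(1024)
rec[0:4] = b"FILE"
struct.pack_into("<HHQHHHHII", rec, 4,
                 48, 3,      # update sequence array offset / size
                 0,          # $LogFile sequence number
                 1, 1,       # sequence number, hard link count
                 56,         # offset of first attribute
                 0x0003,     # flags: in use + directory
                 416, 1024)  # bytes used / bytes allocated
hdr = parse_file_record_header(bytes(rec))
```

Everything after this header (file names, timestamps, data runs) lives in variable-length attributes, which is where most of the real parsing work would be.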
Hacker wrote: If it is not easy to determine if a volume is a local NTFS one then in the worst case let the user specify it (though I sincerely doubt it would be that difficult).
It's not just that. You'd need a lot of options that would have to be set with such an approach:
- how to treat newly mounted/added disks (USB etc.) - include them immediately? Or: remember the index for drives that were un-/replugged, or after a reboot, etc.
Such options don't need to exist now.
Hacker wrote: Well, one is already there, adding another is the suggestion. We have four internal packers, five hashes, four methods of zip encryption and four packer interfaces; I don't think two file list reading algorithms are overkill.
Sure, but those things don't interfere with each other; for the search, though, there's probably a lot of work involved in combining both engines to create a result list when searching a mixed set of file systems.
But I agree that it's quite solvable.
Another thing I experienced with Everything: the standard search is instant, but sorting is still as slow as usual.
So when it comes to searching with narrowing options (time stamp, file attributes) Everything seems to get the attributes from API calls.
Hacker wrote: ...but I think the basic concept is already there with the way it's currently done.
I agree.
Btw, there's an Everything SDK available, which involves a DLL.
I'm not sure about the license, but it seems that TC could easily use the engine and make the necessary calls.
Maybe this could also be done with a plugin (which of course would also require admin rights).
meisl wrote: ...the two SourceForge projects you linked to would at least give some clue about what Windows is allegedly doing "so awfully wrong"?
Who says that Windows does something wrong? I'm probably not fully qualified for a decent analysis, but from what I can see it's just fast when it comes to searching file names and mapping them to the MFT index.
Things will probably look very different when searching for certain file attributes, time stamps (and of course content).
Anyway, it would still be great to have it. Maybe one of the new features for the next major TC release?

Search is an integral part of a file manager, so it would fit quite well, even if it'd be optional.
Lefteous,
Lefteous wrote: Have you tried to find out if FindFirst/FindNext is really the problem? There have been other observations in the past that could explain your slow listing experience, like listbox population speed.
No, I have not. How would I go about that? Reentering the dir after it has been read in once (cached) takes less than one second.
milo1012,
milo1012 wrote: ...for the search there's probably a lot of work involved when it comes to combine both engines to create a result list when searching a mixed file system set.
But why? The first step is to fill in the file list; the second step is to perform an operation on it (search, assign colors to files, etc.). The first step checks whether the disk is in the cache: if yes, use the cache; if not, check whether it is NTFS, and then use the MFT or FindFirst as appropriate.
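That dispatch logic can be sketched in a few lines. The reader callbacks and file-system names below are hypothetical stand-ins (on Windows the file-system name would come from GetVolumeInformation), not TC internals:

```python
def read_dir(path, volume_fs, cache, read_via_mft, read_via_findfirst):
    """volume_fs: file-system name as GetVolumeInformation would report
    it ("NTFS", "FAT32", "exFAT", ...)."""
    if path in cache:              # step 1: cached listing?
        return cache[path]
    if volume_fs == "NTFS":        # step 2: local NTFS -> MFT-based index
        listing = read_via_mft(path)
    else:                          # anything else -> classic enumeration
        listing = read_via_findfirst(path)
    cache[path] = listing
    return listing

# usage with dummy readers standing in for the two engines
cache = {}
mft = lambda p: ["via-mft"]
ff = lambda p: ["via-findfirst"]
r1 = read_dir("C:/docs", "NTFS", cache, mft, ff)
r2 = read_dir("D:/usb", "FAT32", cache, mft, ff)
r3 = read_dir("C:/docs", "NTFS", cache, mft, ff)  # served from the cache
```

The point is that the engine choice is a one-line branch per directory read; the hard part is keeping the MFT-backed cache valid, not the dispatch itself.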
milo1012 wrote: how to treat newly mounted/added disks (USB etc.) - include them immediately?
That's user-configurable in Everything.
milo1012 wrote: Another thing I experienced with Everything: the standard search is instant, but sorting is still as slow as usual. So when it comes to searching with narrowing options (time stamp, file attributes) Everything seems to get the attributes from API calls.
The nice thing is that TC could cache any attributes necessary, for instance the fields from custom column views. Of course, this is another bag of questions, but it's still solvable and not technically difficult; it's just that decisions need to be made.
Roman