Using NTFS MFT instead of FindFirst / FindNext - feasible?
Moderators: Hacker, petermad, Stefan2, white
Hi Christian and *.*,
Just as an exercise in imagination: would it bring any advantage (especially in speed) to use the NTFS MFT for displaying directory contents (i.e. files) on NTFS drives, instead of running FindFirst / FindNext each time TC enters a dir?
Using the NTFS MFT would also be great for Search but that is another topic.
TIA
Roman
Suppose you press Ctrl+F, select the FTP connection (with the saved password), but instead of clicking Connect, you drop dead.
1. It requires admin rights.
2. AFAIK it contains only files, without a human-readable folder structure.
3. We would need to reread the MFT in case of volume changes.
4. I think FindFirstFile does this itself.
It could be made as a search module that enumerates folder contents and returns file information to TC (so TC would use the given lists instead of enumerating itself, check items for plugin fields, etc.), but using it all the time is not a good idea.
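To make point 4 concrete: the per-directory enumeration TC performs corresponds to one FindFirstFile/FindNextFile loop per directory. A minimal cross-platform sketch of that pattern (CPython's os.scandir is implemented on top of FindFirstFileW/FindNextFileW on Windows); this is an illustration of the API usage, not TC's actual code:

```python
import os
import tempfile

def list_dir(path):
    """Enumerate one directory, the way a file manager does per folder.
    On Windows, os.scandir wraps FindFirstFileW/FindNextFileW, so each
    entry already carries size/date/attribute data from the directory."""
    entries = []
    with os.scandir(path) as it:
        for entry in it:
            info = entry.stat(follow_symlinks=False)
            entries.append((entry.name,
                            entry.is_dir(follow_symlinks=False),
                            info.st_size))
    return sorted(entries)

# usage: create a throwaway dir with one file and one subdir, then list it
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "w") as f:
        f.write("hi")
    os.mkdir(os.path.join(d, "sub"))
    listing = list_dir(d)
```

The point MVV makes is that the OS already caches this metadata, so a hand-rolled MFT reader would mostly be duplicating this loop.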
MVV,
Well, Everything manages to do 2. and 3., at least while it is running (or its service). As for admin rights, is that a problem? TC can always fall back to FindFirst/Next.
Roman
MVV,
Well, you don't have to, but I would. That doesn't make it wrong. Also, an intermediary program could be used, just like for deleting files "as Admin".
Roman
2Hacker
I don't know if Everything is actually rereading the MFT all the time; I thought it doesn't. If Everything does a scan only once and then watches for changes, the speed gain comes mainly from watching for changes instead of rescanning the MFT. So if watching for changes doesn't slow down the system, this is the way to go; the initial read would be secondary.
Back to my point: assuming that reading the MFT for the first time takes 4 seconds, how long does it take to reread it? Is there a way to do only a partial MFT scan? Many questions.
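The rereading question actually has a concrete answer on NTFS: the volume's USN change journal (read via FSCTL_READ_USN_JOURNAL) records which files changed, so after one full scan an indexer only needs to replay journal entries instead of rescanning the whole MFT. A toy sketch of that cost model, with made-up event tuples standing in for real USN records (not Everything's actual code):

```python
# Conceptual sketch: why "scan once, then watch" wins. A full scan
# touches every record; applying change events only touches what changed.

class VolumeIndex:
    def __init__(self):
        self.files = {}  # path -> metadata (here just the size)

    def full_scan(self, snapshot):
        """Rebuild from scratch - O(total files), the 4-second case."""
        self.files = dict(snapshot)

    def apply_change(self, event):
        """Apply one change event - O(1), like one USN journal record."""
        kind, path, size = event
        if kind in ("create", "modify"):
            self.files[path] = size
        elif kind == "delete":
            self.files.pop(path, None)

idx = VolumeIndex()
idx.full_scan({"C/a.txt": 10, "C/b.txt": 20})   # the expensive initial read
idx.apply_change(("create", "C/c.txt", 5))      # cheap incremental updates
idx.apply_change(("delete", "C/a.txt", None))
```

So a "partial MFT scan" in practice means replaying the journal since the last known USN, which is exactly the incremental path sketched above.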
To be clear: I'm not at all an expert wrt Windows API nor NTFS.
Having said that, I'm just wondering: wouldn't scanning and interpreting the MFT just be re-implementing FindFirst and FindNext, more or less?
---
I mean the MFT structure is most probably designed to be space-efficient (I'm NOT saying that it's only optimized for space!).
Hence I figure it's necessary/reasonable to build higher-level structures from it - now in memory - that abstract from the low-level format and provide a higher-level API.
But then - isn't that just what the OS does?
---
@Hacker: not at all that I wouldn't like your idea, or thought it couldn't lead anywhere. But maybe you could expand a bit more on what exactly you think could be saved by "doing it by hand"?
(I'm really interested but unfortunately rather ignorant, so I need a bit of explanation.)
EDIT: and I don't understand what "Everything" is?
@meisl
For Everything see here and here.
I think another major problem would be distinguishing FAT (and the new exFAT), network, remote, and other file systems from NTFS volumes.
You'd have two search algorithms/approaches built into TC in that case... probably not the ideal way.
Also, storing the index in a secure place (memory / the TC dir...?) is another issue, and updating it dynamically will cause a lot of background load, which TC currently doesn't have at all.
But maybe we can finally get rid of these legacy treeinfo.wc files that way, at least for NTFS.
Second, the source isn't available, at least not as a ready-to-use library from what I can see (Everything is also closed source).
According to Wikipedia there are NTFS-Search and SwiftSearch, but they don't look mature to me and would need to be converted to Delphi.
Most other low-level access tools are quite concealed in terms of algorithms... everything would need to be implemented from scratch, which would be a lot of work.
Anyway, if it were available some day, I'd also be happy to have it as an optional search engine for TC.
Lefteous,
I agree about watching for changes and keeping an internal database being a big part of the speedup.
meisl,
Well, entering a dir of about 16,000 files on my computer / HDD takes 12 seconds. Searching within Everything is almost (imperceptibly) instantaneous. Thus I was wondering why not use whatever Everything does to speed up reading the file list of a dir.
milo1012,
milo1012 wrote: I think another major problem would be: Distinguish FAT (also the new exFAT), Network, remote and other file systems from NTFS volumes.
If it is not easy to determine if a volume is a local NTFS one then in the worst case let the user specify it (though I sincerely doubt it would be that difficult).
milo1012 wrote: You'd have two search algorithms/approaches built in to TC in that case...probably not the ideal way.
Well, one is already there; adding another is the suggestion. We have four internal packers, five hashes, four methods of zip encryption and four packer interfaces; I don't think two file list reading algorithms are overkill.
milo1012 wrote: Updating it dynamically will cause a lot of background load, which TC now doesn't have at all.
The load is not noticeable with Everything. Why should things be worse with TC? Will it be worse than waiting 12 seconds to enter a dir? And if someone prefers the old approach, they can certainly keep using it.
Of course, there is also the question of how best to implement it with e.g. colors by file type or the ignore list, but I think the basic concept is already there with the way it's currently done.
Roman
Ah, finally I'm getting your point: Everything seems to do so much better, so...
Well, now how to go about it? Is there at least some reasonable, official documentation of the MFT structure?
@milo1012: do you think that looking through the two SourceForge projects you linked to would at least give some clue about what Windows is allegedly doing "so awfully wrong"?
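On the documentation question: Microsoft doesn't officially document the on-disk MFT format, but it is thoroughly described by third parties (e.g. the Linux-NTFS project docs). As a taste of what "interpreting the MFT" would involve, here is a hedged sketch that parses the fixed header of one FILE record from a synthetic buffer; the offsets follow the commonly published layout, and nothing here touches a real disk:

```python
import struct

def parse_file_record_header(buf):
    """Parse the fixed header of one NTFS FILE (MFT) record.
    Covers only the header fields, not the attribute list."""
    if buf[0:4] != b"FILE":
        raise ValueError("not a FILE record")
    (usa_ofs, usa_count, lsn, seq_no, link_count,
     attrs_ofs, flags, bytes_used, bytes_alloc) = struct.unpack_from(
        "<HHQHHHHII", buf, 4)
    return {
        "link_count": link_count,
        "first_attr_offset": attrs_ofs,
        "in_use": bool(flags & 0x0001),       # flag bit 0: record in use
        "is_directory": bool(flags & 0x0002),  # flag bit 1: directory
        "bytes_used": bytes_used,
    }

# synthetic 1 KiB record for illustration (not read from a real volume)
rec = bytearray(1024)
rec[0:4] = b"FILE"
struct.pack_into("<HHQHHHHII", rec, 4,
                 48, 3,      # update sequence array offset / size
                 0,          # $LogFile sequence number
                 1, 1,       # sequence number, hard link count
                 56,         # offset of first attribute
                 0x0003,     # flags: in use + directory
                 416, 1024)  # bytes used / bytes allocated
hdr = parse_file_record_header(bytes(rec))
```

Everything after this header (file names, timestamps, data runs) lives in variable-length attributes, which is where most of the real parsing work would be.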
Hacker wrote: If it is not easy to determine if a volume is a local NTFS one then in the worst case let the user specify it (though I sincerely doubt it would be that difficult).
It's not just that. You'd need a lot of options that would have to be set with such an approach:
- how to treat newly mounted/added disks (USB etc.) - include them immediately? Or: remember the index for drives that were un-/replugged, or after a reboot, etc.
Such options don't need to exist now.
Hacker wrote: Well, one is already there, adding another is the suggestion. We have four internal packers, five hashes, four methods of zip encryption and four packer interfaces; I don't think two file list reading algorithms are overkill.
Sure, but those things don't interfere with each other; for the search, though, there's probably a lot of work involved in combining both engines to create a result list when searching a mixed set of file systems.
But I agree that it's quite solvable.
Another thing I experienced with Everything: the standard search is instant, but sorting is still as slow as usual.
So when it comes to searching with narrowing options (time stamp, file attributes) Everything seems to get the attributes from API calls.
Hacker wrote: ...but I think the basic concept is already there with the way it's currently done.
I agree.
Btw, there's an Everything SDK available, which involves a DLL.
I'm not sure about the license, but it seems that TC could easily use the engine and make the necessary calls.
Maybe this could also be done with a plugin (which of course would also require admin rights).
meisl wrote: ...the two SourceForge projects you linked to would at least give some clue about what Windows is allegedly doing "so awfully wrong"?
Who says that Windows does something wrong? I'm probably not fully qualified for a decent analysis, but from what I can see it's just fast when it comes to searching file names and mapping them to the MFT index.
Things will probably look very different when searching for certain file attributes, time stamps (and of course content).
Anyway, it would still be great to have it. Maybe one of the new features for the next major TC release?

Search is an integral part of a file manager, so it would fit quite well, even if it'd be optional.
Lefteous,
Lefteous wrote: Have you tried to find out if FindFirst/FindNext is really the problem? There have been other observations in the past that could explain your slow listing experience, like listbox population speed.
No, I have not. How would I go about that? Reentering the dir after it has been read in once (cached) takes less than one second.
milo1012,
milo1012 wrote: ...for the search there's probably a lot of work involved when it comes to combine both engines to create a result list when searching a mixed file system set.
But why? The first step is to fill in the file list; the second step is to perform an operation on it (search, assign colors to files, etc.). The first step checks whether the disk is in the cache: if yes, use the cache; if not, check whether it is NTFS, and then use the MFT or FindFirst as appropriate.
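That dispatch logic can be sketched in a few lines. The reader callbacks and file-system names below are hypothetical stand-ins (on Windows the file-system name would come from GetVolumeInformation), not TC internals:

```python
def read_dir(path, volume_fs, cache, read_via_mft, read_via_findfirst):
    """volume_fs: file-system name as GetVolumeInformation would report
    it ("NTFS", "FAT32", "exFAT", ...)."""
    if path in cache:              # step 1: cached listing?
        return cache[path]
    if volume_fs == "NTFS":        # step 2: local NTFS -> MFT-based index
        listing = read_via_mft(path)
    else:                          # anything else -> classic enumeration
        listing = read_via_findfirst(path)
    cache[path] = listing
    return listing

# usage with dummy readers standing in for the two engines
cache = {}
mft = lambda p: ["via-mft"]
ff = lambda p: ["via-findfirst"]
r1 = read_dir("C:/docs", "NTFS", cache, mft, ff)
r2 = read_dir("D:/usb", "FAT32", cache, mft, ff)
r3 = read_dir("C:/docs", "NTFS", cache, mft, ff)  # served from the cache
```

The point is that the engine choice is a one-line branch per directory read; the hard part is keeping the MFT-backed cache valid, not the dispatch itself.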
milo1012 wrote: how to treat newly mounted/added disks (USB etc.) - include them immediately?
That's user-configurable in Everything.
milo1012 wrote: Another thing I experienced with Everything: the standard search is instant, but sorting is still as slow as usual. So when it comes to searching with narrowing options (time stamp, file attributes) Everything seems to get the attributes from API calls.
The nice thing is that TC could cache any attributes necessary, for instance the fields from custom column views. Of course, this is another bag of questions, but it's still solvable and not technically difficult; it's just that decisions need to be made.
Roman