OffTopic / Tool to display FILE*NAME* as raw entry from the MFT


tuska
Power Member
Posts: 4049
Joined: 2007-05-21, 12:17 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *tuska »

georgeb wrote: 2023-11-03, 16:23 UTC Yes, you are searching for files containing an adjacent combination of U+25CC, a dotted circle,
with a Trema on top - directly in "Everything".

That of course is possible - but what is it good for? :? :wink:
I thought you were looking for file names that contain this character: U+0308.
(Although, you could have copied this character and pasted it into the Everything search box).

georgeb wrote: 2023-11-03, 16:23 UTC Also it is not clear to me what the advantage of searching directly in "Everything" would be
when the search can equally be done from within TC which offers a much better GUI?
It was just an example...

The advantage is that you can use the search box of 'Everything' as a "playground" and the results are displayed in real time.
- Single results can be transferred to TC by mouse click (settings in Everything required).
- For multiple search results there is an AHK script which can transfer all results in 'Everything' to TC.

The well-known TC parameters ev: and ed: are recommended for further processing in TC.

As a hobby user, I don't think I can contribute anything more to this topic anyway.
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *georgeb »

tuska wrote: 2023-11-03, 16:46 UTC I thought you were looking for file names that contain this character: U+0308.
(Although, you could have copied this character and pasted it into the Everything search box).
Yes, you are correct, I was looking for names containing U+0308, but that is an easy task to perform from within TC-FileFind - once you've found out you are looking for U+0308, that is. :wink: So I wouldn't have bothered our distinguished forum-members with this. And most certainly I was not looking for U+0308 on top of a dotted circle. :mrgreen:

My actual hope when opening this thread was that someone could name a special tool or plugin capable of distinguishing "classic" filenames, which use ordinary text symbols, from seemingly identical look-alike names that in reality use more "exotic" symbol combinations (like diacritical marks) to feign "classic" single-character special symbols such as "Umlauts". Such composite pseudo-Umlauts are not found by a conventional search for names containing the actual Umlaut symbol as shown in the TC file panel.

Now color-coding any filenames containing pseudo-Umlauts (composite of ordinary vowels plus U+0308 on top) - like you suggested - will be an important early-warning-sign indicating possible problems for an upcoming search. But I still hold it would be an even more profound solution for this problem if TC could somehow (using plugins or even external tools) display the native HEX-string corresponding to a given file-name right from the NTFS-MasterFileTable.
tuska
Power Member
Posts: 4049
Joined: 2007-05-21, 12:17 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *tuska »

2georgeb

It's certainly not what you want, but I have tried to narrow down the search (using various forum posts).
Maybe it will help you or someone else with a similar problem.

Search query in 'Everything 1.5.0.1359a (x64)' + DIACRITICS (Match diacritics: Enabled)

Code:

files:regex:"[^ -~]" !regex:[\x{1F300}-\x{1F5ff}] !<ä|ö|ü|Ö|Ä|Ü|ß|§>
Explanatory note

Code:

files: ......................... match files only
regex:"[^ -~]" 	................ find all characters outside the printable ASCII range (space to tilde):
				 https://www.voidtools.com/forum/viewtopic.php?p=33898#p33898
!regex:[\x{1F300}-\x{1F5ff}] ... do NOT find Miscellaneous Symbols and Pictographs, e.g. 🎁, 🌲
 				 https://www.voidtools.com/forum/viewtopic.php?p=36554#p36554 Emoticons
 				 https://en.wikipedia.org/wiki/Unicode_block
!<ä|ö|ü|Ö|Ä|Ü|ß|§> ............. do NOT find German umlauts, special character 'ß' and character '§'
Match diacritics:  ............. ENABLED
! .............................. NOT
I can recommend creating a bookmark in 'Everything'.
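
For anyone who wants to try the same filter outside of 'Everything', here is a minimal Python sketch of the logic described in the explanatory note. The function name `is_suspicious` is made up for illustration, and the sketch compares code points literally, which is only an assumption about how 'Everything' behaves with "Match diacritics" enabled.

```python
import re

# Any character outside the printable ASCII range (space .. tilde).
NON_ASCII = re.compile(r"[^ -~]")
# Miscellaneous Symbols and Pictographs (U+1F300..U+1F5FF), excluded as in the query.
PICTOGRAPHS = re.compile("[\U0001F300-\U0001F5FF]")
# Precomposed characters that the query deliberately ignores.
ALLOWED = "äöüÄÖÜß§"


def is_suspicious(name: str) -> bool:
    """Hypothetical helper: True if the name passes all three conditions of the query."""
    if not NON_ASCII.search(name):
        return False          # plain ASCII name, nothing to report
    if PICTOGRAPHS.search(name):
        return False          # names with pictographs are filtered out
    # Literal comparison: a precomposed umlaut anywhere in the name excludes it.
    return not any(ch in name for ch in ALLOWED)


print(is_suspicious("Bo\u0308rse.txt"))  # "o" + U+0308, looks like "ö"
print(is_suspicious("B\u00f6rse.txt"))   # real precomposed "ö"
```

Only the first name is reported, because it contains U+0308 but none of the precomposed characters.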
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *georgeb »

tuska wrote: 2023-11-04, 19:13 UTC It's certainly not what you want, but I have tried to narrow down the search (using various forum posts).
Maybe it will help you or someone else with a similar problem.
Certainly not what I wanted in terms of being able to inspect the "true nature" of a filename in question by displaying its native HEX-entry from the MFT thereby revealing the use of any strange, not expected symbols.

Nevertheless a very valuable tool to identify the possible culprit in case of unexpected results of a FileFind-run. Also - and again - many thanks for your idea about color-coding such filenames via plugin-template. I would never have thought about that kind of workaround for at least allowing to distinguish between otherwise identical-looking filenames and to identify the "strange version" among those two - which exactly has been my initial problem. ->bookmarked !
Dalai
Power Member
Posts: 9963
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *Dalai »

georgeb wrote: 2023-11-03, 18:06 UTC But I still hold it would be an even more profound solution for this problem if TC could somehow (using plugins or even external tools) display the native HEX-string corresponding to a given file-name right from the NTFS-MasterFileTable.
I'm going to repeat it again: the MFT doesn't need to be queried to get the file name. Windows already provides the relevant information to all programs (including TC) via the Win32 API. What programs do with that information is up to them.

It would be quite simple to write a content plugin that provides the file's name in Hex notation which can then be put in a custom column. What I'm wondering is what you intend to do with the file name's Hex notation, especially considering that each character of the file name is represented by (at least) a two-character string, i.e. a space would be shown as 0x20, or just 20. This effectively doubles the number of characters to look at.

Do you intend to look at each single byte to determine whether or not it's outside of the ANSI codepages? Do you want to do this for every long file name of every file? Sounds pretty labor-intensive. IMO the easier way would be to search for files that have such names (you can use the UnicodeTest plugin I mentioned above) and rename them with the MRT.

Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
NotNull
Senior Member
Posts: 298
Joined: 2019-11-25, 20:43 UTC
Location: NL

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *NotNull »

tuska wrote: 2023-11-03, 14:32 UTC Colouring file names with Unicode character "◌̈" (U+0308):
Very clever!!

georgeb wrote: 2023-11-05, 05:07 UTC Certainly not what I wanted in terms of being able to inspect the "true nature" of a filename in question by displaying its native HEX-entry from the MFT thereby revealing the use of any strange, not expected symbols.
Although this could work, it has a couple of limitations:
- It will not show these characters on local non-NTFS volumes like FAT32
- Same for files on network drives or cloud storage
- To access the MFT on Windows system volumes ("your C:-drive"), you need elevated rights (running as admin).
MFTs on other local NTFS volumes can be accessed as a regular user though.
- Opening the MFT requires special conditions. If $MFT gets locked somehow, the system hangs completely and needs a reboot to function again.


And as @Dalai already pointed out: this will provide no extra information (for this case) compared to what a regular reading of filenames already can give you.
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *georgeb »

Ok guys, finally an extended search of the Internet has turned up useful results. I've now at least stumbled upon an external tool that can do the trick.
https://www.disk-editor.org/index.html
is a Freeware-tool that seemingly can do what even XW-Forensics cannot (at least without epic raw-data-searches on the entire physical drive)!

Just browse to the file(-name) in question on the respective logical disk-drive - and then just right-mouse-click on that and select "Inspect File Record / Ctrl+Shift+H". This operation will take you directly to the associated file-record within the MFT offering all the advanced data-inspector-windows a professional Hex-editor usually would offer. Voilà!
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *georgeb »

Dalai wrote: 2023-11-05, 14:28 UTC I'm going to repeat it again: the MFT doesn't need to be queried to get the file name. Windows already provides the relevant information to all programs (including TC) via the Win32 API. What programs do with that information is up to them.

It would be quite simple to write a content plugin that provides the file's name in Hex notation which can then be put in a custom column.
Agreed. Querying the MFT in this case at first glance may well look like using a sledgehammer for cracking nuts. But then if we simply rename a file (like in TC) we're going to even WRITE in there anyway. So what's the big deal? I also do understand that the MFT wouldn't be needed for that purpose as the Win32 API would already provide all necessary information. IF ONLY such a content-plugin would exist, that is. Or someone would bother to write it so that I could create such a custom-column-view as suggested.
Dalai wrote: 2023-11-05, 14:28 UTC What I'm wondering is what you intend to do with the file name's Hex notation, especially considering that each character of the file name is represented by (at least) a two-character string, i.e. a space would be shown as 0x20, or just 20. This effectively doubles the number of characters to look at.

Do you intend to look at each single byte to determine whether or not it's outside of the ANSI codepages? Do you want to do this for every long file name of every file? Sounds pretty labor-intensive. IMO the easier way would be to search for files that have such names (you can use the UnicodeTest plugin I mentioned above) and rename them with the MRT.
I'm afraid, @Dalai, we're talking past each other here. Once I know that diacritics (or even better knowing exactly which ones) are involved in the seemingly abnormal behavior of a search (in this case not finding names with "Umlauts" which most certainly were present and existing) I can then easily apply the methods suggested by you to search for those names and in the aftermath apply the MRT to take care of those and straighten them out.

But don't let us get ahead of ourselves here. The situation I found myself in at the start was that the search - for whatever reason - had malfunctioned. Also - to complicate things even more - there even turned out to be a few binary identical versions of those files missing THAT ACTUALLY WERE FOUND, amazingly enough, while 90% of them were still "missing in action" in the search results. So the few files found in search - to make the situation even more absurd - had binary duplicates with ABSOLUTELY IDENTICAL LOOKING FILE-NAMES which in fact COULD NOT BE IDENTICAL as they would co-exist within the very same directory! Crazy, isn't it? But even when I already had the suspicion that the "Umlauts" would likely be the culprits I couldn't tell apart which ones were the "regular versions" (with real "Umlauts") and which ones must have contained strange, somehow presumably "doctored Umlauts" AS THESE NAMES (at least in the TC-file-panels) WERE ABSOLUTELY INDISTINGUISHABLE and still - for instance - copying one over the other would NOT OVERWRITE the target, both seemingly identical versions would further co-exist within the same directory. Renaming? HOW? If you've no idea which version is "regular" and which version is a "fake-Umlaut"-one?

Now this is clearly where the HEX-view would come in (via a Win-API-plugin or otherwise)! Because the actual problem is not about a filename using rare Unicode-symbols - which TC, by the way, can perfectly handle - the "diabolic nature" of those diacritics lies in their peculiarity of being non-advancing characters (combining marks), which are printed/displayed much like a "dead key": at the same position as the base symbol they follow. In HEX-Unicode on the other hand there is no such thing as a "dead key" and therefore such a "fake-Umlaut" would not only be 2 bytes long (as you seemed to admonish) but rather 4 bytes long (b/c of the diacritic itself using two extra bytes in UTF-16) - thereby unmistakably standing out against the regular "Umlaut" by an absolutely tamper-proof quality such as the file-name's string-length. In that representation any obfuscating ambiguities between optically identical-looking file-names would no longer be possible. And, what's more, from the HEX-representation the precise UTF-16 code for the diacritic mark alone (and not in combination with the underlying basic symbol) can be easily extracted for an upcoming search & renaming-procedure.
Last edited by georgeb on 2023-11-06, 06:56 UTC, edited 1 time in total.
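
The length argument above can be checked with a few lines of Python. This is only a sketch; `utf16_hex` is a made-up helper name, and the two strings stand in for a real and a "fake" Umlaut.

```python
def utf16_hex(name: str) -> str:
    """Hex dump of a name as UTF-16LE, grouped into 2-byte code units."""
    data = name.encode("utf-16-le")
    return " ".join(data[i:i + 2].hex() for i in range(0, len(data), 2))


composed = "\u00f6"      # precomposed "ö" (one code unit)
decomposed = "o\u0308"   # "o" followed by COMBINING DIAERESIS (two code units)

print(len(composed.encode("utf-16-le")), utf16_hex(composed))      # 2 f600
print(len(decomposed.encode("utf-16-le")), utf16_hex(decomposed))  # 4 6f00 0803
```

Both strings look the same on screen, but the second one is two bytes longer and the trailing `0803` is U+0308 in little-endian byte order.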
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *georgeb »

georgeb wrote: 2023-11-05, 14:57 UTC https://www.disk-editor.org/index.html
Just browse to the file(-name) in question on the respective logical disk-drive - and then just right-mouse-click on that and select "Inspect File Record / Ctrl+Shift+H". This operation will take you directly to the associated file-record within the MFT offering all the advanced data-inspector-windows a professional Hex-editor usually would offer. Voilà!
Latest update:
You can even create a button in TC directly transferring the filename under the cursor to that tool to open with.
Just be careful not to put the parameters %P%N in square brackets as some exotic help-file would suggest.

Code:

TOTALCMD#BAR#DATA
YourPath\DiskEditor.exe
f=%P%N
YourPath\DiskEditor.exe
Open in ActiveDiskEdit
YourPath
-1
Once the file opens just press "Inspect File Record" from the menu.
HTH
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *milo1012 »

From what I can see in this topic: the actual problem wasn't mentioned by name:
http://en.wikipedia.org/wiki/Unicode_equivalence

You could use my plug-in "NFCname" for detecting and "reversing" (normalizing) such characters
viewtopic.php?p=293756
Like Ghisler said, since TC 9.0 TC's MRT can also do this normalization:

Code:

[u]
And BTW:
I'm quite sure that the NTFS entry is identical to the UTF-16 (UCS-2) bytes exposed by the WIN32 API, i.e. it's "agnostic" to the API. Yes, there are some rare cases where you have to deal with the Windows "reserved" characters, so in such cases NTFS bytes would be different from the ones from the API, but in your case of diacritics everything should be the same.
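
To illustrate what "normalizing" means here, a small Python sketch using the standard `unicodedata` module. It is not the code of the NFCname plug-in or of TC's MRT, and the helper names `is_nfc` and `to_nfc` are made up.

```python
import unicodedata


def is_nfc(name: str) -> bool:
    """True if the name is already in composed form (NFC)."""
    return unicodedata.normalize("NFC", name) == name


def to_nfc(name: str) -> str:
    """Return the composed (NFC) form, e.g. "o" + U+0308 becomes U+00F6."""
    return unicodedata.normalize("NFC", name)


name = "Bo\u0308rse.txt"            # decomposed look-alike of "Börse.txt"
print(is_nfc(name))                 # False
print(to_nfc(name) == "B\u00f6rse.txt")  # True
```

A rename tool would then use the result of `to_nfc` as the new file name for every name where `is_nfc` is false.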
TC plugins: PCREsearch and RegXtract
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *georgeb »

milo1012 wrote: 2023-11-06, 10:38 UTC From what I can see in this topic: the actual problem wasn't mentioned by name:
http://en.wikipedia.org/wiki/Unicode_equivalence
Many thanks for your contribution and for pointing this out. You may have hit the nail right on the head with your observation - the simple reason for my somewhat clumsy paraphrasing of the actual problem until now is that I was completely unaware of this existing Unicode terminology.

Skimming over that Wikipedia link raises a bunch of new questions, though. When explaining the term "canonically equivalent" they use the example of the Spanish letter "ñ" (U+00F1) and its composite counterpart "n" with the "combining tilde" (U+0303). This is an obvious analogue to my "ö"/"o"+(U+0308) problem.

And then they state: "Therefore, those sequences should be displayed in the same manner, should be treated in the same way by applications such as alphabetizing names or searching, and may be substituted for each other. "

But wouldn't that suggest then that somehow a search for names containing "ö" in TC (or "Everything" that is) would also have to "magically" retrieve those file-versions utilizing the composite "o"+(U+0308)-representation of the resulting "Umlaut"?
milo1012 wrote: 2023-11-06, 10:38 UTC You could use my plug-in "NFCname" for detecting and "reversing" (normalizing) such characters
viewtopic.php?p=293756
Like Ghisler said, since TC 9.0 TC's MRT can also do this normalization:

Code:

[u]
Ok, I will have to look into that.
milo1012 wrote: 2023-11-06, 10:38 UTC And BTW:
I'm quite sure that the NTFS entry is identical to the UTF-16 (UCS-2) bytes exposed by the WIN32 API, i.e. it's "agnostic" to the API.
What is it with all the experts here and their preference for the WIN32 API over NTFS-MFT? As a user I don't really care which way those UTF-16LE-bytes representing the "true file-name" are retrieved. In other words - I would happily accept them being exposed via the WIN32 API - if only someone here could tell me how exactly (or by what command etc.) the WIN32 API could reproduce them in a readable manner. :wink:
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *milo1012 »

georgeb wrote: 2023-11-06, 15:03 UTC What is it with all the experts here and their preference for the WIN32 API over NTFS-MFT?
Well, it is mentioned "repeatedly" because "raw", i.e. low-level, NTFS access requires kernel functions that you can only reach with admin rights, and those are not as well documented as WIN32.
The usual WIN32 API is - well - an API, i.e. you can use a C/C++ compiler or some wrapper for other languages, link the WIN32 libs and access these well-known functions for file system access.
Which functions you use is up to you, but you probably want to start with something like:
List all files in a directory by calling these repeatedly in a loop:
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstfilew
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findnextfilew
You'll get a buffer with 16-bit (UTF-16) characters:
( https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw )
-> cFileName, which are highly likely the same raw bytes (UTF-16 bytes) as stored on NTFS.
But to emphasize this (again): TC is agnostic as well, as it uses these API functions, i.e. it doesn't touch the filenames it gets from the API, with only rare exceptions (reserved characters).
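
No C compiler is strictly needed to see those bytes. Below is a minimal Python sketch that lists a directory and prints each name as UTF-16LE hex. As far as I know, `os.scandir` on Windows is built on the same find functions, but treat that as an assumption; the function name `list_names_hex` is made up.

```python
import os


def list_names_hex(directory: str):
    """Return (name, hex) pairs, where hex is the name as UTF-16LE code units."""
    rows = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # surrogatepass keeps unusual names (lone surrogates) from raising.
            raw = entry.name.encode("utf-16-le", errors="surrogatepass")
            rows.append((entry.name, raw.hex(" ", 2)))  # space after every 2 bytes
    return sorted(rows)


if __name__ == "__main__":
    for name, hexed in list_names_hex("."):
        print(f"{name}\t{hexed}")
```

Two names that look identical in the listing but differ in the hex column are the composed and decomposed variants.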
georgeb wrote: 2023-11-06, 15:03 UTC ...
But wouldn't that suggest then that somehow a search for names containing "ö" in TC (or "Everything" that is) would also have to "magically" retrieve those file-versions utilizing the composite "o"+(U+0308)-representation of the resulting "Umlaut"?
Technically they should not be treated as the same, as the bytes are not the same, but "logically" these could (should?) be treated as the same. In fact some modern/current browsers treat both variants as the same. Just test it: use Ctrl+F and search for your Umlaut - it should find both variants.
I think it's more or less a comfort function, as these whole Unicode concepts are still quite "new" *cough*. Especially Apple prefers NFD, while the de-facto standard for the web is NFC. So the classic culprit is: you save a file on an Apple machine with a name containing such decomposable character(s), transfer it to some other location and read the name via Windows or other OSs' software. IMO it's up to the software at hand whether it translates these variants transparently to the user, or handles them on a low-level basis and therefore differently. Like I said, some browsers do, probably the newer office suites too, but most "standard" software doesn't.
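
What such a "comfort function" could look like, as a rough Python sketch: both the search term and the file name are normalized to NFC before comparing. The helper name `contains_normalized` is made up, and this is not how TC, Everything or any browser is known to implement it.

```python
import unicodedata


def contains_normalized(name: str, needle: str) -> bool:
    """Substring test that treats canonically equivalent sequences as equal."""
    nfc_name = unicodedata.normalize("NFC", name)
    nfc_needle = unicodedata.normalize("NFC", needle)
    return nfc_needle in nfc_name


nfd_name = "Bo\u0308rse.txt"                       # name as an NFD system might store it
print("\u00f6" in nfd_name)                        # False: plain byte-wise search misses it
print(contains_normalized(nfd_name, "\u00f6"))     # True: found after normalization
```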
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *georgeb »

milo1012 wrote: 2023-11-06, 16:29 UTC The usual WIN32 API is - well - an API, i.e. you can use a C/C++ compiler or some wrapper for other languages, link the WIN32 libs and access these well-known functions for file system access.
Which functions you use is up to you, but you probably want to start with something like:
List all files in a directory by calling these repeatedly in a loop:
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstfilew
...
( https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw )
-> cFileName, which are highly likely the same raw bytes (UTF-16 bytes) as stored on NTFS.
Thanks for the effort to explain. But you've got to be kidding. You'd rather send me on a two-week mission in system programming than simply use a ready-made tool to look the desired info up in the MFT?

I certainly have no bias against the WIN32 API - but it would have to do much better than that in order to expose those sought-after name-bytes from the MFT, like a simple Windows-command or some simple function to call. :roll: For example one that could directly be used to produce a custom-column-view revealing those native filename-bytes in HEX within a modified TC-panel, much like @Dalai has suggested.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *milo1012 »

georgeb wrote: 2023-11-06, 22:17 UTC I certainly have no bias against the WIN32 API - but it would have to do much better than that in order to expose those sought-after name-bytes from the MFT, like a simple Windows-command or some simple function to call.
...
I didn't say that you need to do this yourself, I was just giving a hint for how to "expose" things as a starting point.
And there are tons of code examples out there which you can probably compile quite easily.

But anyway, there is definitely an even simpler way: use a decent text editor which is UTF-16 capable and copy the filenames to a file. As mentioned more than once now, TC almost never manipulates filenames in memory after it gets them from the API. So just use the Windows clipboard, which uses UTF-16 too for whatever Unicode text you feed into it. When you copy the filename to the clipboard, no real conversion is done; the original UTF-16 bytes stay the same (you just get an optional ANSI-recoded format in the clipboard in parallel). Now use TC as an intermediate tool:
  • just mark your files, now use TC command "cm_CopyNamesToClip" or e.g. the button "Copy names with full path" already shipped with TC
  • use a decent text editor like notepad2, notepad3 (notepad++ for some reason does not allow UTF-16 without BOM), et al., where the current file encoding is shown in the status bar
  • create a new empty file either directly with UTF-16 encoding, or convert that empty file to UTF-16 (without BOM)
  • use Ctrl+V. Voilà: you have probably copied the original bytes to a raw UTF-16 file.
You can now hex edit the file to see the raw bytes.
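
If reading the hex dump by eye is too tedious, the saved file can also be checked with a short Python sketch like the one below. It assumes the file is UTF-16LE with one name per line (with or without BOM); the function name `describe_names` is made up.

```python
import unicodedata


def describe_names(raw: bytes):
    """For each line of a UTF-16LE file, list the combining marks it contains."""
    if raw.startswith(b"\xff\xfe"):   # skip a UTF-16LE byte order mark if present
        raw = raw[2:]
    text = raw.decode("utf-16-le")
    report = []
    for line in text.splitlines():
        if not line:
            continue
        # unicodedata.combining() is non-zero for marks such as U+0308.
        marks = [f"U+{ord(ch):04X}" for ch in line if unicodedata.combining(ch)]
        report.append((line, marks))
    return report


sample = "Bo\u0308rse.txt\r\nB\u00f6rse.txt\r\n".encode("utf-16-le")
for name, marks in describe_names(sample):
    print(name, marks)
```

For a real file, pass `open(path, "rb").read()` instead of the sample bytes; any name with a non-empty list is a "fake-Umlaut" candidate.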
Dalai
Power Member
Posts: 9963
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Re: OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

Post by *Dalai »

georgeb wrote: 2023-11-06, 22:17 UTC Thanks for the effort to explain. But you've got to be kidding.
TC already uses something like this if not exactly this function to retrieve the filenames. You don't need to do any coding.

2milo1012:
Isn't it easier to use cm_SaveSelectionToFileW instead and open the resulting file in a Hex editor (or even Lister in Hex mode)?

Regards
Dalai