OffTopic / Tool to display FILE*NAME* as raw entry from the MFT

English support forum

Moderators: Hacker, petermad, Stefan2, white

georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by georgeb »

This may sound a bit strange and off-topic but it has an actual TC-background.
Recently I've been searching the HD for invoices whose filenames contain German Umlauts (like ..böck). But to my profound amazement, none of them were retrieved by FileFind, although I knew for sure they had to be there. My suspicion began to center on the "Umlaut", so I used a question mark instead - still no hit. But using an asterisk, they suddenly appeared - now that was strange! Could that be due to some undiscovered new bug? So I went back to an older installation of TC10 and tried again - same results. This became even stranger when I found out that other "ö"-Umlauts were found correctly; only this particular name could not be found, neither with "Everything" nor without it.

To make a long story short: after some further analysis it turned out that the issuer of those invoices hadn't used an Umlaut "ö" at all in those names, but rather an ordinary "o" followed by a so-called "Combining Diaeresis" (U+0308), which in the TC panel is visually indistinguishable from a filename using the proper Umlaut character. Next I copied all the files in question to a temporary location and ran an MRT (Multi-Rename Tool) process that cut out the pseudo-Umlaut and replaced it with the proper character.
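Why the two spellings look alike but are different strings can be shown with a minimal Python 3 sketch (standard library only; "böck" is a hypothetical example name). Unicode normalization form NFC composes "o" + U+0308 into the single code point U+00F6, and NFD splits it apart again:

```python
import unicodedata

precomposed = "b\u00f6ck"   # "ö" as one code point, U+00F6
decomposed = "bo\u0308ck"   # "o" followed by U+0308 COMBINING DIAERESIS

print(precomposed == decomposed)          # False: different code points
print(len(precomposed), len(decomposed))  # 4 5

# NFC composes o + U+0308 into U+00F6; NFD does the reverse.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

This also suggests why "?" found nothing while "*" did: the decomposed name has one more character at the position of the "ö".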

Interestingly enough, I then ran a SyncDirs comparison between the altered and the original directory, just to make sure those files had identical content. But now they all suddenly showed up as "unique left" and "unique right", although the names looked exactly identical; only when compared individually, one by one, were they found to have identical content.

This is, of course, due to the somewhat unfortunate circumstance that SyncDirs is not capable of comparing files by binary content alone while disregarding their filenames (as the duplicate search in FileFind can).
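Outside TC, pairing files by content alone can be sketched with SHA-256 hashes. This is a hypothetical helper, not a TC feature; it assumes Python 3.9+ and compares only the top level of two directories:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def match_by_content(left_dir: str, right_dir: str) -> list[tuple[str, str]]:
    """Pair files from two directories whose contents are byte-identical,
    ignoring their names entirely (non-recursive)."""
    right_by_hash: dict[str, list[str]] = {}
    for p in Path(right_dir).iterdir():
        if p.is_file():
            right_by_hash.setdefault(sha256_of(p), []).append(p.name)
    pairs = []
    for p in Path(left_dir).iterdir():
        if p.is_file():
            for other in right_by_hash.get(sha256_of(p), []):
                pairs.append((p.name, other))
    return pairs
```

Printing each pair with `repr()` would also make the decomposed and the composed name distinguishable in the output.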

So here comes the question: Does anybody among the many expert users in this forum know of a tool that could directly show (ideally via a right-click context-menu entry or a user-defined button) the corresponding raw filename entry from the NTFS MFT metadata, preferably also in HEX mode, so as to make the real naming difference visible between the true symbol "ö" and a plain "o" followed by a "Combining Diaeresis" (which actually makes the name two bytes longer in HEX view)?
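NTFS stores file names as 16-bit UTF-16LE code units, which is where the "two bytes longer" comes from. A small sketch (Python 3.8+ for `bytes.hex` with a separator; the name is a hypothetical example):

```python
def utf16le_hex(name: str) -> str:
    """Hex dump of a name encoded as UTF-16LE, as NTFS stores file names."""
    return name.encode("utf-16-le").hex(" ")

print(utf16le_hex("b\u00f6ck"))   # 62 00 f6 00 63 00 6b 00
print(utf16le_hex("bo\u0308ck"))  # 62 00 6f 00 08 03 63 00 6b 00
```

The composed "ö" is the single unit `f6 00`, while the decomposed form needs `6f 00` ("o") plus `08 03` (U+0308).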

I know about, and have used, a low-level disk editor with a Hex view to search the raw HD for that particular name entry, but this took forever and returned multiple "hits", which made it impossible for me to identify the raw filename entry in the MFT that corresponds to the filename in question in the TC panel.

Once again, in other words: it is easy to display the contents of any file in HEX mode. But is there any (info) tool that could link directly to the RAW FILENAME entry in the MFT metadata and show that name entry in raw/HEX mode, thereby revealing any actual differences in notation, encoding or symbols used within those names?
ghisler(Author)
Site Admin
Posts: 50475
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by ghisler(Author) »

georgeb wrote: To make a long story short after some further analysis it turned out that the issuer of those invoices hadn't used an Umlaut-"ö" at all in those names but rather an ordinary "o" followed by a so called "Combining Diaeresis" (U+0308)
MacOS does this, unfortunately.
You could use the multi-rename tool to replace these by regular umlauts.
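For reference, the same replacement can be sketched outside TC as Unicode NFC normalization, which recomposes "o" + U+0308 (and other decomposed letters) into single code points. The helper `nfc_rename` is hypothetical; it defaults to a dry run and skips names whose composed form already exists:

```python
import os
import unicodedata

def nfc_rename(directory: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename entries whose names are not in NFC (e.g. decomposed, as macOS
    tends to produce) to their composed form. With dry_run=True, only
    report what would change."""
    changes = []
    for name in os.listdir(directory):
        composed = unicodedata.normalize("NFC", name)
        if composed == name:
            continue  # already composed, nothing to do
        target = os.path.join(directory, composed)
        if os.path.exists(target):
            # A composed twin already exists; do not overwrite it.
            print(f"skipped (target exists): {name!r}")
            continue
        changes.append((name, composed))
        if not dry_run:
            os.rename(os.path.join(directory, name), target)
    return changes
```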
Author of Total Commander
https://www.ghisler.com
Dalai
Power Member
Posts: 9963
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by Dalai »

I don't think it's necessary to show the raw MFT data since the regular filenames are already different - it would be enough to show the regular filename from TC in Hex mode, if that's really what you want.

But I can imagine that there's an easier way since it's just an issue of display or rendering. Did you try different fonts in TC's file lists? You can probably find files with such names via TC's Find Files.
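Showing the regular filename in Hex mode, as suggested above, can be sketched in a few lines of Python (3.9+). The helper `describe_names` and the script name `names_hex.py` are hypothetical; each entry is printed once with non-ASCII characters escaped via `ascii()` and once as UTF-16LE bytes:

```python
import os
import sys

def describe_names(directory: str) -> list[str]:
    """Return two lines per directory entry: the name with non-ASCII
    characters escaped, and the name's UTF-16LE bytes in hex."""
    lines = []
    for name in sorted(os.listdir(directory)):
        lines.append(ascii(name))  # e.g. 'bo\u0308ck.pdf' vs 'b\xf6ck.pdf'
        # surrogatepass: do not crash on ill-formed (unpaired surrogate) names
        lines.append("    " + name.encode("utf-16-le", "surrogatepass").hex(" "))
    return lines

if __name__ == "__main__":
    # Usage: python names_hex.py [directory]   (defaults to the current one)
    for line in describe_names(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(line)
```

Hooking such a script to a TC button or context-menu entry is not covered here.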

Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by georgeb »

ghisler(Author) wrote: 2023-11-03, 11:18 UTC MacOS does this, unfortunately.
You could use the multi-rename tool to replace these by regular umlauts.
Yes, that is what I did after I had found out what the culprit was. But it turned out to be quite cumbersome to identify the problem. Since both versions of the filenames looked exactly identical in the TC file panel, how could the user distinguish one version from the other? SyncDirs would show binary-identical files as "unique left" AND "unique right", and one cannot tell which version is which. Even (seemingly) weirder: when copying the renamed versions over the original ones, the originals won't get overwritten; instead one ends up with two co-existing, seemingly identical filenames in the very same directory, an absolute no-go in the everyday computer world. Of course, once you know the culprit, it's easy to explain: you simply have two binary-identical files with, strictly speaking, different names, which nevertheless look absolutely identical.

So the conundrum lies in the indistinguishability of those two filename versions in the TC file panels, and it again boils down to the question: Is there a way in TC, perhaps using some external tool or plugin, to make the TRUE FILENAMES VISIBLE by showing their actual HEX strings, thereby revealing them as different names so that they can be told apart?
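Look-alike pairs like the two co-existing files described above could be found automatically by grouping the names in a directory by their NFC form: any group with more than one member consists of names that differ only in how accented letters are encoded. A sketch with a hypothetical helper `lookalike_names` (Python 3.9+):

```python
import os
import unicodedata

def lookalike_names(directory: str) -> list[list[str]]:
    """Group entries whose names become identical after NFC normalization,
    i.e. names that differ only in composed vs decomposed characters."""
    groups: dict[str, list[str]] = {}
    for name in os.listdir(directory):
        groups.setdefault(unicodedata.normalize("NFC", name), []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Printing each group with `repr()` or `ascii()` then shows which member carries the combining mark.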
Dalai
Power Member
Posts: 9963
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by Dalai »

2georgeb
Did you try what I suggested above?
tuska
Power Member
Posts: 4049
Joined: 2007-05-21, 12:17 UTC

Post by tuska »

There was this topic in the 'Everything' forum:
How to find all names with special characters?

If you could find out the correct value for Unicode character "◌̈" (U+0308)
e.g. in Unicode block - below "External links" then a search in TC using 'Everything'
or directly in 'Everything' would also be possible.
Last edited by tuska on 2023-11-03, 14:01 UTC, edited 1 time in total.
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by georgeb »

Dalai wrote: 2023-11-03, 11:28 UTC I don't think it's necessary to show the raw MFT data since the regular filenames are already different - it would be enough to show the regular filename from TC in Hex mode, if that's really what you want.
Yes, that is exactly what I want. But HOW exactly could that be done? How do I get an exact HEX representation of the file NAME, rather than of its content, which is easy?
Dalai wrote: 2023-11-03, 11:28 UTC But I can imagine that there's an easier way since it's just an issue of display or rendering. Did you try different fonts in TC's file lists? You can probably find files with such names via TC's Find Files.
I'm not sure to what extent different fonts could make the actual differences in names visible. And identifying them in FileFind? Possible, perhaps. But then you'd have to know which strings exactly to look for in the first place. And as both name versions look exactly identical (using conventional fonts), we are instantly thrown back to the initial problem: how to tell the actually different names apart!
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by georgeb »

Dalai wrote: 2023-11-03, 13:28 UTC 2georgeb
Did you try what I suggested above?
Not yet as I don't know how. What kind of font do you think might be helpful?
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by georgeb »

tuska wrote: 2023-11-03, 13:31 UTC There was this topic in the 'Everything' forum:
How to find all names with special characters?

If you could find out the correct value for Unicode character "◌̈" (U+0308)
e.g. in Unicode block - List of blocks then a search in TC using 'Everything'
or directly in 'Everything' would also be possible.
In this particular case I did find out about the special character, but it turned out to be quite tedious, and there may be lots of similarly strange cases involving other special Unicode characters.

So searching for such characters is not the problem, once you've found out about them. What makes the situation quite difficult is that such name differences involving combining Unicode characters are completely invisible in the TC file panel.

So we're back to the initial question: how to make the actual name differences VISIBLE, perhaps best by showing the native file NAME as a HEX string?
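A more general check than searching for U+0308 could flag any name containing a combining mark (Unicode general category Mn, Mc or Me) and report which marks were found. `combining_marks` and `scan` are hypothetical helpers in this sketch (Python 3.9+):

```python
import os
import unicodedata

def combining_marks(name: str) -> list[str]:
    """List every combining mark in `name` as 'U+XXXX NAME'."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in name
        if unicodedata.category(ch).startswith("M")  # Mn, Mc or Me
    ]

def scan(root: str) -> None:
    """Walk `root` recursively and print every file or folder whose
    name contains at least one combining mark."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            marks = combining_marks(name)
            if marks:
                print(os.path.join(dirpath, name), "->", ", ".join(marks))
```

For example, `scan(r"C:\Invoices")` (hypothetical path) would list the decomposed invoice names together with `U+0308 COMBINING DIAERESIS`.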
Dalai
Power Member
Posts: 9963
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by Dalai »

georgeb wrote: 2023-11-03, 13:51 UTC Not yet as I don't know how. What kind of font do you think might be helpful?
Well, Options > Font > File list font. Monospace fonts will hopefully show a difference. Yes, I'm aware that such fonts are not really suitable for the file lists. Just try out different fonts.

The plugin UnicodeTest might be useful here. Create a custom columns view with a column containing the field [=unicodetest.Unicode test].

Regards
Dalai
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by georgeb »

Dalai wrote: 2023-11-03, 14:10 UTC Well, Options > Font > File list font. Monospace fonts will hopefully show a difference. Yes, I'm aware that such fonts are not really suitable for the file lists. Just try out different fonts.

The plugin UnicodeTest might be useful here. Create a custom columns view with a column containing the field [=unicodetest.Unicode test].
Actually, for the file list font in TC panels I always use monospaced fonts for enhanced visual clarity, in my case "Consolas". But I really don't see how they could make these name differences visible: monospaced fonts only render every character at equal width; they won't display separately those Unicode combining characters that are meant to merge with the preceding character into one glyph (like, for instance, non-spacing accent marks).

As for the UnicodeTest plugin, I will have a look at it, but the simplest way IMHO would still be to right-click any filename in a TC panel and get it shown in HEX.
Last edited by georgeb on 2023-11-03, 14:41 UTC, edited 1 time in total.
tuska
Power Member
Posts: 4049
Joined: 2007-05-21, 12:17 UTC

Post by tuska »

2georgeb
> I did find out about the special character...
What was it?

Colouring file names with Unicode character "◌̈" (U+0308):
Configuration > Options... > Color > Define colors by file type... > Add... > Define... > tab "Plugins" >
[x] Search in plugins > Plugin: tc | Property: name | OP: contains | Value: ◌̈
click on Button "Save" > Template name, e.g. Unicode character U+0308 > OK > OK > For example, click in the field
with the red colour > OK > optional: click on the button "Dark<->Normal" > OK > OK.
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by georgeb »

tuska wrote: 2023-11-03, 14:32 UTC > I did find out about the special character...
What was it?
Well, as you already surmised: (U+0308)
But again, the problem is not searching for such a character once you know a filename uses it. The problem is that a diacritical mark like the diaeresis is displayed together with its base character, in this case above an ordinary "o", and therefore becomes visually indistinguishable from the actual Umlaut character "ö", which is one single character.

As a result, in the TC file panel you won't recognize that such a combining character has been used, and hence won't know what to look for. Only in HEX mode are there no combined glyphs, so the difference inevitably becomes visible whenever combining characters are used, since such a letter actually consists of two separate, consecutive characters.

As for the colored filenames, I will have to look into whether and how this works with diacritical marks. Presumably the whole filename would then be displayed in a different color in TC, and it wouldn't make sense to do that only for U+0308 and not for other diacritical marks as well. But it would at least give the user a hint that a certain filename contains some strange symbols.
tuska
Power Member
Posts: 4049
Joined: 2007-05-21, 12:17 UTC

Post by tuska »

2georgeb

I meant something more like this - directly in 'Everything':

regex:[\x{25CC}]|regex:[\x{0308}]
(I have no idea about RegEx but I managed to do that -
no further questions please about RegEx).

EmEditor - Ctrl+i
◌̈
U+25CC U+0308
UTF-16LE: 0x25CC 0x0308
DOTTED CIRCLE; COMBINING DIAERESIS
Unicode Script: Zyyy (Common): Zinh (Inherited)
Unicode General Category: So (Other Symbol): Mn (Nonspacing Mark)

This only shows me the following files in Everything 1.5.0.1359a (x64), for example:
◌̈ bla bla.txt
◌̈.txt.lnk
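In the EmEditor output above, "◌̈" consists of two code points: U+25CC DOTTED CIRCLE, which character charts use as a placeholder base for showing a lone combining mark, and U+0308 itself. A decomposed filename contains only U+0308 after its base letter. A Python sketch of that difference (not Everything's syntax; the name is hypothetical), plus a check for the whole Combining Diacritical Marks block U+0300 to U+036F:

```python
import re

decomposed = "bo\u0308ck.pdf"   # hypothetical decomposed name
placeholder = "\u25cc\u0308"    # "◌̈" as copied from a character chart

print(placeholder in decomposed)  # False: the real name has no DOTTED CIRCLE
print("\u0308" in decomposed)     # True

# Any mark from the Combining Diacritical Marks block (U+0300..U+036F)
COMBINING_BLOCK = re.compile(r"[\u0300-\u036F]")
print(bool(COMBINING_BLOCK.search(decomposed)))       # True
print(bool(COMBINING_BLOCK.search("b\u00f6ck.pdf")))  # False
```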
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by georgeb »

tuska wrote: 2023-11-03, 15:42 UTC I meant something more like this - directly in 'Everything':

regex:[\x{25CC}]|regex:[\x{0308}]
Yes, you are searching for files containing an adjacent combination of U+25CC, a dotted circle, with a Trema on top - directly in "Everything".

That of course is possible, but what use is it? :? :wink: TBH I have NEVER encountered any name or term containing such a symbol "◌̈".

Also it is not clear to me what the advantage of searching directly in "Everything" would be when the search can equally be done from within TC which offers a much better GUI?

But thanks for the advice about color coding. If the HEX-code of the fileNAME cannot be made visible "the easy way" I now will at least get some warning if a filename contains "doctored" fake-Umlauts and in case of two identical-looking filenames I can now at least tell them apart and identify the one containing the "exotic" symbols.