georgeb wrote: 2023-11-06, 15:03 UTC
What is it with all the experts here and their preference for the WIN32 API over NTFS-MFT?
Well, obviously it is "repeatedly" mentioned because "raw", i.e. low-level, NTFS access requires kernel functions which you can only access with admin rights, plus they are not as well documented as WIN32.
The usual WIN32 API is - well - an API, i.e. you can use a C/C++ compiler or some wrapper for other languages, link the WIN32 libs and you can access these well-known functions for file system access.
Which functions you use is up to you, but you probably want to start with something like this:
List all files in a directory by calling FindFirstFileW once and then FindNextFileW repeatedly in a loop:
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findfirstfilew
https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-findnextfilew
You'll get a buffer of 16-bit (UTF-16) characters in the cFileName member of WIN32_FIND_DATAW ( https://learn.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-win32_find_dataw ), which are highly likely the same raw bytes (UTF-16 bytes) as stored on NTFS.
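If you don't want to set up a C/C++ compiler just to look at the names, here is a minimal sketch in Python. As far as I know, CPython's os.scandir is built on FindFirstFileW/FindNextFileW on Windows, but treat that as my assumption and not as documented behaviour. The function name list_names_utf16 is made up for this example. It dumps every name in a directory together with its UTF-16LE bytes, so you can see exactly which code units the API handed back:

```python
import os

def list_names_utf16(directory):
    """Return sorted (name, UTF-16LE hex dump) pairs for each entry in `directory`."""
    result = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # "surrogatepass" keeps unpaired surrogates instead of raising,
            # since file names are not guaranteed to be well-formed UTF-16.
            raw = entry.name.encode("utf-16-le", "surrogatepass")
            result.append((entry.name, raw.hex(" ")))
    return sorted(result)
```

Calling list_names_utf16(".") in a folder that holds both variants of an Umlaut name should show two entries that look identical on screen but have different hex dumps.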
But to emphasize this (again): TC is agnostic as well, as it uses these API functions, i.e. it doesn't touch the filenames it gets from the API, with only rare exceptions (reserved characters).
georgeb wrote: 2023-11-06, 15:03 UTC...
But wouldn't that suggest then that somehow a search for names containing "ö" in TC (or "Everything" that is) would also have to "magically" retrieve those file-versions utilizing the composite "o"+(U+0308)-representation of the resulting "Umlaut"?
Technically they should not be treated as the same, as the bytes are not the same, but "logically" these could (should?) be treated as the same. In fact some modern/current browsers treat both variants as the same. Just test it: use Ctrl+F and search for your Umlaut - it should find both variants.
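To make the "bytes are not the same" point concrete, here is a small sketch using Python's standard unicodedata module. The helper same_after_normalization is just a name I picked for illustration. It shows that the two spellings of "ö" differ as UTF-16 bytes but compare equal once both are normalized to the same form:

```python
import unicodedata

nfc = "\u00f6"    # "ö" as one precomposed code point (NFC)
nfd = "o\u0308"   # "o" followed by COMBINING DIAERESIS (NFD)

def same_after_normalization(a, b, form="NFC"):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# The raw UTF-16LE encodings differ: 2 bytes versus 4 bytes.
nfc_bytes = nfc.encode("utf-16-le")   # f6 00
nfd_bytes = nfd.encode("utf-16-le")   # 6f 00 08 03

print(nfc == nfd)                          # plain comparison: False
print(same_after_normalization(nfc, nfd))  # normalized comparison: True
```

A program that compares names the first way is what I called "technical" above, the second way is the "logical" comparison.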
I think it's more or less a comfort function, as these whole Unicode concepts are still quite "new" *cough*. Especially Apple prefers NFD, while the de-facto standard for the web is NFC. So the classic culprit is: you save a file on an Apple machine with a name containing such decomposable character(s), transfer it to some other location and read the name via Windows or some other OS's software. IMO it's up to the software at hand whether it translates these variants transparently for the user, or handles them on a low-level basis and therefore differently. Like I said, some browsers do, probably the newer office suites too, but most "standard" software doesn't.
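Such a comfort function could look roughly like the sketch below. This is only my guess at the general idea, not how TC, "Everything" or any browser actually implements it, and find_names plus the sample names are made up for the example. Both the search term and each file name are normalized to the same form before the substring test:

```python
import unicodedata

def find_names(names, needle, form="NFC"):
    """Return the names containing `needle`, ignoring NFC/NFD differences."""
    wanted = unicodedata.normalize(form, needle)
    return [n for n in names if wanted in unicodedata.normalize(form, n)]

names = [
    "Gro\u0308\u00dfe.txt",  # decomposed "ö", e.g. as copied from a Mac
    "Gr\u00f6\u00dfe.txt",   # precomposed "ö"
    "Grosse.txt",            # no Umlaut at all
]

# A byte-wise search only finds the precomposed variant.
plain_hits = [n for n in names if "\u00f6" in n]

# The normalizing search finds both Umlaut variants.
normalized_hits = find_names(names, "\u00f6")
```

The names themselves are returned untouched, only the comparison is normalized, so nothing gets renamed behind the user's back.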