[WDX] NFCname - Unicode Normalization, UTF-8 reinterpreter



milo1012
Power Member
Posts: 1088
Joined: 2012-02-02, 19:23 UTC


Post by *milo1012 » 2015-03-04, 20:13 UTC

NFCname 1.1

A content plug-in (wdx) for Total Commander that scans filenames for characters that are not
in NFC (Unicode Normalization Form C), the preferred form on Windows.
It can return the normalized filename, so you can use TC's MRT (Multi-Rename Tool)
to convert all names to NFC. It can also do the reverse: return all names in NFD form,
or in the OS X variant of NFD.
Additionally, it can scan and correct filenames in which UTF-8 sequences were misinterpreted as ANSI bytes.
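For illustration only, here is a minimal Python sketch of the NFC check and the NFC/NFD conversion, using the standard `unicodedata` module (`is_normalized` needs Python 3.8+). It is not the plug-in's code: the field names `is_nfc`, `nfc` and `nfd` are hypothetical stand-ins for plug-in columns, and Python's NFD is the standard form, not the OS X variant.

```python
import unicodedata

def normalization_fields(name: str) -> dict:
    """Return values a content column might show for one file name.

    Hypothetical field names; the plug-in's real fields may differ.
    """
    return {
        # True only if every character is already in NFC.
        "is_nfc": unicodedata.is_normalized("NFC", name),
        # Composed form, e.g. as a target name for the Multi-Rename Tool.
        "nfc": unicodedata.normalize("NFC", name),
        # Standard decomposed form (NOT the OS X/HFS+ variant).
        "nfd": unicodedata.normalize("NFD", name),
    }

name = "Mu\u0308ller.txt"  # 'u' + U+0308 COMBINING DIAERESIS, i.e. decomposed
fields = normalization_fields(name)
print(fields["is_nfc"])                    # False
print(fields["nfc"] == "M\u00fcller.txt")  # True
```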

Why is this necessary?
Some systems (especially OS X, due to HFS+) may use NFD or other normalization forms,
which can complicate file synchronization and name uniqueness.
This can also happen when you download files from the Web and the suggested filename
is derived from, e.g., the title of an HTML page that contains such composable characters.
For example, German umlauts can exist in two forms:

Code:

 ü
 ü
The first one is decomposed (NFD: 'u' followed by U+0308 COMBINING DIAERESIS), the second is
precomposed (NFC: the single code point U+00FC), yet both can exist as filenames in the same directory at the same time!
http://en.wikipedia.org/wiki/Unicode_equivalence
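A short Python sketch showing that the two forms really are different strings, and how names that differ only in normalization might be found in a directory listing. The helper `nfc_collisions` is hypothetical and not part of the plug-in.

```python
import unicodedata
from collections import defaultdict

def nfc_collisions(names):
    """Group names that differ as code points but are equal after NFC."""
    groups = defaultdict(set)
    for n in names:
        groups[unicodedata.normalize("NFC", n)].add(n)
    # Keep only NFC keys reached by more than one distinct spelling.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}

decomposed = "u\u0308"   # U+0075 U+0308 (NFD)
precomposed = "\u00fc"   # U+00FC (NFC)
print(decomposed == precomposed)          # False: different code points
print(len(decomposed), len(precomposed))  # 2 1
print(nfc_collisions(["\u00fcber.txt", "u\u0308ber.txt", "other.txt"]))
```

Both spellings of 'über.txt' end up in one group, which is exactly the situation that breaks name uniqueness when syncing.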

The plug-in can also scan filenames for misinterpreted UTF-8 sequences and correct them.
This means UTF-8 byte sequences that were interpreted as ANSI bytes
and then converted to UTF-16 (Windows Unicode).
Example:
A filename

Code:

MotÃ¶rhead - Ace of Spades.mp3
was most likely UTF-8, but was not recognized as such and was therefore interpreted as ANSI
(non-Unicode) bytes. The plug-in can detect and correct such a sequence and would recode it to

Code:

Motörhead - Ace of Spades.mp3
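A minimal Python sketch of such a repair, assuming the ANSI code page is Windows-1252 (`cp1252`); the plug-in presumably uses the system's actual ANSI code page, and `fix_utf8_as_ansi` is a hypothetical name. The strict UTF-8 decode doubles as the detection step: if the recovered bytes are not valid UTF-8, the name is left unchanged.

```python
def fix_utf8_as_ansi(name: str, ansi_codepage: str = "cp1252") -> str:
    """Undo UTF-8 bytes that were decoded with an ANSI code page.

    Assumption: the ANSI code page is Windows-1252 by default.
    """
    try:
        # Recover the original bytes, then decode them strictly as UTF-8.
        repaired = name.encode(ansi_codepage).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in the code page, or not valid UTF-8: keep as is.
        return name
    return repaired

print(fix_utf8_as_ansi("Mot\u00c3\u00b6rhead - Ace of Spades.mp3"))
# Motörhead - Ace of Spades.mp3
```

This is a heuristic: a name that genuinely contains a valid-looking sequence such as 'Ã¶' would be converted as well.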
The plug-in only checks the filename part, not the path itself.
This means that for a file like
'c:\dir1\über\file1.txt'
(with 'über' stored in NFD), the plug-in will not report a non-NFC name when used, e.g., as a custom column for that file.
TC's search, however, will report the directory name 'über', since the search traverses the path structure recursively,
provided that the search location (start location) is somewhere in or above that directory.
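A rough Python sketch of the difference, with hypothetical helper names. Checking every path component only approximates what a recursive search sees, since the search reports each directory entry as it enumerates it.

```python
import unicodedata
from pathlib import PureWindowsPath

def name_is_nfc(path: str) -> bool:
    """Check only the last component, as a per-file column would."""
    return unicodedata.is_normalized("NFC", PureWindowsPath(path).name)

def non_nfc_components(path: str) -> list:
    """Check every component, approximating a recursive search."""
    return [part for part in PureWindowsPath(path).parts
            if not unicodedata.is_normalized("NFC", part)]

p = "c:\\dir1\\u\u0308ber\\file1.txt"  # 'über' spelled in NFD
print(name_is_nfc(p))         # True: only 'file1.txt' is checked
print(non_nfc_components(p))  # the decomposed 'über' directory
```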



Total Commander 7.50 or newer is required.
Works on Windows NT 4.0, 2000, XP or newer (no support for Windows 9x).


Current Version 1.1:
(32+64 bit+source)
totalcmd.net
SHA1: de32479e8aa464ed0d7958696a07f9b08934003d


Old Version 1.0:
(32+64 bit)
totalcmd.net
SHA1: fe04851df750e4ff412ed25191a59c73f8ae225f

Old Version 0.8:
(32+64 bit)
totalcmd.net
SHA1: 2009919f640db339dcdfdb178b48671df0fd23cd



Please report bugs and give me some feedback.
Last edited by milo1012 on 2017-03-16, 04:05 UTC, edited 2 times in total.
TC plugins: PCREsearch and RegXtract


Post by *milo1012 » 2017-03-16, 03:57 UTC

New Version 1.1!
  • added the capability to scan and correct filenames in which sequences of
    UTF-8 bytes were misinterpreted as ANSI bytes, e.g. 'MotÃ¶rhead' instead of 'Motörhead'
Check the first post for the new file.
