[WDX] NFCname - Unicode Normalization, UTF-8 reinterpreter

 
milo1012
Power Member


Joined: 02 Feb 2012
Posts: 1034

Posted: Wed Mar 04, 2015 2:13 pm    Post subject: [WDX] NFCname - Unicode Normalization, UTF-8 reinterpreter

NFCname 1.1

A content plug-in (wdx) for Total Commander which scans for filenames that are not,
or only partially, in NFC, the Unicode normalization form preferred on Windows.
The plug-in can return the normalized filename, so that you can use TC's MRT (Multi-Rename Tool)
to convert all names to NFC. You can also do the reverse: return all names in NFD,
or in the OS X variant of NFD.
Additionally, you can scan for and correct filenames in which UTF-8 sequences were misinterpreted as ANSI bytes.

Why is this necessary?
Some systems (especially OS X, due to HFS+) may use NFD or another normalization form,
which can complicate file sync and name uniqueness.
The same can happen when you download files from the Web, where the suggested
filename is derived e.g. from the title of an HTML page that contains such composable
characters.
For example, the German umlauts can exist in two forms:
Code:
 ü
 ü

The first one is decomposed (NFD: the base letter 'u' followed by the combining diaeresis U+0308),
the second is precomposed (NFC: the single code point U+00FC), but both can exist
as filenames in a directory at the same time!
http://en.wikipedia.org/wiki/Unicode_equivalence
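
To illustrate the two forms, here is a minimal Python sketch using the standard unicodedata module. It is not the plug-in's code; the helper names is_nfc, to_nfc and to_nfd are made up for this example, and it covers plain NFC/NFD only, not the OS X variant of NFD.

```python
import unicodedata

# The two spellings of the same umlaut (names chosen for this example only).
decomposed = "u\u0308"   # 'u' + COMBINING DIAERESIS, the NFD form
precomposed = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS, the NFC form


def is_nfc(name: str) -> bool:
    """Return True if the name is already fully in NFC."""
    return unicodedata.normalize("NFC", name) == name


def to_nfc(name: str) -> str:
    """Return the name converted to NFC."""
    return unicodedata.normalize("NFC", name)


def to_nfd(name: str) -> str:
    """Return the name converted to NFD (the reverse direction)."""
    return unicodedata.normalize("NFD", name)


if __name__ == "__main__":
    for label, text in (("decomposed", decomposed), ("precomposed", precomposed)):
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
        print(label, code_points, "is NFC:", is_nfc(text))
```

Both strings render the same, but they compare unequal until one of them is normalized, which is why two such filenames can sit side by side in one directory.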

The plug-in additionally scans for and corrects filenames containing misinterpreted
UTF-8 sequences, i.e. UTF-8 byte sequences that were read as ANSI bytes
and then converted to UTF-16 (Windows Unicode).
Example:
A filename
Code:
MotÃ¶rhead - Ace of Spades.mp3

was most likely UTF-8, but was not recognized as such and was therefore interpreted as ANSI bytes
(non-Unicode). The plug-in is able to detect and correct such a sequence and would recode it to
Code:
Motörhead - Ace of Spades.mp3
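
The idea behind this correction can be sketched in Python as follows. This is only an illustration under assumptions, not the plug-in's actual algorithm: the function name fix_misread_utf8 is made up, the ANSI code page is assumed to be Windows-1252, and the whole name is converted at once instead of looking for individual sequences inside it.

```python
def fix_misread_utf8(name: str, ansi_codepage: str = "cp1252") -> str:
    """Undo 'UTF-8 bytes read as ANSI' damage, if the name shows it.

    Assumption: the bytes were misread with Windows-1252. If the name
    cannot be mapped back to ANSI bytes, or those bytes are not valid
    UTF-8, the name is returned unchanged.
    """
    try:
        raw = name.encode(ansi_codepage)  # recover the original byte values
        return raw.decode("utf-8")        # reinterpret them as UTF-8
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


if __name__ == "__main__":
    # 'ö' is C3 B6 in UTF-8; read as Windows-1252 this becomes 'Ã' + '¶'.
    broken = "Mot\u00c3\u00b6rhead - Ace of Spades.mp3"
    print(fix_misread_utf8(broken))
```

A name that already contains a correct 'ö' is left alone, because the single byte F6 is not a valid UTF-8 sequence and the decode step fails.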


The plug-in checks only the filename part, not the path itself.
This means that a file like
'c:\dir1\über\file1.txt'
will not be reported as non-NFC when the plug-in is used e.g. as a custom column for that file.
TC's search, however, will report the dir name 'über', since the directory tree
is traversed recursively there (of course only when the search start location
is somewhere in or above that dir).
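
The difference between checking the filename only and checking every path component can be sketched like this in Python. The helper names filename_is_nfc and non_nfc_parts are hypothetical and only serve the example; the plug-in itself just sees the name TC hands to it.

```python
import unicodedata
from pathlib import PureWindowsPath


def filename_is_nfc(path: str) -> bool:
    """Check only the last path component, like a custom column would."""
    name = PureWindowsPath(path).name
    return unicodedata.normalize("NFC", name) == name


def non_nfc_parts(path: str) -> list:
    """Return every path component that is not in NFC."""
    parts = PureWindowsPath(path).parts
    return [p for p in parts if unicodedata.normalize("NFC", p) != p]


if __name__ == "__main__":
    # The dir name uses the decomposed form: 'u' + U+0308.
    path = "c:\\dir1\\u\u0308ber\\file1.txt"
    print(filename_is_nfc(path))      # the filename itself is fine
    print(len(non_nfc_parts(path)))   # but one directory name is not
```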



Total Commander 7.50 or newer is required.
Works on Windows NT 4.0, 2000, XP or newer (no support for Windows 9x).


Current Version 1.1:
(32+64 bit+source)
totalcmd.net
SHA1: de32479e8aa464ed0d7958696a07f9b08934003d


Old Version 1.0:
(32+64 bit)
totalcmd.net
SHA1: fe04851df750e4ff412ed25191a59c73f8ae225f

Old Version 0.8:
(32+64 bit)
totalcmd.net
SHA1: 2009919f640db339dcdfdb178b48671df0fd23cd
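
If you want to compare a downloaded archive against one of the SHA1 values above, a small Python sketch like the following would do. The function name sha1_of_file and the local filename in the usage part are assumptions; use whatever name your download actually has.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python check_sha1.py <downloaded archive> <expected SHA1>
    if len(sys.argv) == 3:
        actual = sha1_of_file(sys.argv[1])
        print("OK" if actual == sys.argv[2].lower() else "MISMATCH: " + actual)
```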



Please report bugs and give me some feedback.
_________________
TC plugins: PCREsearch and RegXtract


Last edited by milo1012 on Wed Mar 15, 2017 10:05 pm; edited 2 times in total
milo1012

Posted: Wed Mar 15, 2017 9:57 pm

New Version 1.1!
  • added capability for scanning and correcting filenames which have sequences of
    UTF-8 bytes falsely interpreted as ANSI bytes, e.g. 'MotÃ¶rhead' instead of 'Motörhead'

Check the first post for the new file.
All times are GMT - 6 Hours