[9.0 b14 x64] TC - file rename tool and diacritic
In this version and in previous ones, renaming files with search and replace does not work correctly. I can't replace characters with diacritics: letters with diacritics are not found and not replaced in file names...
- ghisler(Author)
- Site Admin
- Posts: 50550
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
Not confirmed. Please give me more details:
1. The name of the file you try to change
2. The "Search for" string
3. The "Replace with" string
4. Your language and country settings in Windows (e.g. English, USA)
Author of Total Commander
https://www.ghisler.com
1.
Požadavek na ukončení zaměstnance.xls
Požadavek na nového zaměstnance.xls
2. ž and other characters (ěčí)
3. z (eci)
4. My language is Czech, the file names are Czech, and Windows uses the Czech localization.
After applying the replacement, the file name does not change and its characters are shown the same as before (the left and right preview columns show the same name, although the search field contains ž and the replace field contains z).
Then I renamed the files manually. When I pressed Backspace to delete the character "ž" from the file name, it was deleted, but a "z" appeared in its place. That "z" was not shown before I deleted the "ž".
The same happened with the other diacritic characters.
Maybe the search/replace dialog adds the non-diacritic variant to the file name at the correct positions, but it is not shown and the diacritic characters are still displayed.
I repeated the character replacement a second time, and then tried manual renaming in the standard Windows Explorer, with the same behaviour: after deleting "ž", a "z" is shown at the same position...
I received these files by e-mail. When I create a new blank txt file on my PC with the same file name, renaming works correctly.
I examined the files and found that these Office files were created in Microsoft Office for Mac.
I don't know whether you can fix this in TC, or whether the issue lies in Windows Explorer functions (bad character encoding from the Mac).
Thank you and sorry for bad English...
- ghisler(Author)
The problem is that the ž in the name is not the same as the ž you are using for search+replace:
The first is a z followed by a combining caron (a reversed ^ character): Unicode code points 007A and 030C.
The second is a single precomposed character with Unicode code point 017E.
The former is mainly used on MacOS, the latter on Windows.
What you can do is create a search+replace rule with both types:
Search for: ž|ž
Replace with: z|z
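The difference between the two forms can be made visible outside TC. The following is only a minimal Python sketch, not how TC handles it: the helper names `code_points` and `replace_both_forms` and the shortened file names are made up for illustration. It prints the code points of both variants and replaces a search string regardless of whether the name stores it precomposed or decomposed.

```python
import unicodedata

decomposed = "z\u030c"   # z + combining caron (typical for names coming from a Mac)
precomposed = "\u017e"   # single character ž (typical on Windows)


def code_points(text):
    """Return the code points of text as space-separated hex values."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def replace_both_forms(name, search, replacement):
    """Replace search in name, trying both its NFC and its NFD spelling."""
    for form in ("NFC", "NFD"):
        name = name.replace(unicodedata.normalize(form, search), replacement)
    return name


# Hypothetical shortened file names, one per form.
mac_name = "Poz\u030cadavek.xls"
win_name = "Po\u017eadavek.xls"

print(code_points(decomposed))    # 007A 030C
print(code_points(precomposed))   # 017E
print(decomposed == precomposed)  # False
print(replace_both_forms(mac_name, "\u017e", "z"))  # Pozadavek.xls
print(replace_both_forms(win_name, "\u017e", "z"))  # Pozadavek.xls
```

Both names look identical on screen, which is why a search for the precomposed ž silently misses the decomposed one.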
Author of Total Commander
https://www.ghisler.com
We should call this character encoding problem by its name:
Unicode normalization
One solution is to occasionally scan all file names on disk for the NFD form with my NFCname plug-in and use MRT to convert all such names to the NFC form before doing any other rename operation.
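As an illustration of the scan-then-convert idea (not of the NFCname plug-in or MRT themselves), here is a minimal Python sketch. The function name `nfc_rename_plan` and the sample names are assumptions for this example; it only builds the list of renames and leaves the actual renaming on disk to the caller.

```python
import unicodedata


def nfc_rename_plan(names):
    """Return {old_name: new_name} for every name that is not already in NFC."""
    plan = {}
    for name in names:
        composed = unicodedata.normalize("NFC", name)
        if composed != name:
            plan[name] = composed
    return plan


# Hypothetical directory listing: one decomposed (NFD) name, two unaffected ones.
names = ["Poz\u030cadavek.xls", "Po\u017eadavek.xls", "readme.txt"]
plan = nfc_rename_plan(names)

for old, new in plan.items():
    print(f"{old!r} -> {new!r}")
```

On a real directory the names could come from `os.listdir`, and a rename would have to check first that the NFC name does not already exist.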
TC plugins: PCREsearch and RegXtract
- ghisler(Author)
OK, I will add a new placeholder to convert all composite Unicode characters (e.g. separate a and ^) to precomposed characters (â, with accent). The user will have to write:
[N]
instead of
[N]
for this conversion.
Author of Total Commander
https://www.ghisler.com
ghisler(Author) wrote:OK, I will add a new placeholder to convert all composite Unicode characters (e.g. separate a and ^) to precomposed characters...
Good to hear.
But I'm curious: what function or lib do you want to use?
Originally I wanted to use a big static lookup table of character replacements, but I couldn't find one, at least not one that covers the complete Unicode plane, and some people said that this isn't possible anyway because of the number of possible combinations, or when a file name uses a wild mixture of different normalization forms. Additionally, such tables might need an update whenever a newer Unicode standard adds new characters.
You can see that even converter tools like
http://www.w3.org/International/charlint/
don't use simple lookup tables.
When I started my plug-in, I used IsNormalizedString and NormalizeString, but these functions exist on Vista and higher only.
So I switched to the official ICU lib (International Components for Unicode), but it adds quite a lot of code, so the plug-in is nearly one MB in size.
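The point about simple lookup tables can be shown with a small Python sketch. The table `PAIRS` and the function `naive_compose` are invented for this example and are deliberately simplistic; Python's `unicodedata.normalize` stands in for a full normalizer. Two marks on one letter, written in either order, and decomposed Hangul both need more than a base+mark table.

```python
import unicodedata

# Toy lookup table: (base letter, combining mark) -> precomposed character.
PAIRS = {
    ("z", "\u030c"): "\u017e",  # z + caron       -> ž
    ("a", "\u0302"): "\u00e2",  # a + circumflex  -> â
    ("a", "\u0323"): "\u1ea1",  # a + dot below   -> ạ
}


def naive_compose(text):
    """Compose adjacent pairs found in PAIRS, left to right, nothing else."""
    out = []
    for ch in text:
        if out and (out[-1], ch) in PAIRS:
            out[-1] = PAIRS[(out[-1], ch)]
        else:
            out.append(ch)
    return "".join(out)


# The same letter with two marks, typed in two different orders.
a = "a\u0323\u0302"  # a + dot below + circumflex
b = "a\u0302\u0323"  # a + circumflex + dot below

nfc_a = unicodedata.normalize("NFC", a)
nfc_b = unicodedata.normalize("NFC", b)
print(nfc_a == nfc_b == "\u1ead")   # True: both become the single character ậ
print(naive_compose(b) == nfc_b)    # False: the table leaves a stray dot below

# Hangul syllables are composed by arithmetic rather than by a table entry.
hangul = "\u1112\u1161\u11ab"       # three jamo, as macOS tends to store them
print(unicodedata.normalize("NFC", hangul) == "\ud55c")  # True
```

Full normalization reorders the combining marks before composing, which is the step a plain pair table lacks.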
TC plugins: PCREsearch and RegXtract
- ghisler(Author)
I'm using FoldString with option MAP_PRECOMPOSED. It's available on NT-based systems only, but I'm loading it dynamically, and I don't think this has any relevance on Windows 9x/ME.
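For readers who want to try the call themselves, here is a rough Python sketch using ctypes. It is not TC's code: the helper name `precompose` is made up, and the non-Windows branch is only a stand-in that uses Python's own NFC normalization, which is a different implementation and may give different results for some input.

```python
import sys
import unicodedata

MAP_PRECOMPOSED = 0x00000020  # flag value from the Windows SDK header winnls.h


def precompose(text):
    """Return text with base letter + combining mark sequences precomposed."""
    if sys.platform != "win32":
        # Assumption: stand-in for other platforms, not the Windows function.
        return unicodedata.normalize("NFC", text)

    import ctypes
    from ctypes import wintypes

    # The function is looked up at run time, so nothing is linked statically.
    fold = ctypes.WinDLL("kernel32", use_last_error=True).FoldStringW
    fold.argtypes = [wintypes.DWORD, wintypes.LPCWSTR, ctypes.c_int,
                     wintypes.LPWSTR, ctypes.c_int]
    fold.restype = ctypes.c_int

    # A zero-sized destination asks for the required length in UTF-16 units
    # (including the terminating null, because the source length is -1).
    needed = fold(MAP_PRECOMPOSED, text, -1, None, 0)
    if needed == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buffer = ctypes.create_unicode_buffer(needed)
    if fold(MAP_PRECOMPOSED, text, -1, buffer, needed) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buffer.value


print(precompose("Poz\u030cadavek.xls") == "Po\u017eadavek.xls")
```

The two-call pattern (first query the size, then convert) avoids guessing the buffer length.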
Author of Total Commander
https://www.ghisler.com
Thx for the info.
If I read it correctly, this function provides the full normalization only on Vista and later, so on XP/2000 we're probably stuck with Unicode < 4.0. So it is not fully portable (in terms of functionality) between different OSes.
Still, it's working as it should and is probably good enough for most basic diacritics (but maybe not for CJK characters).
TC plugins: PCREsearch and RegXtract
- ghisler(Author)
Well, it's the best I could find. And since XP is end of life, there isn't really much to complain about...
Author of Total Commander
https://www.ghisler.com
redfox wrote:XP will be among us for a very long time yet. Only MS needed to force new Win versions, which are all worse except for Win7.
You are right for some poor people, but that should no longer drive design decisions.
Windows 11 Home, Version 24H2 (OS Build 26100.4061)
TC 11.55 RC2 x64 / x86
Everything 1.5.0.1391a (x64), Everything Toolbar 1.5.2.0, Listary Pro 6.3.2.88
QAP 11.6.4.4 x64
On Windows 2K/XP, the caron and other combining diacritics appear misaligned and can be selected as separate symbols (and may show up as a box character in older or less complete fonts), so the nature of the problem is immediately clear. I've only encountered such file names in the last few years. Mac "thinks differently"...
#148174 Personal license
Running Total Commander v8.52a