[WFX] Inconsistency of FsGetFile with special characters

Dalai
Power Member
Posts: 6757
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai » 2015-12-07, 18:40 UTC

Hi there :)

While developing another WFX plugin, I stumbled upon another inconsistency in TC's behavior when the FsGetFile() function gets called.

Situation: a WFX plugin returns several files and folders to TC, and some of them contain special characters, such as slashes, that are forbidden in local Windows file systems. Example: "TCP/IP-Protokolltreiber".
  • Viewing such a file with F3 drops everything up to and including the last slash, so only the final part remains. Example: "IP-Protokolltreiber".
  • Copying the same file, on the other hand, replaces the special characters. Example: "TCP_IP-Protokolltreiber".
So far so good. It's not consistent behavior, but no real issue here.

Now comes the "fun" part: copying directories DOES NOT replace special characters in directory names; they are only replaced in file names! So you can't duplicate the structure of a file-system plugin that has special characters in directory names unless the plugin itself takes care of them.

Is this by design? I'd rather call it a bug, but I'm not sure.
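
For illustration only, the two file-name behaviors described above can be mimicked with a short Python sketch. The function names are made up for this example and say nothing about how TC implements either operation; they merely reproduce the observed results for the example name.

```python
def name_when_viewing(remote_name: str) -> str:
    """Mimic the observed F3 result: keep only the part after the last slash."""
    return remote_name.rsplit("/", 1)[-1]


def name_when_copying(remote_name: str, replacement: str = "_") -> str:
    """Mimic the observed copy result: replace every slash with an underscore."""
    return remote_name.replace("/", replacement)


name = "TCP/IP-Protokolltreiber"
print(name_when_viewing(name))   # IP-Protokolltreiber
print(name_when_copying(name))   # TCP_IP-Protokolltreiber
```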

Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups

ghisler(Author)
Site Admin
Posts: 38190
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) » 2015-12-10, 10:54 UTC

It's currently not handled. The problem is that directories with the replacement character can actually exist too, which causes much more trouble than a file with the same name already existing.
Author of Total Commander
http://www.ghisler.com

MVV
Power Member
Posts: 8397
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV » 2015-12-10, 12:33 UTC

I think it is a plugin task to convert filenames to a format supported by Windows.

Dalai
Power Member
Posts: 6757
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai » 2015-12-10, 13:08 UTC

MVV: I don't agree. Why? Well, apart from my laziness ;) : every plugin author would have to write the implementation again for his/her own plugin(s). And the authors would have to take care of all forbidden characters, and even of names like nul, com1 and so on ...

If TC handled such cases instead, there would be only one implementation that automagically works for all plugins. This would also be good for (older) plugins that are no longer being developed.

Regards
Dalai

milo1012
Power Member
Posts: 1104
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2015-12-10, 13:34 UTC

I agree that the plug-in author has to take responsibility for creating valid names for the Windows namespace.
Why? Because you know which names to expect, i.e. what possibilities for invalid chars there are.
You can optimize your replacement algorithm, like using some simple masks, etc.
Letting TC do it for every file is an enormous expense, plus you don't have any control over WHICH replacement character will be used.
Dalai wrote:like nul, com1...
You mean

CON, PRN, AUX, NUL, COM1...COM9, LPT1...LPT9
If you don't allow copying such files out of the virtual file system, you don't need to bother checking for them.
It only becomes necessary when you want to create actual files with such names,
but even if you skip that you would get an error code from the API functions, which you can check to get the cause.
Seriously, checking EVERY file name for those names BEFORE reporting them to TC is an enormous computational expense.
It's like breaking a butterfly on a wheel ;)
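
A minimal sketch of such a check, in Python for brevity. The helper name `is_reserved_device_name` is hypothetical, and the set contains only the names listed above. A name counts as reserved regardless of case or extension, since Windows also treats e.g. `NUL.txt` as the device.

```python
# Only the device names listed in the post above.
RESERVED_DEVICE_NAMES = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{i}" for i in range(1, 10)]
    + [f"LPT{i}" for i in range(1, 10)]
)


def is_reserved_device_name(file_name: str) -> bool:
    """Return True if the name, ignoring case and extension, is a reserved device name."""
    base = file_name.split(".", 1)[0]  # "nul.txt" still refers to the NUL device
    return base.upper() in RESERVED_DEVICE_NAMES


print(is_reserved_device_name("com1"))    # True
print(is_reserved_device_name("COM10"))   # False
```

Because this is a single set lookup per name, it could just as well be deferred until a file is actually about to be created locally instead of being run on every listed name.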
TC plugins: PCREsearch and RegXtract

Dalai
Power Member
Posts: 6757
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai » 2015-12-10, 13:51 UTC

milo1012 wrote:Why? Because you know which names to expect, i.e. what possibilities for invalid chars there are.
No, I don't. Names may depend on the system language like the example in the OP. English names usually don't have forbidden characters but there may be such in other languages (like the German example above). And I don't know which characters to expect in other languages since I don't know how Microsoft localized the names.

So, I'd end up replacing ALL characters again.

And what about the plugin interface docs about FsGetFile:
LocalName
Local file name with full path, either with a drive letter or UNC path (\\Server\Share\filename). The plugin may change the NAME/EXTENSION of the file (e.g. when file conversion is done), but not the path!
You and MVV (and Ghisler indirectly) tell me to change the local name, but the docs say otherwise.
milo1012 wrote:Letting TC do it for every file is an enormous expense, plus you don't have any control over WHICH replacement character will be used.
I can agree here. However, it would be fine if TC would use an underscore like it already does for file names.

Regards
Dalai

milo1012
Power Member
Posts: 1104
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2015-12-10, 14:03 UTC

Dalai wrote:No, I don't. Names may depend on the system language like the example in the OP.
I meant that the general demand for filtering depends on the plug-in task,
like whether your task can expect any natural language characters ("letters") in the first place,
or whether special characters, like slashes/backslashes, are already filtered out.
With some wfx plug-ins you access, e.g., a file system with a full character set;
for others you can assume that they don't have the Windows forbidden characters (they are already filtered out by definition).

But sure, for your special task you probably have to expect every possible char (except null).


Dalai wrote:And I don't know which characters to expect in other languages since I don't know how Microsoft localized the names.
Erm, you need to expect the full Unicode range, if your plug-in is Unicode capable,
otherwise only characters from the user specific ANSI code page.
And I'd say that
\ / : * ? " < > |
may appear in any language, you can't and should not make any assumptions about that.
So a filter for those chars is mandatory (if creating files).
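
A filter for exactly these nine characters could look like the following Python sketch. The function names and the underscore default are assumptions made for the example; the thread does not prescribe a particular replacement.

```python
# The nine characters listed above.
FORBIDDEN_CHARS = '\\/:*?"<>|'


def find_forbidden(name: str) -> list:
    """Return the forbidden characters that occur in the name, in order of appearance."""
    return [ch for ch in name if ch in FORBIDDEN_CHARS]


def replace_forbidden(name: str, replacement: str = "_") -> str:
    """Replace every forbidden character with the given replacement string."""
    return "".join(replacement if ch in FORBIDDEN_CHARS else ch for ch in name)


print(find_forbidden("TCP/IP-Protokolltreiber"))     # ['/']
print(replace_forbidden("TCP/IP-Protokolltreiber"))  # TCP_IP-Protokolltreiber
```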

MVV
Power Member
Posts: 8397
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV » 2015-12-10, 14:15 UTC

Dalai wrote:No, I don't. Names may depend on the system language like the example in the OP.
I can agree here. However, it would be fine if TC would use an underscore like it already does for file names.
Usually forbidden characters can only come from the first half of the code page (codes below 128, mostly the first 32 plus some special characters like the ones mentioned above; these characters have the same codes in every language), and it is very easy for you to detect them, e.g. using a simple replace table, especially when you know that such names may occur in your plugin (you know that better than TC).
Dalai wrote:And what about the plugin interface docs about FsGetFile:
This simply means that you should extract the file from your plugin to the local folder that the user has selected and not into another folder. But you can change the target file name or extension (and that is exactly what you need, as I understand it).
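
Both points can be combined in a small Python sketch: a replace table covering the control characters below 32 plus the special characters mentioned above, applied only to the file-name part of the local path. `sanitize_local_name` is a hypothetical helper, not part of the WFX interface, and Python's `ntpath` module merely stands in for Windows path handling here.

```python
import ntpath

# Replace table: control characters 0..31 plus the special characters, all mapped to "_".
REPLACE_TABLE = {code: "_" for code in range(32)}
REPLACE_TABLE.update({ord(ch): "_" for ch in '\\/:*?"<>|'})


def sanitize_local_name(local_name: str) -> str:
    """Clean only the last path component; the directory part is left untouched."""
    directory, file_name = ntpath.split(local_name)
    return ntpath.join(directory, file_name.translate(REPLACE_TABLE))


print(sanitize_local_name(r"C:\Temp\a:b?.txt"))  # C:\Temp\a_b_.txt
```

Note that a slash or backslash inside the name cannot be told apart from a path separator at this point, which is the directory problem this thread is about.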

Dalai
Power Member
Posts: 6757
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai » 2015-12-10, 14:26 UTC

milo1012 wrote:Erm, you need to expect the full Unicode range, if your plug-in is Unicode capable,
otherwise only characters from the user specific ANSI code page.
Yes, of course, but Unicode is not the issue here. This is only about characters not allowed in file/directory names.
milo1012 wrote:And I'd say that
\ / : * ? " < > |
may appear in any language, you can't and should not make any assumptions about that.
That's not what I meant. Of course these characters may appear in any language. I'm not making any assumptions on their appearance, I just wanted to make clear that I would have to expect ANY forbidden character when I don't know how MS translated their stuff.
MVV wrote:This simply means that you should extract file from your plugin to a local folder that user have selected and not into another folder.
Yeah, but IT IS another directory if I'm supposed to replace forbidden characters to make copying the structure work. Like Ghisler said: directories with the replacement character can already exist.

Regards
Dalai

MVV
Power Member
Posts: 8397
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV » 2015-12-10, 15:50 UTC

I think that replacing forbidden characters in folder names should be allowed in such cases so you can do it.

milo1012
Power Member
Posts: 1104
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2015-12-10, 16:12 UTC

Dalai wrote:I just wanted to make clear that I would have to expect ANY forbidden character when I don't know how MS translated their stuff.
Yes, and that's just what I meant too.
For YOUR plug-in you have no choice but to expect any character, because
you get some strings (service names, registry names) from whatever API function,
and no matter in what (natural) language these names appear, you always have to expect the full character range.

But for a different plug-in you may know in advance that the strings have a limited charset, or don't ever produce Windows forbidden chars.

Dalai wrote:directories with the replacement character can already exist.
I think the chances of producing duplicate entries are slim.
And I know that the wcx interface actually takes duplicate entries without problems when listing
(but for accessing them you will always access the first entry).
Not sure about the wfx interface though.

You need to make unique names, so simply make unique replacements,
like using the full Unicode charset:

\ → ⧹
/ → ⧸
: → :
* → ٭  (or maybe "#")
? → ﹖
" → "  (or maybe "'")
< → ≺  (or maybe "(")
> → ≻  (or maybe ")")
| → ∣  (or maybe "_")
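
A sketch of how such a one-to-one scheme could be applied in both directions, in Python. The look-alike characters are copied from the list above where it shows a distinct one; for the colon and the double quote the fullwidth forms U+FF1A and U+FF02 are this example's own assumption. The function names are hypothetical.

```python
# Forbidden character -> look-alike replacement (one distinct target per source).
TO_LOCAL = {
    "\\": "⧹",
    "/": "⧸",
    "*": "٭",
    "?": "﹖",
    "<": "≺",
    ">": "≻",
    "|": "∣",
    ":": "\uFF1A",  # assumption: fullwidth colon
    '"': "\uFF02",  # assumption: fullwidth quotation mark
}
TO_REMOTE = {local: remote for remote, local in TO_LOCAL.items()}

ENCODE = str.maketrans(TO_LOCAL)
DECODE = str.maketrans(TO_REMOTE)


def to_local(name: str) -> str:
    """Turn a remote name into one without Windows forbidden characters."""
    return name.translate(ENCODE)


def to_remote(name: str) -> str:
    """Reverse to_local(); only exact if remote names never contain the look-alikes."""
    return name.translate(DECODE)


original = "TCP/IP-Protokolltreiber"
assert to_remote(to_local(original)) == original
```

Because every forbidden character gets its own replacement, two different remote names cannot collapse into the same local name, as long as the remote side never uses the look-alike characters itself.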
Last edited by milo1012 on 2015-12-10, 18:21 UTC, edited 2 times in total.

Dalai
Power Member
Posts: 6757
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai » 2015-12-10, 16:38 UTC

milo1012 wrote:
Dalai wrote:directories with the replacement character can already exist.
I think the chance to produce duplicate entries are thin.
I meant there may already exist directories in the local file-system. The remote name can contain any character - at least until you try to copy the remote structure to the local file-system (hence this thread about FsGetFile).
milo1012 wrote:And I know that the wcx interface actually takes duplicate entries without problems when listing
(but for accessing them you will always access the first entry).
Not sure about the wfx interface though.
It's exactly the same for WFX plugins. You can return duplicates to TC without any problem, but since all other interface functions use the file name as parameter and differentiator, you certainly run into trouble with duplicate names. There are some WFX plugins out there that have this very problem.
milo1012 wrote:You need to make unique names
Yes, I know, but that's not a problem since the directory and file names are all unique when I return them to TC. It's just about those illegal characters in directory names when TC passes them to the plugin in the interface functions (FsGetFile in particular).


-----

I just did some tests with the replacements, and it works (kind of), but since TC doesn't create the directories which contain illegal characters in the local file-system (sure, how could it?), I have to create them myself, too. Currently, I'm not sure whether I really want to make such a feature work, or just ignore it ...
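
Creating the directories on the plugin side could be sketched roughly like this, in Python. The helper names and the underscore replacement are assumptions for the example; a real WFX plugin would do the equivalent with the Windows API inside its own FsGetFile implementation.

```python
import os
import tempfile

FORBIDDEN_CHARS = '\\/:*?"<>|'


def sanitize_component(name: str, replacement: str = "_") -> str:
    """Replace forbidden characters in a single directory or file name."""
    return "".join(replacement if ch in FORBIDDEN_CHARS else ch for ch in name)


def create_local_tree(local_root: str, remote_components: list) -> str:
    """Create the sanitized directory chain below local_root and return its path."""
    safe = [sanitize_component(part) for part in remote_components]
    target = os.path.join(local_root, *safe)
    # exist_ok=True silently merges with a directory that already exists,
    # which is exactly the risk mentioned earlier in this thread.
    os.makedirs(target, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as root:
    created = create_local_tree(root, ["Treiber", "TCP/IP-Protokolltreiber"])
    print(os.path.basename(created))  # TCP_IP-Protokolltreiber
```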

Regards
Dalai

milo1012
Power Member
Posts: 1104
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2015-12-10, 16:52 UTC

Dalai wrote:Yes, I know, but that's not a problem since the directory and file names are all unique when I return them to TC. It's just about those illegal characters in directory names when TC passes them to the plugin in the interface functions (FsGetFile in particular).
That's exactly what I'm talking about.
Why don't you return them to TC already replaced with a unique scheme, such as the one that I posted above?
So TC will use those names for all subsequent operations.
You have no other choice if you want to create valid-only names, as far as I can see.


And you can't just use one single replacement char, because replacing
"IP-Protokolltreiber"
and e.g.
"IP>Protokolltreiber"

will both result in
"IP_Protokolltreiber"
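
The collision risk with a single replacement character can be checked mechanically. The following Python sketch groups names by their replaced form and reports every group with more than one member; the helper names are hypothetical and the two sample names containing "<" and ">" are made up for the demonstration.

```python
from collections import defaultdict

FORBIDDEN_CHARS = '\\/:*?"<>|'


def single_char_replace(name: str, replacement: str = "_") -> str:
    """Replace every forbidden character with the same replacement character."""
    return "".join(replacement if ch in FORBIDDEN_CHARS else ch for ch in name)


def find_collisions(names: list) -> dict:
    """Map each replaced name to its originals, keeping only ambiguous ones."""
    groups = defaultdict(list)
    for name in names:
        groups[single_char_replace(name)].append(name)
    return {local: originals for local, originals in groups.items() if len(originals) > 1}


print(find_collisions(["IP<Protokolltreiber", "IP>Protokolltreiber", "Other"]))
# {'IP_Protokolltreiber': ['IP<Protokolltreiber', 'IP>Protokolltreiber']}
```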

Dalai
Power Member
Posts: 6757
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai » 2015-12-10, 18:41 UTC

milo1012 wrote:Why don't you return them to TC already replaced with a unique scheme, such like the one that I posted above.
Well, for one, it would look unusual to the user. Second, it would be much more effort since the data structure doesn't care about file names. I would either have to use a two-way replacement to find the correct object or only save replaced names in my data structure (which is kind of silly since this data class doesn't need to know anything about files). And lastly, it costs more in terms of resources because I'd do replacements without knowing whether they are needed or not.
milo1012 wrote:And you can't just use one single replacement char, because replacing
"IP-Protokolltreiber"
and e.g.
"IP>Protokolltreiber"

will both result in
"IP_Protokolltreiber"
What do you mean? File names are not a problem, but directory names are. Ah, I guess I know what you're getting at: Replacing illegal characters in different file names may result in the same file name. No problem here since file names are unique (by their extension).

Regards
Dalai

milo1012
Power Member
Posts: 1104
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 » 2015-12-10, 19:04 UTC

Dalai wrote:I would either have to use a two-way replacement to find the correct object or only save replaced names in my data structure
...
And lastly, it costs more in terms of resources because I'd do replacements without knowing whether they are needed or not.
I think for today's machines it shouldn't be much of a problem (unless you have a very large number of entries, like more than 50000 or so).
There are always possibilities to make things more efficient. But it's your call.
The interface was designed around remote file systems. The FS plug-ins that I use try to keep names simple.
Using it for arbitrary text strings eventually requires some kind of workaround or compromise.

And FYI: for some of my plug-ins I have to separate the display string from the internal string too,
simply because I don't have much of a choice, due to different encodings (e.g. ZPAQ: UTF-8 in the archive, UTF-16 for the WinAPI/WCX interface).
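
Such a separation could be sketched as follows, in Python. This is only an illustration of the idea of keeping a stored name and a display name side by side; the function names are hypothetical and it is not how any of the plug-ins mentioned here actually does it.

```python
def build_name_index(stored_names_utf8: list) -> dict:
    """Map the display string (shown to the host) to the stored UTF-8 bytes."""
    return {raw.decode("utf-8"): raw for raw in stored_names_utf8}


def to_wide(display_name: str) -> bytes:
    """Encode a display string as UTF-16LE, the encoding wide WinAPI strings use."""
    return display_name.encode("utf-16-le")


index = build_name_index(["Größe.txt".encode("utf-8")])
stored = index["Größe.txt"]        # internal key used to find the entry again
wide = to_wide("Größe.txt")        # what would be handed to a wide-character API
print(len(stored), len(wide))      # 11 18
```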

Dalai wrote:Ah, I guess I know what you're getting at: Replacing illegal characters in different file names may result in the same file name.
Exactly.