
[WFX] Inconsistency of FsGetFile with special characters

 
Dalai
Power Member


Joined: 28 Jan 2005
Posts: 6082
Location: Meiningen (Südthüringen)

Posted: Mon Dec 07, 2015 12:40 pm

Hi there :)

While developing another WFX plugin, I stumbled upon another inconsistency in TC's behavior when the FsGetFile() function gets called.

Situation: a WFX plugin returns several files and folders to TC, some of which contain special characters such as slashes that are forbidden in local Windows file systems. Example: "TCP/IP-Protokolltreiber".

  • Viewing such a file with F3 drops everything up to and including the slash and keeps only the last part. Example: "IP-Protokolltreiber".
  • Copying the same file on the other hand replaces the special characters. Example: "TCP_IP-Protokolltreiber".
So far so good. It's not consistent behavior, but no real issue here.

Now comes the "fun" part: Copying directories DOES NOT replace special characters in directory names; they're only replaced in file names! So you can't duplicate the structure of a file-system plugin that has special characters in directory names unless the plugin itself takes care of them.
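
To make the three observations concrete, here is a small Python sketch that mimics them for a name containing a slash. It only models the behavior reported in this post and is not TC's actual code; the function names are hypothetical and chosen for illustration.

```python
# Mimics the three behaviors reported above for a remote name containing "/".
# This models the observations only; it is not Total Commander's implementation.

def name_used_for_view(remote_name: str) -> str:
    """F3 (view): only the part after the last slash survives."""
    return remote_name.rsplit("/", 1)[-1]


def name_used_for_file_copy(remote_name: str) -> str:
    """Copying a file: the slash is replaced by an underscore."""
    return remote_name.replace("/", "_")


def name_used_for_dir_copy(remote_name: str) -> str:
    """Copying a directory: the name is passed through unchanged."""
    return remote_name


name = "TCP/IP-Protokolltreiber"
print(name_used_for_view(name))       # IP-Protokolltreiber
print(name_used_for_file_copy(name))  # TCP_IP-Protokolltreiber
print(name_used_for_dir_copy(name))   # TCP/IP-Protokolltreiber
```

The third case is the problematic one: the unchanged name cannot be created as a single local directory.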

Is this by design? I'd rather call it a bug, but I'm not sure.

Regards
Dalai
_________________
#101164 Personal licence
Athlon X4 880K, 16 GiB RAM, Gigabyte F2A88X-D3HP, Win7 x64

Plugins: Services2, Startups
ghisler(Author)
Site Admin


Joined: 04 Feb 2003
Posts: 35956
Location: Switzerland

Posted: Thu Dec 10, 2015 4:54 am

It's currently not handled. The problem is that directories with the replacement character can actually exist too, which causes much more trouble than when a file with the same name already exists.
_________________
Author of Total Commander
http://www.ghisler.com
MVV
Power Member


Joined: 03 Aug 2008
Posts: 8058
Location: Russian Federation

Posted: Thu Dec 10, 2015 6:33 am

I think it is a plugin task to convert filenames to a format supported by Windows.
_________________
TCFS2 + TCFS2Tools: Full-screen mode for TC etc (forum)
TOTALCMD.NET: AskParam, CopyTree, NTLinks, Sudo, VirtualPanel…
Dalai

Posted: Thu Dec 10, 2015 7:08 am

MVV: I don't agree. Why? Well, apart from my laziness ;) : every plugin author would have to write the implementation again for his/her own plugin(s). And the authors would have to take care of all forbidden characters, and even of names like nul, com1 and so on ...

If TC handled such cases instead, there would be only one implementation that automagically works for all plugins. This would also be good for (older) plugins that are no longer developed.

Regards
Dalai
_________________
milo1012
Power Member


Joined: 02 Feb 2012
Posts: 1072

Posted: Thu Dec 10, 2015 7:34 am

I agree that the plug-in author has to take responsibility for creating valid names for the Windows namespace.
Why? Because you know which names to expect, i.e. what possibilities for invalid chars there are.
You can optimize your replacement algorithm, like using some simple masks, etc.
Letting TC do it for every file is an enormous expense, plus you don't have any control over WHICH replacement character will be used.

Dalai wrote:
like nul, com1...

You mean
Code:
CON, PRN, AUX, NUL, COM1...COM9, LPT1...LPT9

If you don't allow copying such files out of the virtual file system, you don't need to bother checking for them.
It only becomes necessary when you want to create actual files with such names,
but even if you skip the check, you would get an error code from the API functions, which you can inspect to find the cause.
Seriously, checking EVERY file name for those names BEFORE reporting them to TC is an enormous computational expense.
It's like breaking a butterfly on a wheel ;)
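
For reference, a check against the list above could look roughly like the following Python sketch. The function name is hypothetical, and the rule that an extension does not make the name valid (e.g. "nul.txt") is taken from Microsoft's file-naming guidance rather than from this thread. A set lookup per name is cheap, but whether it is worth doing before reporting names to TC remains the plug-in author's call.

```python
# Reserved DOS device names, as listed in the post above.
RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL"}
RESERVED_NAMES |= {f"COM{i}" for i in range(1, 10)}
RESERVED_NAMES |= {f"LPT{i}" for i in range(1, 10)}


def is_reserved_device_name(file_name: str) -> bool:
    """Return True if the part before the first dot is a reserved device name.

    The comparison is case-insensitive, so "nul" and "Com1.log" both match.
    """
    base = file_name.split(".", 1)[0]
    return base.upper() in RESERVED_NAMES


for candidate in ["nul", "Com1.log", "console.txt", "COM10"]:
    print(candidate, is_reserved_device_name(candidate))
```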
_________________
TC plugins: PCREsearch and RegXtract
Dalai

Posted: Thu Dec 10, 2015 7:51 am

milo1012 wrote:
Why? Because you know which names to expect, i.e. what possibilities for invalid chars there are.

No, I don't. Names may depend on the system language like the example in the OP. English names usually don't have forbidden characters but there may be such in other languages (like the German example above). And I don't know which characters to expect in other languages since I don't know how Microsoft localized the names.

So, I'd end up replacing ALL characters again.

And what about the plugin interface docs about FsGetFile:
Quote:
LocalName
Local file name with full path, either with a drive letter or UNC path (\\Server\Share\filename). The plugin may change the NAME/EXTENSION of the file (e.g. when file conversion is done), but not the path!

You and MVV (and Ghisler indirectly) tell me to change the local name, but the docs say otherwise.

Quote:
Letting TC do it for every file is an enormous expense, plus you don't have any control over WHICH replacement character will be used.

I can agree here. However, it would be fine if TC used an underscore, like it already does for file names.

Regards
Dalai
_________________
milo1012

Posted: Thu Dec 10, 2015 8:03 am

Dalai wrote:
No, I don't. Names may depend on the system language like the example in the OP.

I meant that the general need for filtering depends on the plug-in's task,
e.g. whether your task has to expect any natural-language characters ("letters") in the first place,
or whether special characters, like slashes/backslashes, are already filtered out.
For some wfx plug-ins you access, e.g., a file system with a full character set;
for others you can assume that the names don't contain the Windows-forbidden characters (they are already filtered out by definition).

But sure, for your special task you probably have to expect every possible char (except null).

Dalai wrote:
And I don't know which characters to expect in other languages since I don't know how Microsoft localized the names.

Erm, you need to expect the full Unicode range, if your plug-in is Unicode capable,
otherwise only characters from the user specific ANSI code page.
And I'd say that
\ / : * ? " < > |
may appear in any language, you can't and should not make any assumptions about that.
So a filter for those chars is mandatory (if creating files).
_________________
MVV

Posted: Thu Dec 10, 2015 8:15 am

Dalai wrote:
No, I don't. Names may depend on the system language like the example in the OP.

Quote:
I can agree here. However, it would be fine if TC used an underscore, like it already does for file names.

Usually, forbidden characters can only come from the first half of the code page (codes below 128: mostly the first 32 and some special characters like the ones mentioned above; these characters have the same codes in every language), so it is very easy for you to detect them, e.g. using a simple replacement table, especially since you know that such names may occur in your plugin (you know that better than TC does).
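
A replacement table of this kind can be sketched in Python with `str.maketrans` and `str.translate`. The underscore as the single replacement character and the name `sanitize` are assumptions for illustration, not something the WFX interface prescribes.

```python
# Windows-forbidden characters: control codes 0-31 plus nine printable ones.
FORBIDDEN_CHARS = [chr(code) for code in range(32)] + list('\\/:*?"<>|')

# Translation table mapping every forbidden character to an underscore.
REPLACE_TABLE = str.maketrans({ch: "_" for ch in FORBIDDEN_CHARS})


def sanitize(name: str) -> str:
    """Replace every forbidden character in a single name with "_"."""
    return name.translate(REPLACE_TABLE)


print(sanitize("TCP/IP-Protokolltreiber"))  # TCP_IP-Protokolltreiber
```

Names without forbidden characters pass through unchanged, so the table can be applied unconditionally.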
Dalai wrote:
And what about the plugin interface docs about FsGetFile:

This simply means that you should extract the file from your plugin into the local folder that the user has selected and not into another folder. But you can change the target file name or extension (which is exactly what you need, as I understand it).
_________________
Dalai

Posted: Thu Dec 10, 2015 8:26 am

milo1012 wrote:
Erm, you need to expect the full Unicode range, if your plug-in is Unicode capable,
otherwise only characters from the user specific ANSI code page.

Yes, of course, but Unicode is not the issue here. This is only about characters not allowed in file/directory names.

Quote:
And I'd say that
\ / : * ? " < > |
may appear in any language, you can't and should not make any assumptions about that.

That's not what I meant. Of course these characters may appear in any language. I'm not making any assumptions on their appearance, I just wanted to make clear that I would have to expect ANY forbidden character when I don't know how MS translated their stuff.

MVV wrote:
This simply means that you should extract the file from your plugin into the local folder that the user has selected and not into another folder.

Yeah, but IT IS another directory if I'm supposed to replace forbidden characters to make copying the structure work. Like Ghisler said: directories with the replacement character can already exist.

Regards
Dalai
_________________
MVV

Posted: Thu Dec 10, 2015 9:50 am

I think that replacing forbidden characters in folder names should be allowed in such cases so you can do it.
_________________
milo1012

Posted: Thu Dec 10, 2015 10:12 am

Dalai wrote:
I just wanted to make clear that I would have to expect ANY forbidden character when I don't know how MS translated their stuff.

Yes, and that's just what I meant too.
For YOUR plug-in you have no choice but to expect any character, because
you get some strings (service names, registry names) from whatever API function,
and no matter in what (natural) language these names appear, you always have to expect the full character range.

But for a different plug-in you may know in advance that the strings have a limited charset, or don't ever produce Windows forbidden chars.


Dalai wrote:
directories with the replacement character can already exist.

I think the chances of producing duplicate entries are slim.
And I know that the wcx interface actually takes duplicate entries without problems when listing
(but for accessing them you will always access the first entry).
Not sure about the wfx interface though.

You need to make unique names, so simply make unique replacements,
like using the full Unicode charset:
Code:
\ → ⧹
/ → ⧸
: → :
* → ٭ (or maybe "#")
? → ﹖
" → " (or maybe "'")
< → ≺ (or maybe "(")
> → ≻ (or maybe ")")
| → ∣ (or maybe "_")
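
A reversible version of such a scheme might look like the Python sketch below. Seven of the look-alike characters follow the table above; the ones used for ':' and '"' are assumptions made for this sketch, as are the function names.

```python
# One distinct look-alike per forbidden character, so the mapping can be inverted.
TO_LOCAL = {
    "\\": "\u29f9",  # BIG REVERSE SOLIDUS
    "/": "\u29f8",   # BIG SOLIDUS
    ":": "\ua789",   # MODIFIER LETTER COLON (assumption for this sketch)
    "*": "\u066d",   # ARABIC FIVE POINTED STAR
    "?": "\ufe56",   # SMALL QUESTION MARK
    '"': "\uff02",   # FULLWIDTH QUOTATION MARK (assumption for this sketch)
    "<": "\u227a",   # PRECEDES
    ">": "\u227b",   # SUCCEEDS
    "|": "\u2223",   # DIVIDES
}
FROM_LOCAL = {local: remote for remote, local in TO_LOCAL.items()}

ENCODE_TABLE = str.maketrans(TO_LOCAL)
DECODE_TABLE = str.maketrans(FROM_LOCAL)


def to_local(remote_name: str) -> str:
    """Turn a remote name into one that is valid in a Windows file system."""
    return remote_name.translate(ENCODE_TABLE)


def to_remote(local_name: str) -> str:
    """Undo to_local() to find the original remote name again."""
    return local_name.translate(DECODE_TABLE)


original = "TCP/IP-Protokolltreiber"
print(to_remote(to_local(original)) == original)  # True
```

The round trip only holds as long as the remote names never contain one of the replacement characters themselves.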

_________________
Last edited by milo1012 on Thu Dec 10, 2015 12:21 pm; edited 2 times in total
Dalai

Posted: Thu Dec 10, 2015 10:38 am

milo1012 wrote:
Dalai wrote:
directories with the replacement character can already exist.

I think the chances of producing duplicate entries are slim.

I meant that such directories may already exist in the local file system. The remote name can contain any character, at least until you try to copy the remote structure to the local file system (hence this thread about FsGetFile).

Quote:
And I know that the wcx interface actually takes duplicate entries without problems when listing
(but for accessing them you will always access the first entry).
Not sure about the wfx interface though.

It's exactly the same for WFX plugins. You can return duplicates to TC without any problem, but since all other interface functions use the file name as parameter and differentiator, you certainly run into trouble with duplicate names. There are some WFX plugins out there that have this very problem.

Quote:
You need to make unique names

Yes, I know, but that's not a problem since the directory and file names are all unique when I return them to TC. It's just about those illegal characters in directory names when TC passes them to the plugin in the interface functions (FsGetFile in particular).


-----

I just did some tests with the replacements, and it works (kind of), but since TC doesn't create the directories which contain illegal characters in the local file system (sure, how could it?), I have to create them myself, too. Currently, I'm not sure whether I really want to make such a feature work, or just ignore it ...
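
What "creating them myself" could involve is sketched below in Python. This is not a WFX interface call: the helper names are hypothetical, the underscore replacement is an assumption, and the remote path is passed as a list of components because a remote name may itself contain "/" or "\" and cannot be split reliably.

```python
import os
import tempfile

FORBIDDEN_CHARS = '\\/:*?"<>|'


def sanitize_component(component: str) -> str:
    """Replace forbidden characters in ONE path component with "_"."""
    return "".join("_" if ch in FORBIDDEN_CHARS else ch for ch in component)


def local_path_for(target_root: str, remote_components) -> str:
    """Build the local path, sanitizing each remote component separately."""
    cleaned = [sanitize_component(c) for c in remote_components]
    return os.path.join(target_root, *cleaned)


def ensure_parent_dirs(local_path: str) -> None:
    """Create the (sanitized) parent directories if they do not exist yet."""
    os.makedirs(os.path.dirname(local_path), exist_ok=True)


# Demo in a temporary directory standing in for the user's target folder.
with tempfile.TemporaryDirectory() as root:
    target = local_path_for(root, ["Treiber", "TCP/IP-Protokolltreiber", "info.txt"])
    ensure_parent_dirs(target)
    with open(target, "w", encoding="utf-8") as handle:
        handle.write("demo")
    print(os.path.relpath(target, root))
```

The sketch does not handle reserved device names, trailing dots or spaces, or a local directory with the replaced name that already exists.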

Regards
Dalai
_________________
milo1012

Posted: Thu Dec 10, 2015 10:52 am

Dalai wrote:
Yes, I know, but that's not a problem since the directory and file names are all unique when I return them to TC. It's just about those illegal characters in directory names when TC passes them to the plugin in the interface functions (FsGetFile in particular).


That's exactly what I'm talking about.
Why don't you return them to TC already replaced using a unique scheme, such as the one I posted above?
Then TC will use those names for all subsequent operations.
You have no other choice if you want to create valid-only names, as far as I can see.


And you can't just use one single replacement char, because replacing
"IP-Protokolltreiber"
and e.g.
"IP>Protokolltreiber"

will both result in
"IP_Protokolltreiber"
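
The collision can be reproduced with a few lines of Python. The two input names below are hypothetical and differ only in their forbidden character; the helper names are made up for this sketch.

```python
from collections import defaultdict

FORBIDDEN_CHARS = '\\/:*?"<>|'


def sanitize(name: str, replacement: str = "_") -> str:
    """Replace every forbidden character with one fixed replacement."""
    return "".join(replacement if ch in FORBIDDEN_CHARS else ch for ch in name)


def find_collisions(names):
    """Group original names by sanitized form; keep groups with 2+ members."""
    groups = defaultdict(list)
    for name in names:
        groups[sanitize(name)].append(name)
    return {clean: originals for clean, originals in groups.items() if len(originals) > 1}


names = ["IP/Protokolltreiber", "IP>Protokolltreiber", "Other"]
print(find_collisions(names))
# {'IP_Protokolltreiber': ['IP/Protokolltreiber', 'IP>Protokolltreiber']}
```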
_________________
Dalai

Posted: Thu Dec 10, 2015 12:41 pm

milo1012 wrote:
Why don't you return them to TC already replaced using a unique scheme, such as the one I posted above?

Well, for one, it would look unusual to the user. Second, it would be much more effort since the data structure doesn't care about file names. I would either have to use a two-way replacement to find the correct object or only save replaced names in my data structure (which is kind of silly since this data class doesn't need to know anything about files). And lastly, it costs more in terms of resources because I'd do replacements without knowing whether they are needed or not.

Quote:
And you can't just use one single replacement char, because replacing
"IP-Protokolltreiber"
and e.g.
"IP>Protokolltreiber"

will both result in
"IP_Protokolltreiber"

What do you mean? File names are not a problem, but directory names are. Ah, I guess I know what you're getting at: Replacing illegal characters in different file names may result in the same file name. No problem here since file names are unique (by their extension).

Regards
Dalai
_________________
milo1012

Posted: Thu Dec 10, 2015 1:04 pm

Quote:
I would either have to use a two-way replacement to find the correct object or only save replaced names in my data structure
...
And lastly, it costs more in terms of resources because I'd do replacements without knowing whether they are needed or not.

I think for today's machines it shouldn't be much of a problem (unless you have a very large number of entries, like more than 50000 or so).
There are always possibilities to make things more efficient. But it's your call.
The interface was designed around remote file systems. The FS plug-ins that I use try to keep names simple.
Using it for arbitrary text strings may require some kind of workaround or compromise.

And FYI: For some of my plug-ins I have to keep display and internal strings separate too,
simply because I don't have much of a choice, due to different encodings (e.g. ZPAQ: UTF-8 in archive, UTF-16 for WinAPI/WCX interface).


Quote:
Ah, I guess I know what you're getting at: Replacing illegal characters in different file names may result in the same file name.

Exactly.
_________________
All times are GMT - 6 Hours

 