Preliminary information about Unicode support (TC7.5)

Discuss and announce Total Commander plugins, addons and other useful tools here, both their usage and their development.

Moderators: white, Hacker, petermad, Stefan2

ghisler(Author)
Site Admin
Posts: 48025
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by ghisler(Author)

As some of you already know, I'm currently adding full Unicode support to Total Commander, for the next big release 7.5.

Unicode support in plugins will work like this:
1. All existing functions remain unchanged.
2. Wherever a function takes file names as parameters, there will be an additional function with the same name plus the suffix "W". This function takes the file name as a Unicode (WCHAR) string.
3. If the W function is present, TC will call it, but only on NT-based systems (Windows NT/2000/XP/Vista).
4. If the W function isn't present, or on Win9x/ME systems, TC will call the existing ANSI function instead. Any parts of the file name that cannot be represented in the ANSI code page will first be converted to their 8.3 (DOS) short form. The plugin will not be called if 8.3 names are disabled or there is no valid 8.3 name.
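The four rules above can be summarized as a small decision function. This is a simplified model, not TC's actual code: the names Platform, EntryPoint, PluginExports and chooseEntryPoint are hypothetical, and the model looks at the file name as a whole rather than converting each path component separately.

```cpp
#include <cassert>

// Hypothetical model of the calling rules; none of these names exist in TC.
enum class Platform { NT, Win9x };
enum class EntryPoint {
    Unicode,        // e.g. ListLoadW with the full Unicode name
    AnsiLongName,   // e.g. ListLoad with the unchanged name
    AnsiShortName,  // e.g. ListLoad with Unicode parts replaced by 8.3 names
    NotCalled       // the plugin is skipped for this file
};

struct PluginExports {
    bool hasW;  // does the plugin export the W variant?
};

// nameFitsAnsi: every character of the name exists in the ANSI code page.
// shortNameAvailable: 8.3 names are enabled and a valid 8.3 name exists.
EntryPoint chooseEntryPoint(const PluginExports& plugin, Platform os,
                            bool nameFitsAnsi, bool shortNameAvailable) {
    if (plugin.hasW && os == Platform::NT)
        return EntryPoint::Unicode;        // rule 3
    if (nameFitsAnsi)
        return EntryPoint::AnsiLongName;   // rule 4: nothing to convert
    if (shortNameAvailable)
        return EntryPoint::AnsiShortName;  // rule 4: fall back to the 8.3 form
    return EntryPoint::NotCalled;          // rule 4: no usable ANSI name
}
```

For example, a plugin without ListLoadW, asked to open a file whose name contains Cyrillic characters on a Western-European system, would get the 8.3 name if one exists and would not be called at all otherwise.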

Example: The Lister plugin function ListLoad:

Currently defined as:
HWND __stdcall ListLoad(HWND ParentWin,char* FileToLoad,int ShowFlags);

An additional function ListLoadW will be added:
HWND __stdcall ListLoadW(HWND ParentWin,WCHAR* FileToLoad,int ShowFlags);

This way, all existing plugins will continue to work, even in Unicode subdirectories and for files with Unicode names, and Lister shows the Unicode file name in its title. Plugin writers can add full Unicode support relatively easily.

The ANSI functions will still have to be implemented, for all the cases where the Unicode functions cannot be called:
- Windows 9x/ME
- older versions of Total Commander
- third party programs
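A minimal sketch of how a plugin can keep its logic in one place: the ANSI export only converts its argument and forwards to the Unicode export. Assumptions: LoadCore, ListLoad_sketch and ListLoadW_sketch are hypothetical stand-ins (the real ListLoad/ListLoadW also take the parent window and show flags and return an HWND), and the byte-to-WCHAR widening treats the input as Latin-1 only so that the sketch compiles without windows.h. A real plugin would use MultiByteToWideChar with CP_ACP.

```cpp
#include <cassert>
#include <string>

// Hypothetical core: the plugin's real work, written once against wide strings.
// Here it only records which name it received.
static std::wstring g_lastLoaded;

int LoadCore(const wchar_t* fileToLoad) {
    g_lastLoaded = fileToLoad;
    return 1;  // stands in for a window handle; nonzero means success
}

// Unicode entry point (stand-in for ListLoadW): called by TC 7.5 on NT systems.
int ListLoadW_sketch(const wchar_t* fileToLoad) {
    return LoadCore(fileToLoad);
}

// ANSI entry point (stand-in for ListLoad): still required for Win9x/ME,
// older TC versions and third-party programs. It only converts and forwards.
int ListLoad_sketch(const char* fileToLoad) {
    std::wstring wide;
    for (const char* p = fileToLoad; *p != '\0'; ++p) {
        // Latin-1 widening (assumption); use MultiByteToWideChar(CP_ACP, ...) on Windows.
        wide += static_cast<wchar_t>(static_cast<unsigned char>(*p));
    }
    return ListLoadW_sketch(wide.c_str());
}
```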

What do you think?
Last edited by ghisler(Author) on 2007-09-27, 19:36 UTC, edited 1 time in total.
Author of Total Commander
https://www.ghisler.com
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by Lefteous

ghisler(Author) wrote: What do you think?
This is the solution I suggested, so I couldn't be happier :-D
ghisler(Author) wrote: Unicode parts of the file name will be converted to the 8.3 (DOS) form first. The plugin will not be called if the 8.3 names are disabled, or there isn't a valid 8.3 name.
This is what already happens in the current version of TC, right?

Will you also introduce a wide-character string return type (maybe ft_widestring) in the content plug-in interface?
Hacker
Moderator
Posts: 13052
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by Hacker

Moderator note: Some off-topic posts were split off into the thread "Unicode to ANSI conversion when launching an application".

Hacker (Moderator)
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect, you drop dead.
VadiMGP
Power Member
Posts: 672
Joined: 2003-04-05, 12:11 UTC
Location: Israel

Post by VadiMGP

2ghisler(Author)
I have two questions.

1. What about WDX? Do you intend to add an additional field type "unicode string"?

2. What about all plugins/utilities that access INI, INC and BAR files? Do you intend to convert all these files to Unicode too?
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by Lefteous

I have updated my tiny content plug-in "Attributes" to support Unicode.

This is how I plan to change my plug-ins. The following points are important, especially if you haven't worked with Unicode before:
  • The string type char* has been changed to wchar_t* for all variables and constants in the plug-in.
  • The strings in the array "fieldNames" are declared as L"String" instead of "String".
  • Each exported function exists as a pair: the ANSI function and a Unicode function with a W suffix.
  • The Unicode functions do all the work and contain almost the same code that was previously used in the ANSI functions. The only difference here is the call to GetFileAttributesW, the Unicode version of GetFileAttributes. In general, every API call that takes or returns strings has to be switched to its W variant.
  • The ANSI version of ContentGetSupportedField first calls the Unicode function and then converts the returned Unicode strings to ANSI.
  • The ANSI version of ContentGetValue first converts the supplied ANSI file name to Unicode and then calls the Unicode function.
  • Wrapping the functions this way results in smaller code and zero redundancy compared to implementing each function twice.
If you follow these rules, converting your plug-in to Unicode is straightforward and the plug-in file size won't increase much.

Code:

// Attributes.h
#include <windows.h>
#include "contentplug.h"

// Number of fields supported by this plug-in.
// int rather than DWORD, so that comparisons with the int FieldIndex stay signed.
const int FIELD_COUNT = 14;

// An array of size FIELD_COUNT containing the names of all file attributes covered
// by the GetFileAttributes system call.
// String literals are constant, so the array holds const pointers.
const wchar_t* const fieldNames[FIELD_COUNT] = {L"Read Only", L"Hidden", L"System",
	L"Directory", L"Archive", L"Device", L"Normal",
	L"Temporary", L"Sparse File", L"Reparse Point", L"Compressed",
	L"Offline", L"Not Content Indexed", L"Encrypted"};

// An array of size FIELD_COUNT containing the numeric constants of all file attributes
// covered by the GetFileAttributes system call.
// The array index is used as field index.
const DWORD attributeConstants[FIELD_COUNT] = {FILE_ATTRIBUTE_READONLY, FILE_ATTRIBUTE_HIDDEN,
	FILE_ATTRIBUTE_SYSTEM, FILE_ATTRIBUTE_DIRECTORY, FILE_ATTRIBUTE_ARCHIVE,
	FILE_ATTRIBUTE_DEVICE, FILE_ATTRIBUTE_NORMAL, FILE_ATTRIBUTE_TEMPORARY,
	FILE_ATTRIBUTE_SPARSE_FILE, FILE_ATTRIBUTE_REPARSE_POINT, FILE_ATTRIBUTE_COMPRESSED,
	FILE_ATTRIBUTE_OFFLINE, FILE_ATTRIBUTE_NOT_CONTENT_INDEXED, FILE_ATTRIBUTE_ENCRYPTED};

Code:

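// Attributes.cpp
#include "Attributes.h"
#include <strsafe.h>

BOOL APIENTRY DllMain(HANDLE, DWORD, LPVOID)
{
    return TRUE;
}

// The Unicode (W) functions do the actual work. They are defined before the
// ANSI wrappers so that the wrappers can call them without separate declarations.
int __stdcall ContentGetSupportedFieldW(int FieldIndex, wchar_t* FieldName, wchar_t* Units, int maxlen)
{
	if (FieldIndex < 0 || FieldIndex >= FIELD_COUNT)
	{
		return ft_nomorefields;
	}
	Units[0] = 0;
	StringCchCopyW(FieldName, maxlen, fieldNames[FieldIndex]);
	return ft_boolean;
}

int __stdcall ContentGetValueW(wchar_t* FileName, int FieldIndex, int, void* FieldValue, int, int)
{
	if (FieldIndex < 0 || FieldIndex >= FIELD_COUNT)
	{
		return ft_nosuchfield;
	}
	DWORD attr = GetFileAttributesW(FileName);
	if (attr == INVALID_FILE_ATTRIBUTES)
	{
		return ft_fileerror;
	}
	// Store TRUE/FALSE instead of the raw attribute bit.
	*(BOOL*)FieldValue = (attr & attributeConstants[FieldIndex]) != 0;
	return ft_boolean;
}

// ANSI wrapper: the W function fills local wide buffers, which are then
// converted to the ANSI code page.
int __stdcall ContentGetSupportedField(int FieldIndex, char* FieldName, char* Units, int maxlen)
{
	wchar_t wideFieldName[MAX_PATH] = {0};
	wchar_t wideUnits[MAX_PATH] = {0};
	// Pass the size of the local buffers; maxlen describes the caller's buffers.
	int result = ContentGetSupportedFieldW(FieldIndex, wideFieldName, wideUnits, MAX_PATH);
	// -1: the source strings are zero-terminated, and the terminator is converted too.
	WideCharToMultiByte(CP_ACP, 0, wideFieldName, -1, FieldName, maxlen, NULL, NULL);
	WideCharToMultiByte(CP_ACP, 0, wideUnits, -1, Units, maxlen, NULL, NULL);
	return result;
}

// ANSI wrapper: convert the file name to Unicode, then forward every
// parameter unchanged and in its original order.
int __stdcall ContentGetValue(char* FileName, int FieldIndex, int UnitIndex, void* FieldValue, int maxlen, int flags)
{
	wchar_t wideFileName[MAX_PATH] = {0};
	// The destination size is that of wideFileName, not maxlen (which is the size of FieldValue).
	if (MultiByteToWideChar(CP_ACP, MB_PRECOMPOSED, FileName, -1, wideFileName, MAX_PATH) == 0)
	{
		return ft_fileerror;
	}
	return ContentGetValueW(wideFileName, FieldIndex, UnitIndex, FieldValue, maxlen, flags);
}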
Last edited by Lefteous on 2007-09-28, 16:06 UTC, edited 1 time in total.
XPEHOPE3KA
Power Member
Posts: 854
Joined: 2006-03-03, 18:23 UTC
Location: Saint-Petersburg, Russia

Post by XPEHOPE3KA

Lefteous wrote:
  • The string type char* has been changed into wchar_t* for all variables and constants in the plug-in.
  • The strings in the array "fieldNames" are declared with L"String" instead of "String".
  • The source file contains a pair of functions to be exported. The Unicode functions have a W postfix.
  • The Unicode functions do the whole work and contain almost the same code that was previously used in the ANSI functions. The only difference here is the call to GetFileAttributesW which is the Unicode version of this function. You have to rename every single API call.
  • The ANSI functions first convert the delivered ANSI string to Unicode and then call the Unicode function. This results in smaller code and zero redundancy compared to implementing each function twice.
If you follow these rules converting your plug-in to Unicode is straightforward and the plug-in file size won't increase too much.
I think it's possible to write an automatic converter doing these jobs.
F6, Enter, Tab, F6, Enter, Tab, F6, Enter, Tab... - I like to move IT, move IT!..
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by Lefteous

2XPEHOPE3KA
XPEHOPE3KA wrote: I think it's possible to write an automatic converter doing these jobs.
Sure, this is possible, but please consider that refactoring is not just a mass search and replace. Such a converter really has to understand the code.
There are also special cases where a Windows API function exists only in a Unicode version. The previous ANSI-only code had to convert strings to Unicode before passing them to such a function, or convert the returned strings back to ANSI, and a converter would have to recognize and remove these conversions. It's really not that easy.
And of course there are also other programming languages, which work a bit differently.
VadiMGP
Power Member
Posts: 672
Joined: 2003-04-05, 12:11 UTC
Location: Israel

Post by VadiMGP

2XPEHOPE3KA
Unfortunately, the real headaches start with code like this, even though it looks safe enough:

Code:

...
TCHAR buffer[MAX_BUFFER_LEN];
...
_sntprintf(buffer, sizeof(buffer), TEXT("%d"), iVal);
...
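The problem in this snippet: sizeof(buffer) is the size in bytes, while _sntprintf expects the maximum number of characters. In an ANSI build TCHAR is char and the two are equal; in a Unicode build TCHAR is wchar_t (2 bytes on Windows), so the function is told the buffer is twice as large as it really is and can write past its end. The portable sketch below uses standard wchar_t and std::swprintf (which, like _snwprintf, takes its size in characters) instead of the Windows-only TCHAR macros; MAX_BUFFER_LEN, elementCount and formatInt are illustrative names. With MSVC, the _countof macro gives the same element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

const std::size_t MAX_BUFFER_LEN = 8;  // illustrative size

// Number of elements in a fixed-size array (what MSVC's _countof computes).
template <typename T, std::size_t N>
constexpr std::size_t elementCount(const T (&)[N]) { return N; }

// Formats iVal into buffer, passing the capacity in characters, not bytes.
int formatInt(wchar_t (&buffer)[MAX_BUFFER_LEN], int iVal) {
    // Wrong: sizeof(buffer) == MAX_BUFFER_LEN * sizeof(wchar_t) bytes,
    // which overstates the capacity whenever sizeof(wchar_t) > 1.
    // Right: the element count. swprintf returns a negative value if the
    // output would not fit.
    return std::swprintf(buffer, elementCount(buffer), L"%d", iVal);
}
```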
Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by Alextp

2Lefteous
The converted version doesn't call GetFileAttributesA, so it won't work under Win9x.
So the conversion procedure has to be changed a bit.
eugensyl
Power Member
Posts: 564
Joined: 2004-06-03, 18:27 UTC
Location: România

Post by eugensyl

Good news!

I'm sure you won't forget to add Unicode support to Lister.
My Best Wishes,

Eugen
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by Lefteous

2Alextp
Alextp wrote: So the conversion procedure has to be changed a bit.
What is your suggestion?
Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by Alextp

2Lefteous

Code:

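function ContentGetValueW(FileName: PWChar; ...): Integer; stdcall;
var
  attr: DWORD;
begin
  // On NT, call the Unicode API directly.
  // On Win9x, GetFileAttributesW is only a stub, so convert to ANSI first.
  if Win32Platform = VER_PLATFORM_WIN32_NT then
    attr := GetFileAttributesW(FileName)
  else
    attr := GetFileAttributesA(PChar(AnsiString(WideString(FileName))));

  if attr = INVALID_FILE_ATTRIBUTES then
    Result := ft_fileerror
  else
  begin
    // FieldValue points to the buffer supplied by TC; write a BOOL into it.
    PBOOL(FieldValue)^ := (attr and attributeConstants[FieldIndex]) <> 0;
    Result := ft_boolean;
  end;
end;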
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by Lefteous

Well, if the Unicode version doesn't exist, the plug-in won't even load. I would have to load all Unicode functions dynamically. This is far from "Plugin writers can add full Unicode support relatively easily."
Alextp
Power Member
Posts: 2321
Joined: 2004-08-16, 22:35 UTC
Location: Russian Federation

Post by Alextp

Lefteous wrote: Well, if the Unicode version doesn't exist, the plug-in won't even load. I would have to load all Unicode functions dynamically.
Nope, in most cases you don't have to load Unicode functions dynamically. Most of them exist on Win9x as stubs (they are exported but simply fail), so the plug-in still loads. My code above will work on Win9x.
Lefteous
Power Member
Posts: 9535
Joined: 2003-02-09, 01:18 UTC
Location: Germany

Post by Lefteous

2Alextp
Let's say it works; sorry, but this is just bad code. I would have to do that for every API call. Consider that this is the simplest plug-in you can think of, and you can be sure there are far more complex ones. The code becomes much less readable, and the countless if statements won't make it faster.
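One possible middle ground between the two positions (a sketch, not something either poster proposed): keep Alextp's platform check, but confine it to one small wrapper per API function so the plug-in code only ever calls the wrapper. To keep the sketch runnable without windows.h, GetFileAttributesW/A are replaced by hypothetical mocks (MockGetFileAttributesW/A), g_isNT stands in for a check made once at start-up (assumption: e.g. via Win32Platform or GetVersionEx), and characters outside Latin-1 are replaced with '?' instead of calling WideCharToMultiByte.

```cpp
#include <cassert>
#include <string>

// Hypothetical mocks standing in for GetFileAttributesW/A; they return
// different values only to show which path was taken.
unsigned long MockGetFileAttributesW(const wchar_t*) { return 0x20; }
unsigned long MockGetFileAttributesA(const char*) { return 0x01; }

// Decided once at start-up in a real plug-in (assumption).
bool g_isNT = true;

// The only place in the plug-in that knows about the platform difference.
unsigned long GetAttributesPortable(const wchar_t* fileName) {
    if (g_isNT)
        return MockGetFileAttributesW(fileName);
    std::string ansi;
    for (const wchar_t* p = fileName; *p != L'\0'; ++p) {
        // Crude narrowing for the sketch; a real plug-in would use WideCharToMultiByte.
        ansi += (*p < 256) ? static_cast<char>(*p) : '?';
    }
    return MockGetFileAttributesA(ansi.c_str());
}
```

The if statements do not disappear, but there is one per wrapped API function instead of one per call site.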