Preliminary information about Unicode support (TC7.5)
I created a version that uses generic (TCHAR) characters. There is now one generic function per task, called from the exported functions. Each exported Unicode function just calls the appropriate generic function. Each exported ANSI function either returns an empty result code (ft_nomorefields or ft_nosuchfield) when compiled as Unicode, or otherwise calls the appropriate generic function.
I could reuse all exported functions in other plug-ins without a single change, and move the bodies of these functions into the generic functions without changes (besides the special cases already mentioned).
Another thing to consider is distributed libraries. For example, Total SQX comes with both an ANSI and a Unicode library. The package size would increase noticeably if I included both libraries. On the other hand, some users could have difficulty choosing the proper version.
An interesting approach could be to include both versions in the package but install only the version supported on the target platform. Unfortunately this wouldn't work for old TC versions, which install all files in the plug-in archive.
Well, all solutions have pros and cons.
Code:
// Attributes.h
#include <tchar.h>
#include <windows.h>
#include "contentplug.h"
// Number of fields supported by this plug-in.
const DWORD FIELD_COUNT = 14;
// Names of all file attributes reported by the GetFileAttributes system function.
// The pointers are const so that string literals can be stored without a cast.
const TCHAR* const fieldNames [FIELD_COUNT] = {TEXT("Read Only"), TEXT("Hidden"), TEXT("System"),
TEXT("Directory"), TEXT("Archive"), TEXT("Device"), TEXT("Normal"),
TEXT("Temporary"), TEXT("Sparse File"), TEXT("Reparse Point"), TEXT("Compressed"),
TEXT("Offline"), TEXT("Not Content Indexed"), TEXT("Encrypted")};
// Numeric constants of all file attributes reported by GetFileAttributes.
// The array index is used as field index.
const DWORD attributeConstants [FIELD_COUNT] = {FILE_ATTRIBUTE_READONLY, FILE_ATTRIBUTE_HIDDEN,
FILE_ATTRIBUTE_SYSTEM, FILE_ATTRIBUTE_DIRECTORY, FILE_ATTRIBUTE_ARCHIVE,
FILE_ATTRIBUTE_DEVICE, FILE_ATTRIBUTE_NORMAL, FILE_ATTRIBUTE_TEMPORARY,
FILE_ATTRIBUTE_SPARSE_FILE, FILE_ATTRIBUTE_REPARSE_POINT, FILE_ATTRIBUTE_COMPRESSED,
FILE_ATTRIBUTE_OFFLINE, FILE_ATTRIBUTE_NOT_CONTENT_INDEXED, FILE_ATTRIBUTE_ENCRYPTED};
// Generic implementations shared by the ANSI and Unicode exports.
int getSupportedField (int FieldIndex, TCHAR* FieldName, TCHAR* Units, int maxlen);
int getValue (TCHAR* FileName, int FieldIndex, void* FieldValue);
Code:
// Attributes.cpp
#include "Attributes.h"
#include <strsafe.h>
BOOL APIENTRY DllMain(HANDLE, DWORD, LPVOID)
{
return TRUE;
}
// ANSI export: forwards to the generic function in an ANSI build,
// reports no fields in a Unicode build.
int __stdcall ContentGetSupportedField(int FieldIndex, char* FieldName, char* Units, int maxlen)
{
#ifdef UNICODE
return ft_nomorefields;
#else
return getSupportedField (FieldIndex, FieldName, Units, maxlen);
#endif
}
#ifdef UNICODE
// Unicode export: forwards to the generic function.
int __stdcall ContentGetSupportedFieldW (int FieldIndex, wchar_t* FieldName, wchar_t* Units, int maxlen)
{
return getSupportedField (FieldIndex, FieldName, Units, maxlen);
}
#endif
int __stdcall ContentGetValue (char* FileName, int FieldIndex, int, void* FieldValue, int, int)
{
#ifdef UNICODE
return ft_nosuchfield;
#else
return getValue (FileName, FieldIndex, FieldValue);
#endif
}
#ifdef UNICODE
// The wide export must take wchar_t*, not TCHAR*, so its signature never changes.
int __stdcall ContentGetValueW (wchar_t* FileName, int FieldIndex, int, void* FieldValue, int, int)
{
return getValue (FileName, FieldIndex, FieldValue);
}
#endif
// Generic implementation: TCHAR is char or wchar_t depending on UNICODE.
int getSupportedField (int FieldIndex, TCHAR* FieldName, TCHAR* Units, int maxlen)
{
if (FieldIndex < 0 || FieldIndex >= (int)FIELD_COUNT)
{
return ft_nomorefields;
}
Units[0] = 0;
StringCchCopy (FieldName, maxlen, fieldNames[FieldIndex]);
return ft_boolean;
}
int getValue (TCHAR* FileName, int FieldIndex, void* FieldValue)
{
// Guard the array access below.
if (FieldIndex < 0 || FieldIndex >= (int)FIELD_COUNT)
{
return ft_nosuchfield;
}
DWORD attr = GetFileAttributes (FileName);
if (attr == INVALID_FILE_ATTRIBUTES)
{
return ft_fileerror;
}
// Store 0 or 1 rather than the raw attribute bit.
*(BOOL*)FieldValue = (attr & attributeConstants[FieldIndex]) != 0;
return ft_boolean;
}
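This portable sketch lets you check the same lookup logic without the Windows headers. Some of it is my own stand-in rather than the real API:
- The FT_* values are stand-ins that I assume match contentplug.h (ft_nosuchfield = -1, ft_boolean = 6). Verify them against your copy.
- The masks are the documented FILE_ATTRIBUTE_* values for the first five fields only.
- field_value() is a hypothetical helper, not part of the plug-in interface.

It shows the two details that matter for a boolean field: rejecting an out-of-range FieldIndex, and normalizing the masked bit to 0 or 1.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the contentplug.h return codes (assumed values). */
enum { FT_NOSUCHFIELD = -1, FT_BOOLEAN = 6 };

/* The first five FILE_ATTRIBUTE_* values, as defined by the Windows SDK. */
static const unsigned long attribute_masks[] = {
    0x0001, /* FILE_ATTRIBUTE_READONLY  */
    0x0002, /* FILE_ATTRIBUTE_HIDDEN    */
    0x0004, /* FILE_ATTRIBUTE_SYSTEM    */
    0x0010, /* FILE_ATTRIBUTE_DIRECTORY */
    0x0020  /* FILE_ATTRIBUTE_ARCHIVE   */
};
#define MASK_COUNT (sizeof attribute_masks / sizeof attribute_masks[0])

/* Hypothetical helper: sets *value to 0 or 1 for the requested field. */
int field_value(unsigned long attr, int field_index, int *value)
{
    assert(value != NULL);
    if (field_index < 0 || (size_t)field_index >= MASK_COUNT)
        return FT_NOSUCHFIELD;                 /* never index past the table */
    *value = (attr & attribute_masks[field_index]) != 0; /* normalize to 0/1 */
    return FT_BOOLEAN;
}
```

For a file with attributes 0x22 (hidden plus archive), field 1 yields 1, field 0 yields 0, and field 5 is rejected.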
- ghisler(Author)
- Site Admin
- Posts: 50390
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
Quote: Will you also introduce a wide-character return type (maybe ft_widestring) in the content plug-in interface?
Yes, this is planned.
Quote: What about all plug-ins/utilities accessing INI, INC and BAR files? Do you intend to convert all these files to Unicode too?
INI, INC and BAR files will remain in Ansi form, but Unicode strings will be stored as UTF-8 with a UTF-8 byte order mark (BOM) in front of them. This is already working very well, and it's fully backwards compatible. If the file is already Unicode, I will probably store the Unicode strings directly. I haven't implemented that yet, though.
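To make the storage scheme concrete, here is a minimal sketch of how a single value might be written into an otherwise Ansi INI file. Two parts of it are my own assumptions, not TC code:
- The policy of adding the BOM only when the string contains non-ASCII characters is my assumption. The post does not say so.
- encode_ini_value() and ini_value_is_utf8() are hypothetical helpers.

The input is a list of Unicode code points, assumed valid (no surrogates, at most U+10FFFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const char UTF8_BOM[] = "\xEF\xBB\xBF";

/* Writes the UTF-8 encoding of one valid code point; returns bytes written. */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encodes an INI value: unchanged if pure ASCII, otherwise BOM + UTF-8.
   out must hold at least 3 + 4 * n + 1 bytes. Returns the byte length. */
size_t encode_ini_value(const uint32_t *cps, size_t n, char *out)
{
    size_t i, len = 0;
    int ascii = 1;
    assert(out != NULL);
    for (i = 0; i < n; i++)
        if (cps[i] >= 0x80) ascii = 0;
    if (!ascii) { memcpy(out, UTF8_BOM, 3); len = 3; }
    for (i = 0; i < n; i++)
        len += put_utf8(cps[i], (unsigned char *)out + len);
    out[len] = '\0';
    return len;
}

/* A reader tells the two forms apart by the BOM prefix. */
int ini_value_is_utf8(const char *value)
{
    return strncmp(value, UTF8_BOM, 3) == 0;
}
```

For example, "Aé€" becomes the nine bytes EF BB BF 41 C3 A9 E2 82 AC. A pure ASCII value is written byte-for-byte unchanged.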
Quote: ContentGetSupportedFieldW
Currently Unicode field names are NOT planned. Field names must be in English anyway; the translation is done in a different way.
About the conversion:
1. Do not call the "W" functions in the Ansi function! They will not work under Windows 9x/ME.
2. You do not need to call the "A" functions in the "W" plugin function. Reason: TC will call the "A" functions under Windows 9x/ME even if there is a "W" function!
So your functions should look like this:
Code:
int __stdcall ContentGetValue (char* FileName, int FieldIndex, int, void* FieldValue, int, int)
{
if (FieldIndex < 0 || FieldIndex >= (int)FIELD_COUNT)
{
return ft_nosuchfield;
}
// Call the "A" function explicitly so this stays ANSI even if UNICODE is defined.
DWORD attr = GetFileAttributesA (FileName);
if (attr == INVALID_FILE_ATTRIBUTES)
{
return ft_fileerror;
}
*(BOOL*)FieldValue = (attr & attributeConstants[FieldIndex]) != 0;
return ft_boolean;
}
int __stdcall ContentGetValueW (wchar_t* FileName, int FieldIndex, int, void* FieldValue, int, int)
{
if (FieldIndex < 0 || FieldIndex >= (int)FIELD_COUNT)
{
return ft_nosuchfield;
}
DWORD attr = GetFileAttributesW (FileName);
if (attr == INVALID_FILE_ATTRIBUTES)
{
return ft_fileerror;
}
*(BOOL*)FieldValue = (attr & attributeConstants[FieldIndex]) != 0;
return ft_boolean;
}
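The dispatch rule behind points 1 and 2 can be pictured with a small host-side sketch. Everything in it is hypothetical (struct plugin, choose_export() and the stub exports). It only models the behavior described above:
- Windows 9x/ME always gets the "A" export.
- On NT-based Windows I assume the "W" export is used when the plug-in provides one, falling back to "A" otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

typedef int (*get_value_a)(const char *file_name);
typedef int (*get_value_w)(const wchar_t *file_name);

/* Hypothetical view of a loaded plug-in: the W export may be missing. */
struct plugin {
    get_value_a get_value;    /* like ContentGetValue, always exported */
    get_value_w get_value_w;  /* like ContentGetValueW, optional */
};

/* Stub exports for demonstration only. */
int stub_a(const char *f)    { (void)f; return 0; }
int stub_w(const wchar_t *f) { (void)f; return 0; }

/* Returns 'W' if the host would call the wide export, 'A' otherwise. */
char choose_export(const struct plugin *p, int is_windows_nt)
{
    assert(p != NULL && p->get_value != NULL);
    if (is_windows_nt && p->get_value_w != NULL)
        return 'W';
    return 'A'; /* Windows 9x/ME, or no W export available */
}
```

Under this model the "A" function is the only path on Windows 9x/ME, which is why it must not call "W" APIs. The "W" function never needs to fall back to "A" itself.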
Author of Total Commander
https://www.ghisler.com
Great to hear that there will be a wide-string return type.

Quote: INI, INC and BAR files will remain in Ansi form, but Unicode strings will be stored as UTF-8 with a UTF-8 byte order mark (BOM) in front of them. This is already working very well, and it's fully backwards compatible. If the file is already Unicode, I will probably store the Unicode strings directly. I haven't implemented that yet, though.
How can this be compatible? How could an older TC version read such files?
Quote: Currently Unicode field names are NOT planned. Field names must be in English anyway; the translation is done in a different way.
Yes, Unicode field names probably aren't necessary.
Quote: Do not call the "W" functions in the Ansi function! They will not work under Windows 9x/ME.
Yes, Alextp already wrote that. I didn't think of that initially. I'm not really familiar with Unicode support on Windows 9x.
Quote: You do not need to call the "A" functions in the "W" plugin function. Reason: TC will call the "A" functions under Windows 9x/ME even if there is a "W" function!
I never did! The code above uses generic characters. It automatically picks the right function (A or W) depending on whether the UNICODE preprocessor symbol is defined.
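The "automatic" selection happens in the preprocessor. <windows.h> maps each generic API name (e.g. GetFileAttributes) to its A or W variant, and TCHAR/TEXT() to char or wchar_t, depending on UNICODE. Below is a portable imitation of that pattern. NameLengthA/NameLengthW are hypothetical functions, and TCHAR_/TEXT_ are stand-ins, so the sketch compiles without the Windows headers.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical A/W function pair, standing in for a real Win32 API pair. */
size_t NameLengthA(const char *s)    { return strlen(s); }
size_t NameLengthW(const wchar_t *s) { return wcslen(s); }

/* The same mapping pattern that <windows.h> applies to TCHAR, TEXT() and APIs. */
#ifdef UNICODE
typedef wchar_t TCHAR_;
#define TEXT_(s) L##s
#define NameLength NameLengthW
#else
typedef char TCHAR_;
#define TEXT_(s) s
#define NameLength NameLengthA
#endif

/* Generic code: written once, compiled as A or W depending on UNICODE. */
size_t generic_name_length(void)
{
    const TCHAR_ *name = TEXT_("Read Only");
    return NameLength(name);
}
```

Compiling with -DUNICODE switches the same source to the wide variant. generic_name_length() returns 9 in both builds.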
Quote: Usually it's easier to just copy+paste the function, and then change the calls to the "W" functions.
I'm not really enthusiastic about this. The first copy, paste and edit may work fine, but I would have to do this for all future code that deals with strings. This is very error-prone.
Ghisler wrote: About the conversion:
1. Do not call the "W" functions in the Ansi function! They will not work under Windows 9x/ME.
2. You do not need to call the "A" functions in the "W" plugin function.
Just a note: Lefteous already took this into account in his example with generic chars. His variant is OK.
ghisler(Author) wrote: INI, INC and BAR files will remain in Ansi form, but Unicode strings will be stored as UTF-8 with UTF-8 byte order marker (BOM) in front of them.
I think using UTF-8 encoding is a case where "the game is not worth the candle". Please look at UTF-16 instead:
1. The standard GetPrivateProfileXxx/WritePrivateProfileXxx functions can handle both UTF-16 and ANSI encoded ini files.
2. According to http://support.microsoft.com/kb/175392 there could be a problem with stored command line parameters, so usercmd.ini and BAR files will require internal translation into ANSI or UTF-16.
3. Using the standard GetPrivateProfileXxx/WritePrivateProfileXxx functions also solves the problem of backward compatibility.
Unfortunately using UTF-16 does NOT solve backwards compatibility. There are some big problems with it:
1. TC 7.02a and older load the strings with GetPrivateProfileStringA, so the Unicode characters are lost. When the user then saves the list, e.g. of the button bar or start menu, the Unicode strings are damaged.
2. When a user dual-boots Windows 9x and Windows XP, the UTF-16 ini file will not work.
However, I plan to support UTF-16 too if the ini is already UTF-16, so users will have the option to convert it themselves (e.g. via Notepad).
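Detecting whether an ini file "is already UTF-16" comes down to checking its first bytes. This minimal sketch assumes the user converted the file with Notepad's "Unicode" option, which writes UTF-16LE with the byte order mark FF FE. detect_ini_encoding() and the enum are hypothetical names, and a UTF-16BE file is simply treated as Ansi here.

```c
#include <assert.h>
#include <stddef.h>

enum ini_encoding { INI_ANSI, INI_UTF16LE };

/* Classifies an ini file by its leading byte order mark, if any. */
enum ini_encoding detect_ini_encoding(const unsigned char *data, size_t size)
{
    assert(data != NULL || size == 0);
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return INI_UTF16LE;   /* UTF-16LE BOM, as written by Notepad */
    return INI_ANSI;          /* everything else: keep the Ansi + UTF-8 BOM scheme */
}
```

A file written under the Ansi scheme never starts with FF FE, because its first line is normally a plain "[Section]" header. The per-string UTF-8 markers therefore do not trigger this check.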
Lefteous wrote:
Quote: Unfortunately using UTF16 does NOT solve backwards compatibility.
2ghisler(Author): Could you please explain how UTF-8 does?
IMHO, for characters 0 to 127, ANSI = UTF-8, and UTF-8 strings are zero terminated.
With UTF-16, strings contain zero bytes, and ANSI functions expecting SZ (zero-terminated) strings won't like this.
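The zero-byte problem is easy to see in a few bytes. In this sketch, strlen() stands in for any routine that treats its input as a zero-terminated (SZ) string. test_utf8, test_utf16le and ansi_visible_length() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Test" as UTF-8: identical to the ASCII/ANSI bytes for these characters. */
static const char test_utf8[] = "Test";

/* "Test" as UTF-16LE: each ASCII character is followed by a 0x00 byte. */
static const char test_utf16le[] = { 'T', 0, 'e', 0, 's', 0, 't', 0, 0, 0 };

/* What a zero-terminated-string routine "sees" as the string length. */
size_t ansi_visible_length(const char *s)
{
    return strlen(s);
}
```

The UTF-8 buffer is byte-for-byte the ASCII string. The UTF-16LE buffer looks to such a routine like the one-character string "T".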
ghisler(Author) wrote: However, I plan to support UTF16 too if the ini is already UTF16, so the user will have the option to convert it himself to UTF16 (e.g. via Notepad).
It is fine with me. But I'm interested in how you work with UTF-8 files. Did you write your own functions to read and write these files?
Quote: But I'm interested in how you work with UTF-8 files. Did you write your own functions to read and write these files?
It's simple:
Code:
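uses
  IniFiles; // TIniFile
...
var
  Ini: TIniFile;
  S: WideString;
...
Ini := TIniFile.Create('C:\Test.ini');
try
  S := UTF8Decode(Ini.ReadString('Options', 'Param', ''));
  Ini.WriteString('Options', 'Param', UTF8Encode(S));
finally
  Ini.Free; // release the object even if reading or writing fails
end;
// UTF8Decode/UTF8Encode are standard Delphi functions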
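For comparison with the Delphi snippet, here is a minimal C sketch of the decoding step. It strips the UTF-8 BOM first, since that is how Unicode values are tagged in the scheme described earlier in the thread. utf8_decode_value() is a hypothetical helper and not what Delphi's UTF8Decode or TC actually do:
- It produces Unicode code points rather than UTF-16 WideString units.
- It does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes a zero-terminated UTF-8 value (an optional leading BOM is skipped)
   into code points. Returns the count, or -1 on malformed input or if more
   than max code points would be produced. */
long utf8_decode_value(const char *value, uint32_t *out, size_t max)
{
    const unsigned char *s = (const unsigned char *)value;
    size_t n = 0;
    assert(value != NULL);
    if (s[0] == 0xEF && s[1] == 0xBB && s[2] == 0xBF)
        s += 3;                                  /* strip the BOM marker */
    while (*s) {
        uint32_t cp;
        int extra, i;
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;                          /* stray continuation byte */
        s++;
        for (i = 0; i < extra; i++, s++) {
            if ((*s & 0xC0) != 0x80) return -1;  /* also stops at the '\0' */
            cp = (cp << 6) | (*s & 0x3F);
        }
        if (n == max) return -1;                 /* output buffer full */
        out[n++] = cp;
    }
    return (long)n;
}
```

A value without the BOM, such as plain "abc", decodes the same way. A truncated sequence like a lone 0xC3 byte is reported as -1 instead of being read past the terminator.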
Last edited by Alextp on 2007-10-08, 15:40 UTC, edited 2 times in total.