Preliminary information about Unicode support (TC7.5)

Alextp · Post by *Alextp » 2007-09-29, 11:13 UTC

This is the downside of all programs that should support Unicode and still work on Win9x.

Lefteous · Post by *Lefteous » 2007-09-29, 11:27 UTC

2Alextp
Well I see two alternatives.
1) Drop 9x support
2) Use generic characters. This would of course result in two different plug-in versions instead of a pair of functions in a single plug-in file.

Alextp · Post by *Alextp » 2007-09-29, 16:34 UTC

Alternative 2 looks interesting (it isn't possible in Delphi though).

Lefteous · Post by *Lefteous » 2007-09-29, 19:25 UTC

I created a version which uses generic characters. There is now a generic function behind each pair of exported functions. Each exported Unicode function just calls the appropriate generic function. Each exported ANSI function returns an error code (ft_nomorefields or ft_nosuchfield) when compiled as Unicode; otherwise it calls the appropriate generic function.
I could just reuse all exported functions in other plug-ins without a single change and move the body of these functions into the generic functions without changes (besides the already mentioned special cases).
Another thing to consider is distributed libraries. For example, in Total SQX there is an ANSI and a Unicode library available. The package size would increase noticeably if I included both libraries. On the other hand, some users could have difficulty choosing the proper version.
An interesting approach could be to include both versions in the package but only install the version which is supported on the target platform. Unfortunately, this wouldn't work for old TC versions, where all files in the plug-in archive are installed.
Well, all solutions have pros and cons.

Code: Select all

// Attributes.h
#include <tchar.h>
#include <windows.h>
#include "contentplug.h"

// Number of fields supported by this plug-in.
const DWORD FIELD_COUNT = 14;

// An array of size FIELD_COUNT containing the names of all file attributes covered 
// by GetFileAttributes system operation.
TCHAR* fieldNames [FIELD_COUNT] = {TEXT("Read Only"), TEXT("Hidden"), TEXT("System"), 
	TEXT("Directory"), TEXT("Archive"), TEXT("Device"), TEXT("Normal"), 
	TEXT("Temporary"), TEXT("Sparse File"), TEXT("Reparse Point"), TEXT("Compressed"), 
	TEXT("Offline"), TEXT("Not Content Indexed"), TEXT("Encrypted")};

// An array of size FIELD_COUNT containing the numeric constants of all file attributes 
// covered by GetFileAttributes system operation.
// The array index is used as field index.
DWORD attributeConstants [FIELD_COUNT] = {FILE_ATTRIBUTE_READONLY, FILE_ATTRIBUTE_HIDDEN, 
	FILE_ATTRIBUTE_SYSTEM, FILE_ATTRIBUTE_DIRECTORY, FILE_ATTRIBUTE_ARCHIVE, 
	FILE_ATTRIBUTE_DEVICE, FILE_ATTRIBUTE_NORMAL, FILE_ATTRIBUTE_TEMPORARY, 
	FILE_ATTRIBUTE_SPARSE_FILE, FILE_ATTRIBUTE_REPARSE_POINT, FILE_ATTRIBUTE_COMPRESSED,
	FILE_ATTRIBUTE_OFFLINE, FILE_ATTRIBUTE_NOT_CONTENT_INDEXED, FILE_ATTRIBUTE_ENCRYPTED};

int getSupportedField (int FieldIndex, TCHAR* FieldName, TCHAR* Units, int maxlen);
int getValue (TCHAR* FileName, int FieldIndex, void* FieldValue);

Code: Select all

// Attributes.cpp
#include "Attributes.h"
#include <strsafe.h>

BOOL APIENTRY DllMain(HANDLE, DWORD, LPVOID)
{
    return TRUE;
}

int __stdcall ContentGetSupportedField(int FieldIndex, char* FieldName, char* Units, int maxlen)
{
#ifdef UNICODE
	return ft_nomorefields;
#else
	return getSupportedField (FieldIndex, FieldName, Units, maxlen);
#endif
}

#ifdef UNICODE
int __stdcall ContentGetSupportedFieldW (int FieldIndex, wchar_t* FieldName, wchar_t* Units, int maxlen)
{
	return getSupportedField (FieldIndex, FieldName, Units, maxlen);
}
#endif

int __stdcall ContentGetValue (char* FileName, int FieldIndex, int, void* FieldValue, int maxlen, int)
{	
#ifdef UNICODE
	return ft_nosuchfield;
#else
	return getValue (FileName, FieldIndex, FieldValue);
#endif
}

#ifdef UNICODE
int __stdcall ContentGetValueW (TCHAR* FileName, int FieldIndex, int, void* FieldValue, int, int)
{	
   return getValue (FileName, FieldIndex, FieldValue);
}
#endif

int getSupportedField (int FieldIndex, TCHAR* FieldName, TCHAR* Units, int maxlen)
{
	if (FieldIndex >= FIELD_COUNT)
	{
		return ft_nomorefields;
	}
	Units[0] = 0;
	StringCchCopy (FieldName, maxlen, fieldNames[FieldIndex]);
	return ft_boolean;
}

int getValue (TCHAR* FileName, int FieldIndex, void* FieldValue)
{
	DWORD attr = GetFileAttributes (FileName);	
	if (attr == INVALID_FILE_ATTRIBUTES)
	{
		return ft_fileerror;
	}	
	*(BOOL*)FieldValue = attr & attributeConstants[FieldIndex];
	return ft_boolean;
}

fnheiden · Post by *fnheiden » 2007-09-30, 21:07 UTC

2ghisler(Author)
Great news! I'll definitely support it in anytag.wlx and anytag.wdx.

ghisler(Author) · Post by *ghisler(Author) » 2007-10-01, 16:29 UTC

Will you also introduce a widechar character return type (maybe ft_widestring) in the content plug-in interface?

Yes, this is planned.

What about all plugins/utilities accessing INI, INC, BAR files? Do you intend to turn all these files to unicode too?

INI, INC and BAR files will remain in Ansi form, but Unicode strings will be stored as UTF-8 with UTF-8 byte order marker (BOM) in front of them. This is already working very well, and it's fully backwards compatible. If the file is already Unicode, I will probably store the Unicode strings directly. I haven't implemented that yet, though.
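
To illustrate the per-string marker described above, here is a minimal, portable C++ sketch. It is not TC's actual code: the helper names `encodeIniValue` and `decodeIniValue` are hypothetical, and adding the marker only to strings that contain non-ASCII bytes is an assumption made for the example. It shows why pure ASCII entries stay byte-identical for older readers, while marked entries can be recognized on load.

```cpp
#include <cassert>
#include <string>

// The three bytes of the UTF-8 byte order mark (EF BB BF).
static const std::string kUtf8Bom = "\xEF\xBB\xBF";

// True if every byte is 7-bit ASCII, i.e. identical in ANSI and UTF-8.
bool isPlainAscii(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}

// Hypothetical writer-side helper (assumption): prefix the marker only
// when the UTF-8 text contains something an ANSI reader could misread.
std::string encodeIniValue(const std::string& utf8) {
    return isPlainAscii(utf8) ? utf8 : kUtf8Bom + utf8;
}

// Hypothetical reader-side helper: returns true if the stored value was
// marked as UTF-8; textOut receives the value without the marker.
bool decodeIniValue(const std::string& stored, std::string& textOut) {
    const bool marked = stored.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
    textOut = marked ? stored.substr(kUtf8Bom.size()) : stored;
    return marked;
}

// Usage example: a round trip for a value containing "a umlaut" (C3 A4).
void roundTripExample() {
    std::string text;
    const bool marked = decodeIniValue(encodeIniValue("B\xC3\xA4r"), text);
    assert(marked);
    assert(text == "B\xC3\xA4r");
}
```

An unmarked value is returned unchanged, so existing ANSI entries need no conversion in this sketch.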

ContentGetSupportedFieldW

Currently Unicode field names are NOT planned. Field names must be in English anyway, the translation is done in a different way.

About the conversion:
1. Do not call the "W" functions in the Ansi function! They will not work under Windows 9x/ME.
2. You do not need to call the "A" functions in the "W" plugin function. Reason: TC will call the "A" functions under Windows 9x/ME even if there is a "W" function!

So your functions should look like this:

Code: Select all

int __stdcall ContentGetValue (char* FileName, int FieldIndex, int, void* FieldValue, int maxlen, int)
{   
   DWORD attr = GetFileAttributesA (FileName);   // explicit "A" call: FileName is char*, so this also compiles when UNICODE is defined
   if (attr == INVALID_FILE_ATTRIBUTES)
   {
      return ft_fileerror;
   }   
   *(BOOL*)FieldValue = attr & attributeConstants[FieldIndex];   
   return ft_boolean;   
}

int __stdcall ContentGetValueW (wchar_t* FileName, int FieldIndex, int, void* FieldValue, int, int)
{   
   DWORD attr = GetFileAttributesW (FileName);   
   if (attr == INVALID_FILE_ATTRIBUTES)
   {
      return ft_fileerror;
   }   
   *(BOOL*)FieldValue = attr & attributeConstants[FieldIndex];   
   return ft_boolean;   
}

Alternatively, you can use the implementation of Alextp if your ContentGetValue function is very complex. But usually it's easier to just copy+paste the function, and then change the calls to the "W" functions.

Lefteous · Post by *Lefteous » 2007-10-01, 16:55 UTC

Great to hear that there will be a widestring return type.

INI, INC and BAR files will remain in Ansi form, but Unicode strings will be stored as UTF-8 with UTF-8 byte order marker (BOM) in front of them. This is already working very well, and it's fully backwards compatible. If the file is already Unicode, I will probably store the Unicode strings directly. I haven't implemented that yet, though.

How can this be compatible? How could an older TC version read such files?

Currently Unicode field names are NOT planned. Field names must be in English anyway, the translation is done in a different way.

Yes, Unicode field names probably aren't necessary.

Do not call the "W" functions in the Ansi function! They will not work under Windows 9x/ME.

Yes, Alextp already wrote that. I didn't think of that initially. I'm not really familiar with the Unicode support on Windows 9x.

You do not need to call the "A" functions in the "W" plugin function. Reason: TC will call the "A" functions under Windows 9x/ME even if there is a "W" function!

I never did! The code above uses generic characters. It automatically picks the right function (A or W) depending on whether the UNICODE preprocessor symbol is defined.

Usually it's easier to just copy+paste the function, and then change the calls to the "W" functions.

I'm not really enthusiastic about this. The first copy&paste and edit action may work OK, but I would have to do this for all future code which deals with strings. This is very error-prone.
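
One possible way to keep both exported entry points in a single binary without copying the body is a function template over the character type. The following is only a sketch under stated assumptions: the `queryAttributes` overloads are hypothetical stand-ins for GetFileAttributesA and GetFileAttributesW so that the example compiles without windows.h, and the constants and result codes are placeholders rather than the real values from the headers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <cwchar>

// Placeholder constants (assumptions for illustration only).
const std::uint32_t kInvalidAttributes = 0xFFFFFFFFu; // stands in for INVALID_FILE_ATTRIBUTES
const std::uint32_t kAttrReadOnly = 0x1;              // stands in for FILE_ATTRIBUTE_READONLY
const std::uint32_t kAttrHidden = 0x2;                // stands in for FILE_ATTRIBUTE_HIDDEN

// Placeholder result codes standing in for ft_fileerror and ft_boolean.
enum Result { kFileError = -2, kBoolean = 6 };

// Hypothetical stand-ins: in a real plug-in these overloads would forward to
// GetFileAttributesA and GetFileAttributesW. Here an empty name fails and
// every other name is reported as read-only.
std::uint32_t queryAttributes(const char* name) {
    return std::strlen(name) == 0 ? kInvalidAttributes : kAttrReadOnly;
}
std::uint32_t queryAttributes(const wchar_t* name) {
    return std::wcslen(name) == 0 ? kInvalidAttributes : kAttrReadOnly;
}

// The body is written once; overload resolution picks the A or W call.
template <typename CharT>
int getValueImpl(const CharT* fileName, std::uint32_t attributeMask, int* fieldValue) {
    assert(fieldValue != nullptr);
    const std::uint32_t attr = queryAttributes(fileName);
    if (attr == kInvalidAttributes) {
        return kFileError;
    }
    *fieldValue = (attr & attributeMask) != 0;
    return kBoolean;
}

// Thin wrappers in the role of the exported ANSI and Unicode functions.
int getValueA(const char* fileName, std::uint32_t mask, int* out) {
    return getValueImpl(fileName, mask, out);
}
int getValueW(const wchar_t* fileName, std::uint32_t mask, int* out) {
    return getValueImpl(fileName, mask, out);
}
```

With this layout, future string-handling code is added to `getValueImpl` only, and the two wrappers stay one line each.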

Alextp · Post by *Alextp » 2007-10-01, 17:11 UTC

Ghisler wrote:About the conversion:
1. Do not call the "W" functions in the Ansi function! They will not work under Windows 9x/ME.
2. You do not need to call the "A" functions in the "W" plugin function.

Just a note:
Lefteous has already accounted for this in his example with generic chars.
His variant is OK.

VadiMGP · Post by *VadiMGP » 2007-10-01, 21:20 UTC

ghisler(Author) wrote:INI, INC and BAR files will remain in Ansi form, but Unicode strings will be stored as UTF-8 with UTF-8 byte order marker (BOM) in front of them.

I think using UTF-8 encoding is the case when "the game is not worth the candle". Look at UTF-16, please.

1. Standard Set/GetPrivateProfilexxx functions can handle both UTF-16 and ANSI encoded ini files.

2. According to http://support.microsoft.com/kb/175392 there could be a problem with stored command line parameters, so usercmd.ini and BAR files will require internal translation into ANSI or UTF-16.

3. Using the standard Set/GetPrivateProfilexxx functions also solves the problem of backward compatibility.

ghisler(Author) · Post by *ghisler(Author) » 2007-10-04, 15:22 UTC

Unfortunately using UTF16 does NOT solve backwards compatibility. There are some big problems with it:
1. TC 7.02a and older load the strings with GetPrivateProfileStringA, so the Unicode characters are lost. When the user then saves the list, e.g. of the button bar or start menu, the Unicode strings are damaged.

2. When a user dual-boots Windows 9x and Windows XP, the UTF16 ini file will not work.

However, I plan to support UTF16 too if the ini is already UTF16, so the user will have the option to convert it himself to UTF16 (e.g. via Notepad).
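
How TC will decide that an ini "is already UTF16" is not stated here. A common approach, assumed for this sketch only, is to look at the first two bytes of the file for a UTF-16 byte order mark (FF FE for little-endian, which is what Notepad writes for its "Unicode" format). The names `IniEncoding` and `detectIniEncoding` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Possible outcomes of the check; anything without a UTF-16 BOM is
// treated as an ordinary byte-oriented (ANSI or UTF-8) file.
enum class IniEncoding { Utf16Le, Utf16Be, AnsiOrUtf8 };

// Inspect the first bytes of an ini file that has been read into memory.
IniEncoding detectIniEncoding(const unsigned char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return IniEncoding::Utf16Le;
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return IniEncoding::Utf16Be;
    }
    return IniEncoding::AnsiOrUtf8;
}
```

A file without a BOM falls through to the byte-oriented case, so existing ANSI files would be handled as before.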

Lefteous · Post by *Lefteous » 2007-10-04, 16:25 UTC

2ghisler(Author)

Unfortunately using UTF16 does NOT solve backwards compatibility.

Could you please explain how UTF-8 does?

gnozal8 · Post by *gnozal8 » 2007-10-05, 08:14 UTC

Lefteous wrote:2ghisler(Author)
Unfortunately using UTF16 does NOT solve backwards compatibility.
Could you please explain how UTF-8 does?

IMHO,
For the first 128 characters (codes 0 to 127), ANSI = UTF-8, and UTF-8 strings are zero-terminated.
With UTF-16, you have strings with 'zero' bytes, and ANSI functions expecting SZ strings won't like this.
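
A small portable C++ sketch of that point (the function names are made up for the example): an ASCII string encoded as UTF-16LE has a zero byte after every character, so byte-oriented code that stops at the first zero byte sees only one character, while the UTF-8 bytes of the same string are identical to the ASCII bytes.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Encode a 7-bit ASCII string as UTF-16LE bytes: each character becomes
// its own byte followed by a zero byte, plus a two-byte terminator.
std::vector<char> asciiToUtf16le(const std::string& ascii) {
    std::vector<char> bytes;
    for (unsigned char c : ascii) {
        assert(c < 0x80); // this sketch handles plain ASCII only
        bytes.push_back(static_cast<char>(c));
        bytes.push_back('\0');
    }
    bytes.push_back('\0');
    bytes.push_back('\0');
    return bytes;
}

// Length as seen by byte-oriented (SZ string) code, which stops at the
// first zero byte.
std::size_t lengthSeenByAnsiCode(const std::vector<char>& bytes) {
    return std::strlen(bytes.data());
}
```

For example, `lengthSeenByAnsiCode(asciiToUtf16le("Total"))` yields 1, whereas `std::strlen` on the unchanged UTF-8/ASCII bytes of "Total" yields 5.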

Lefteous · Post by *Lefteous » 2007-10-05, 08:20 UTC

2gnozal8

For the first 128 characters (codes 0 to 127), ANSI = UTF-8

Well, that's not enough to obtain full backwards compatibility.

VadiMGP · Post by *VadiMGP » 2007-10-06, 21:10 UTC

ghisler(Author) wrote:However, I plan to support UTF16 too if the ini is already UTF16, so the user will have the option to convert it himself to UTF16 (e.g. via Notepad).

It is fine for me.
But I'm interested in how you work with UTF-8 files. Did you write your own functions for reading/writing these files?

Alextp · Post by *Alextp » 2007-10-06, 21:51 UTC

But I'm interested in how you work with UTF-8 files. Did you write your own functions for reading/writing these files?

It's simple:

Code: Select all

var
  Ini: TIniFile;
  S: WideString;
...
  Ini:= TIniFile.Create('C:\Test.ini');
  S:= UTF8Decode(Ini.ReadString('Options', 'Param', ''));
  Ini.WriteString('Options', 'Param', UTF8Encode(S));
  //UTF8Decode/Encode are standard Delphi funcs