[Bug] Incorrect comparison of two unicode strings

Please report only one bug per message!

Moderators: white, Hacker, petermad, Stefan2

Post Reply
remittor
Junior Member
Junior Member
Posts: 49
Joined: 2019-10-02, 07:18 UTC

[Bug] Incorrect comparison of two unicode strings

Post by *remittor »

TotalCmd 9.22a uses this function absolutely everywhere:

Code: Select all

int __usercall wcsicmp@<eax>(WCHAR * wstr1@<eax>, WCHAR * wstr2@<edx>)
{
  int result;
  CHAR tmp2[260];
  CHAR tmp1[260];
  WCHAR wtmp2[1024];
  WCHAR wtmp1[1024];

  if ( win_dwPlatformId == 2 )         // Windows NT
  {
    wcsncpyEx(wtmp1, 1023, wstr1);
    wcsncpyEx(wtmp2, 1023, wstr2);
    wstr_lower(wtmp1);
    wstr_lower(wtmp2);
    result = wcscmp(wtmp1, wtmp2);
  }
  else
  {
    WideCharToMultiByte(0, 0, wstr1, -1, tmp1, 259, 0, 0);
    WideCharToMultiByte(0, 0, wstr2, -1, tmp2, 259, 0, 0);
    str_lower(tmp1);
    str_lower(tmp2);
    result = strcmp(tmp1, tmp2);
  }
  return result;
}
Critical Bug:
If the length of wstr1 or wstr2 is will be more than 259, then WideCharToMultiByte not insert null terminator!
After function strcmp will compare trash!

Recomendation: All this code needs to be replaced with a call shlwapi.dll@StrCmpIW
User avatar
ghisler(Author)
Site Admin
Site Admin
Posts: 48005
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Re: [Bug] Incorrect comparison of two unicode strings

Post by *ghisler(Author) »

This shouldn't be a problem because the part with WideCharToMultiByte is only used on Windows 9x/ME, which doesn't support such long paths. However, I can make tmp1 and tmp2 larger an add a terminating 0 myself just in case.
Author of Total Commander
https://www.ghisler.com
remittor
Junior Member
Junior Member
Posts: 49
Joined: 2019-10-02, 07:18 UTC

Re: [Bug] Incorrect comparison of two unicode strings

Post by *remittor »

ghisler(Author) wrote: 2019-12-16, 11:30 UTC This shouldn't be a problem because the part with WideCharToMultiByte is only used on Windows 9x/ME
It seems so to you. But in fact, long strings come into function wcsicmp.
For example, the function wcsicmp is called where the display of the internal directory of the archive file is formed. And this function is called in many places.

Code: Select all

wcsicmp(&elem[40], L"plugininst.inf")   /* elem[40] = headerData.FileName */
wcsicmp(locFileName, ArcCurDir)           /* WCHAR locFileName[1024] , ArcCurDir = current archive directory */
wcsicmp(ArcCurDir, filename)              /* filename = locFileName */
ghisler(Author) wrote: 2019-12-16, 11:30 UTC However, I can make tmp1 and tmp2 larger an add a terminating 0 myself just in case.
This is terrible! Do not do that! Just call the function shlwapi.dll@StrCmpIW (since Win98 with IE 4.0).

The function wcsicmp you implemented is slow. This function also introduces its delay when fetching files to display the insides of archives.
User avatar
ghisler(Author)
Site Admin
Site Admin
Posts: 48005
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Re: [Bug] Incorrect comparison of two unicode strings

Post by *ghisler(Author) »

Sorry, I will not change to a so far unknown function just for making TC on legacy Windows 9x/ME a little bit faster. I will add the terminating zeroes myself, although MSDN describes that they are actually added:
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte
cchWideChar
...
If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.
My function does pass -1 as cchWideChar.
Author of Total Commander
https://www.ghisler.com
remittor
Junior Member
Junior Member
Posts: 49
Joined: 2019-10-02, 07:18 UTC

Re: [Bug] Incorrect comparison of two unicode strings

Post by *remittor »

ghisler(Author) wrote: 2019-12-18, 10:27 UTC... although MSDN describes that they are actually added...
This is true only when output string is less than 259 characters long.
Personally test and understand what it is about.
ghisler(Author) wrote: 2019-12-18, 10:27 UTCSorry, I will not change to a so far unknown function
You’ve said the same thing about these functions: StrCmpLogicalW, SHStrDupW. But now use them too.
remittor
Junior Member
Junior Member
Posts: 49
Joined: 2019-10-02, 07:18 UTC

Re: [Bug] Incorrect comparison of two unicode strings

Post by *remittor »

Regarding the use of function StrCmpIW, I already wrote here: https://www.ghisler.ch/board/viewtopic.php?p=367822#p367822
ghisler(Author) wrote: 2020-01-13, 10:44 UTC I'm reluctant to use StrCmpIW because I don't know what it does internally.
I created a special test program that compares the function StrCmpIW and CharLowerW.
test program sources

Code: Select all

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>
#include <stdlib.h>
#include <stdio.h>
#include <shlwapi.h>

const int str_len = 2;
const WCHAR filename[] = L"test_result.txt";
const WCHAR bom = L'\xFEFF';

int main(int argc, WCHAR* argv[])
{
  DWORD dw;
  WCHAR str[str_len + 16];
  WCHAR lwr[str_len + 16];

  DeleteFileW(filename);
  DWORD dwShareMode = FILE_SHARE_READ | FILE_SHARE_WRITE;
  HANDLE hFile = CreateFileW(filename, GENERIC_WRITE, dwShareMode, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
  for (size_t i = 0; i < 8; i++) {
    str[i] = 0x20;
  }
  str[0] = bom;
  str[7] = L'\n';
  str[8] = 0;
  WriteFile(hFile, str, (DWORD)wcslen(str) * sizeof(WCHAR), &dw, NULL);

  memset(str, 0, sizeof(str));
  str[0] = 0x20;
  str[1] = 0x20;
  printf("start \n");

  for (size_t t = 0xD800 - 1; t <= 0xDFFF; t++) {
    str[0] = (WCHAR)t;
    if (t == 0xD800 - 1) {
      str[0] = 0x20;
    }
    for (size_t x = 0; x <= 0xFFFF; x++) {
      if (str[0] == 0x20) {
        if (x == 0) continue;  /* ignore null */
      }
      str[1] = (WCHAR)x;
      memcpy(lwr, str, str_len * sizeof(WCHAR));
      lwr[2] = 0;
      CharLowerW(lwr);
      int ret = StrCmpIW(lwr, str);
      if (ret) {
        WCHAR buf[str_len * 2 + 64];
        wcscpy(buf, str);
        wcscat(buf, L" / ");
        wcscat(buf, lwr);
        wcscat(buf, L"\n");
        WriteFile(hFile, buf, (DWORD)wcslen(buf) * sizeof(WCHAR), &dw, NULL);
      }
    }
  }
  wcscpy(str, L"<<<EOF>>>");
  wprintf(str);
  WriteFile(hFile, str, (DWORD)wcslen(str) * sizeof(WCHAR), &dw, NULL);
  return 0;
}
According to the test results, it turned out that only "Ʀ"/"ʀ" is the problem symbol.
Unicode:
User avatar
ghisler(Author)
Site Admin
Site Admin
Posts: 48005
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Re: [Bug] Incorrect comparison of two unicode strings

Post by *ghisler(Author) »

In my tests on Windows 10, StrCmpIW didn't cause any measurable speedup of the archive read functions (at least of the initial read).
Author of Total Commander
https://www.ghisler.com
Post Reply