[Bug] Incorrect comparison of two unicode strings

remittor · Post by *remittor » 2019-12-13, 12:24 UTC

TotalCmd 9.22a uses this function absolutely everywhere:

int __usercall wcsicmp@<eax>(WCHAR * wstr1@<eax>, WCHAR * wstr2@<edx>)
{
  int result;
  CHAR tmp2[260];
  CHAR tmp1[260];
  WCHAR wtmp2[1024];
  WCHAR wtmp1[1024];

  if ( win_dwPlatformId == 2 )         // Windows NT
  {
    wcsncpyEx(wtmp1, 1023, wstr1);
    wcsncpyEx(wtmp2, 1023, wstr2);
    wstr_lower(wtmp1);
    wstr_lower(wtmp2);
    result = wcscmp(wtmp1, wtmp2);
  }
  else
  {
    WideCharToMultiByte(0, 0, wstr1, -1, tmp1, 259, 0, 0);
    WideCharToMultiByte(0, 0, wstr2, -1, tmp2, 259, 0, 0);
    str_lower(tmp1);
    str_lower(tmp2);
    result = strcmp(tmp1, tmp2);
  }
  return result;
}

Critical Bug:
If the length of wstr1 or wstr2 is will be more than 259, then WideCharToMultiByte not insert null terminator!
After function strcmp will compare trash!

Recomendation: All this code needs to be replaced with a call shlwapi.dll@StrCmpIW

Post by *ghisler(Author) » 2019-12-16, 11:30 UTC

This shouldn't be a problem because the part with WideCharToMultiByte is only used on Windows 9x/ME, which doesn't support such long paths. However, I can make tmp1 and tmp2 larger an add a terminating 0 myself just in case.

remittor · Post by *remittor » 2019-12-17, 12:49 UTC

ghisler(Author) wrote: ↑2019-12-16, 11:30 UTC This shouldn't be a problem because the part with WideCharToMultiByte is only used on Windows 9x/ME

It seems so to you. But in fact, long strings come into function wcsicmp.
For example, the function wcsicmp is called where the display of the internal directory of the archive file is formed. And this function is called in many places.

Code: Select all

wcsicmp(&elem[40], L"plugininst.inf")   /* elem[40] = headerData.FileName */
wcsicmp(locFileName, ArcCurDir)           /* WCHAR locFileName[1024] , ArcCurDir = current archive directory */
wcsicmp(ArcCurDir, filename)              /* filename = locFileName */

ghisler(Author) wrote: ↑2019-12-16, 11:30 UTC However, I can make tmp1 and tmp2 larger an add a terminating 0 myself just in case.

This is terrible! Do not do that! Just call the function shlwapi.dll@StrCmpIW (since Win98 with IE 4.0).

The function wcsicmp you implemented is slow. This function also introduces its delay when fetching files to display the insides of archives.

Post by *ghisler(Author) » 2019-12-18, 10:27 UTC

Sorry, I will not change to a so far unknown function just for making TC on legacy Windows 9x/ME a little bit faster. I will add the terminating zeroes myself, although MSDN describes that they are actually added:
https://docs.microsoft.com/en-us/windows/win32/api/stringapiset/nf-stringapiset-widechartomultibyte

cchWideChar
...
If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.

My function does pass -1 as cchWideChar.

remittor · Post by *remittor » 2019-12-18, 13:48 UTC

ghisler(Author) wrote: ↑2019-12-18, 10:27 UTC... although MSDN describes that they are actually added...

This is true only when output string is less than 259 characters long.
Personally test and understand what it is about.

ghisler(Author) wrote: ↑2019-12-18, 10:27 UTCSorry, I will not change to a so far unknown function

You’ve said the same thing about these functions: StrCmpLogicalW, SHStrDupW. But now use them too.

remittor · Post by *remittor » 2020-01-24, 09:39 UTC

Regarding the use of function StrCmpIW, I already wrote here: https://www.ghisler.ch/board/viewtopic.php?p=367822#p367822

ghisler(Author) wrote: ↑2020-01-13, 10:44 UTC I'm reluctant to use StrCmpIW because I don't know what it does internally.

I created a special test program that compares the function StrCmpIW and CharLowerW.

test program sources

Code: Select all

#define _CRT_SECURE_NO_WARNINGS

#include <windows.h>
#include <stdlib.h>
#include <stdio.h>
#include <shlwapi.h>

const int str_len = 2;
const WCHAR filename[] = L"test_result.txt";
const WCHAR bom = L'\xFEFF';

int main(int argc, WCHAR* argv[])
{
  DWORD dw;
  WCHAR str[str_len + 16];
  WCHAR lwr[str_len + 16];

  DeleteFileW(filename);
  DWORD dwShareMode = FILE_SHARE_READ | FILE_SHARE_WRITE;
  HANDLE hFile = CreateFileW(filename, GENERIC_WRITE, dwShareMode, NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
  for (size_t i = 0; i < 8; i++) {
    str[i] = 0x20;
  }
  str[0] = bom;
  str[7] = L'\n';
  str[8] = 0;
  WriteFile(hFile, str, (DWORD)wcslen(str) * sizeof(WCHAR), &dw, NULL);

  memset(str, 0, sizeof(str));
  str[0] = 0x20;
  str[1] = 0x20;
  printf("start \n");

  for (size_t t = 0xD800 - 1; t <= 0xDFFF; t++) {
    str[0] = (WCHAR)t;
    if (t == 0xD800 - 1) {
      str[0] = 0x20;
    }
    for (size_t x = 0; x <= 0xFFFF; x++) {
      if (str[0] == 0x20) {
        if (x == 0) continue;  /* ignore null */
      }
      str[1] = (WCHAR)x;
      memcpy(lwr, str, str_len * sizeof(WCHAR));
      lwr[2] = 0;
      CharLowerW(lwr);
      int ret = StrCmpIW(lwr, str);
      if (ret) {
        WCHAR buf[str_len * 2 + 64];
        wcscpy(buf, str);
        wcscat(buf, L" / ");
        wcscat(buf, lwr);
        wcscat(buf, L"\n");
        WriteFile(hFile, buf, (DWORD)wcslen(buf) * sizeof(WCHAR), &dw, NULL);
      }
    }
  }
  wcscpy(str, L"<<<EOF>>>");
  wprintf(str);
  WriteFile(hFile, str, (DWORD)wcslen(str) * sizeof(WCHAR), &dw, NULL);
  return 0;
}

According to the test results, it turned out that only "Ʀ"/"ʀ" is the problem symbol.
Unicode:

Post by *ghisler(Author) » 2020-01-24, 17:52 UTC

In my tests on Windows 10, StrCmpIW didn't cause any measurable speedup of the archive read functions (at least of the initial read).

Total Commander

[Bug] Incorrect comparison of two unicode strings

[Bug] Incorrect comparison of two unicode strings

Re: [Bug] Incorrect comparison of two unicode strings

Re: [Bug] Incorrect comparison of two unicode strings

Re: [Bug] Incorrect comparison of two unicode strings

Re: [Bug] Incorrect comparison of two unicode strings

Re: [Bug] Incorrect comparison of two unicode strings

Re: [Bug] Incorrect comparison of two unicode strings