Bug: CRC files are created with system codepage

saelic · Post by *saelic » 2014-07-25, 20:21 UTC

Operating system
Windows 8 / Windows XP

TC version
Total Commander 8.51a RU/PL

Problem
TC saves CRC files in the system default codepage. As a result, when a CRC file is created on a system with one codepage and contains codepage-specific characters in a file name or path, it cannot be validated on a system with a different codepage.

Solution
CRC file should always be saved as UTF-8.

Severity
Medium, but I have no doubt that this bug should be fixed quickly.

Example
Create a CRC file under codepage Windows-1250 with characters like "ł", "ę", "ą" (Polish diacritics) in the file name/path.
Then try to validate the CRC file on a system with codepage Windows-1251 (Russian). You will end up with a "File not found" error.

Moreover, the TC log gets a funny entry:

2014-07-25 20:05:32: Проверка архива(Ошибка: Не найдено): projekt2\ іukasz

(in English: "Checking archive (Error: Not found): projekt2\ іukasz")

In this example, the file was named projekt2\łukasz; the CRC file was created under Polish Windows and checked under Russian Windows. The log entry is from the Russian Windows too.
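
The substitution seen in the log can be reproduced outside TC. The following is only a minimal sketch in Python (not anything TC itself runs), assuming the name was written as Windows-1250 bytes and read back as Windows-1251; the variable names are made up for illustration.

```python
# Name as it existed on the Polish (Windows-1250) system.
original = "projekt2\\łukasz"

# Bytes that would end up in the CRC file when saved in Windows-1250.
stored = original.encode("cp1250")

# The same bytes interpreted on a Russian (Windows-1251) system.
# Byte 0xB3 is "ł" in Windows-1250 but Cyrillic "і" (U+0456) in Windows-1251.
seen_on_russian = stored.decode("cp1251")

# The name no longer matches the file on disk, hence "Not found".
names_match = (seen_on_russian == original)
```

Here `seen_on_russian` comes out as projekt2\іukasz, the same substitution as in the log entry, and `names_match` is false.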

Other thoughts
I'm worried that such encoding problems may also affect other TC functionality, for example file comparison.

gdpr deleted 6 · Post by *gdpr deleted 6 » 2014-07-26, 14:05 UTC

I tested 32-bit and 64-bit versions of TC 8.51a on an English Windows 7 Pro x64.

Creating an SHA256 file for a file with Japanese characters (like バウンドし) results in a UTF-8 encoded checksum file (with a BOM).

However, neither Polish characters like ł nor German umlauts like ü in a file name resulted in a UTF-8 encoded checksum file.

It seems that TC first tries to encode with some (local system?) code page. Only if that fails (as would happen for Japanese characters on my machine, I guess) does TC seem to fall back to UTF-8 encoding.
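
To illustrate the behaviour guessed at here, a rough Python sketch of "ANSI first, UTF-8 with BOM as fallback" could look like this. It is an assumption about the logic, not TC's actual code: `encode_checksum_text` is a hypothetical helper, and Python's strict encoding check may differ from whatever test TC uses to decide that a character does not fit the code page.

```python
import codecs


def encode_checksum_text(text: str, ansi_codepage: str) -> bytes:
    """Hypothetical helper: try the ANSI code page, else UTF-8 with a BOM."""
    try:
        # Succeeds only if every character exists in the code page.
        return text.encode(ansi_codepage)
    except UnicodeEncodeError:
        # At least one character is not representable: fall back to UTF-8.
        return codecs.BOM_UTF8 + text.encode("utf-8")


# On a Western European code page (cp1252 assumed here), ü fits,
# so the result is plain ANSI bytes without a BOM.
umlaut = encode_checksum_text("für.txt", "cp1252")

# Japanese characters do not fit, so the result is UTF-8 with a BOM.
japanese = encode_checksum_text("バウンドし.txt", "cp1252")
```

With this logic, a file written without a BOM carries no hint of which code page produced it, which is why it reads differently on a system with another code page.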

The reason why TC does this is probably that many checksumming tools in the wild wild world are not capable of reading UTF or Unicode encoded checksum files. Encoding checksum files by default as UTF-8 would thus create another heap of problems...

Still, it would perhaps be nice to offer an option/INI file setting to enforce UTF-8 encoding.

Post by *ghisler(Author) » 2014-07-28, 09:50 UTC

Yes, this is intentional - CRC files use the current system ANSI encoding, unless the characters do not belong to the current codepage.