Option to generate checksum files as UTF-8 without BOM
I've noticed that even when "Unix format: line breaks, '/' in paths" is selected, TC still writes the file as UTF-8 with BOM, which doesn't work with "md5sum -c sums.md5". TC only changes the line breaks, not the encoding, and both are required. I'd like an additional option for that, so I don't have to run dos2unix or Notepad++ on checksum files afterwards to change the encoding.
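Until such an option exists, the manual step can be scripted. Below is a minimal Bash sketch, assuming GNU coreutils `head` and `tail`: it removes a leading UTF-8 BOM (bytes EF BB BF) from a checksum file before it is handed to `md5sum -c`. The function name `strip_bom` is hypothetical; it is not part of TC or coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TC or coreutils):
# remove a leading UTF-8 BOM (EF BB BF) from a file, in place.
# Files without a BOM are left untouched, so running it twice is harmless.
strip_bom() {
  local f=$1
  # Compare the first three bytes of the file against the BOM byte sequence.
  if [ "$(head -c3 -- "$f")" = $'\xef\xbb\xbf' ]; then
    # Copy everything from byte 4 onward, then replace the original file.
    tail -c +4 -- "$f" > "$f.nobom" && mv -- "$f.nobom" "$f"
  fi
}
```

Usage: `strip_bom sums.md5 && md5sum -c sums.md5`. This only removes the BOM; CRLF line breaks would still need TC's Unix format option (or dos2unix, which handles both).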
- ghisler(Author)
Re: Option to generate checksum files as UTF-8 without BOM
I cannot reproduce that, sorry.
Author of Total Commander
https://www.ghisler.com
Re: Option to generate checksum files as UTF-8 without BOM
Here is how to reproduce it:
Select any file -> Files -> Create checksum file -> save in Unix format -> checksum.md5 -> OK
Now open checksum.md5 with Notepad++ -> go to Encoding -> it displays UTF-8-BOM, which is incompatible with md5sum on Linux.
You can also check it with GNU Bash:
[ "$(head -c3 checksum.md5)" = $'\xef\xbb\xbf' ] && echo yes || echo no
Re: Option to generate checksum files as UTF-8 without BOM
Not confirmed.
JardaSX wrote (2020-04-07, 15:11 UTC): Here is how to reproduce it: [...]
The created md5 file is definitely not UTF-8 and has no BOM.
It's a standard Unix file.
No special tool is necessary to check the file format:
any good editor shows it and also lets you view it in hex.
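Without an editor at hand, the same inspection can be done from a shell. A minimal Bash sketch, assuming GNU coreutils (`head`, `od`) and `grep`; the function name `describe_checksum_file` is hypothetical. It prints the first bytes in hex (what an editor's hex view would show) and reports whether the file starts with a UTF-8 BOM and whether it contains carriage returns, i.e. Windows line breaks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the BOM and line-break style of a checksum file.
describe_checksum_file() {
  local f=$1 bom=no crlf=no
  # Hex dump of the first 16 bytes, similar to an editor's hex view.
  head -c16 -- "$f" | od -An -tx1
  # A UTF-8 BOM is the byte sequence EF BB BF at the very start of the file.
  [ "$(head -c3 -- "$f")" = $'\xef\xbb\xbf' ] && bom=yes
  # Any carriage return (CR) means Windows (CRLF) line breaks.
  grep -q $'\r' -- "$f" && crlf=yes
  echo "BOM: $bom"
  echo "CRLF: $crlf"
}
```

Searching for a single CR byte is enough here, because a file written in Unix format should contain no CR at all.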
Windows 11 Home x64 Version 23H2 (OS Build 22631.3527)
TC 11.03 x64 / x86
Everything 1.5.0.1375a (x64), Everything Toolbar 1.3.3, Listary Pro 6.3.0.78
QAP 11.6.3.3 x64
Re: Option to generate checksum files as UTF-8 without BOM
I can only reproduce that in TC 9.50: if characters in the filenames are outside the current (ANSI) codepage, TC apparently adds a UTF-8 BOM to the checksum file. TC 9.51 doesn't add a BOM, not even when the option "Always use UTF-8 in names" is also selected.
That is apparently the relevant change:
2JardaSX
TC's history.txt wrote:
12.02.20 Release Total Commander 9.50a release candidate 1 (RC1)
[...]
09.02.20 Fixed: Create CRC checksums: Do not add UTF-8 byte order marker to beginning of checksum file when using "Unix format" (32/64)
[...]
05.02.20 Release Total Commander 9.50 final (32/64)
Which TC version exactly did you test with? Please also keep in mind that the filenames and the current OS codepage may be important for such tests as well.
Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64
Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
Re: Option to generate checksum files as UTF-8 without BOM
OK, I've found where the problem is. It's caused by Total Commander 9.50 (2020-02-05), with both TOTALCMD.EXE and TOTALCMD64.EXE. However, I've been using TC for a long time, and it was probably some incremental update that caused the issue. Updating again to the latest version without a clean install fixed the issue: 9.51 generates checksum files as UTF-8 without BOM. I still have a backup of C:\totalcmd\ in case the developer wants to take a look at it.
Re: Option to generate checksum files as UTF-8 without BOM
OK, I have more information. I performed a clean install of Total Commander 9.50. The issue is caused by selecting "Always use UTF-8 in names", regardless of the setting of "Unix format: line breaks, '/' in paths".
Installing tcmd951x32_64.exe as an update (no clean install) fixes the issue. Obviously that was a bug which has been fixed, intentionally or not.
In any case, I find it odd that the Microsoft documentation states that UTF-8 uses the byte order mark "EF BB BF", implying that for them UTF-8 simply means UTF-8 with BOM (without saying so explicitly): Using Byte Order Marks
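For reference, EF BB BF is simply the UTF-8 encoding of the character U+FEFF, which serves as the byte order mark. A short Bash arithmetic sketch of the standard three-byte UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx (used for code points U+0800 to U+FFFF) shows where the bytes come from; the function name `utf8_3byte` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: UTF-8 bytes of a code point in the 3-byte range U+0800..U+FFFF.
utf8_3byte() {
  local cp=$(( $1 ))   # accepts hex input such as 0xFEFF
  printf '%02X %02X %02X\n' \
    $(( 0xE0 | (cp >> 12) )) \
    $(( 0x80 | ((cp >> 6) & 0x3F) )) \
    $(( 0x80 | (cp & 0x3F) ))
}

utf8_3byte 0xFEFF   # prints: EF BB BF
```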
Could those who have answered please re-test whether they can reproduce the issue?