
Secure Copy using CRC, prevent Silent data corruption

Posted: 2023-01-02, 01:38 UTC
by isidro
I have run into silent data corruption several times (a bit flipped roughly every 1 TB copied, due to hardware failure): https://en.wikipedia.org/wiki/Data_corruption
Now my copy procedure is: generate a checksum file (e.g. BLAKE3), copy, verify the checksum (sometimes also against the source).
That is VERY time consuming on large (e.g. 4 TB) drives.
A basic backup takes 18 hours using two 4 TB drives (sketched below):
1) generate a BLAKE3 checksum file for all data (6h)
2) copy the source drive to the backup drive (6h; reading and writing happen simultaneously)
3) verify the backup against the checksum file (6h)
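
A minimal sketch of those three steps, assuming the third-party blake3 package (pip install blake3); the paths are made up:

Code:
import shutil
from blake3 import blake3

CHUNK = 1 << 20  # 1 MiB read buffer

def file_hash(path):
    h = blake3()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()

src, dst = "D:/data/archive.bin", "E:/backup/archive.bin"

digest = file_hash(src)              # step 1: one full read of the source
shutil.copyfile(src, dst)            # step 2: another full read, plus the write
assert file_hash(dst) == digest      # step 3: a third full read, of the backup

Three separate passes over the data - hence the 3 x 6 hours.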

An essential missing function is the ability to copy and generate the checksum simultaneously (which would reduce steps 1 and 2 to a single 6-hour pass), and ideally also a checkbox to verify the result against it afterwards, so everything is done without user intervention...
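
A sketch of that requested behaviour, again assuming the third-party blake3 package: the hash is updated on each chunk as it passes through the copy loop, so steps 1 and 2 collapse into one pass, and a verify flag mimics the proposed checkbox (the function name and flag are mine, not an existing feature):

Code:
from blake3 import blake3

CHUNK = 1 << 20

def copy_and_hash(src, dst, verify=True):
    h = blake3()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while chunk := fin.read(CHUNK):
            h.update(chunk)       # hash the chunk...
            fout.write(chunk)     # ...and write it in the same pass
    digest = h.hexdigest()
    if verify:                    # optional second pass, over the target only
        check = blake3()
        with open(dst, "rb") as f:
            while chunk := f.read(CHUNK):
                check.update(chunk)
        if check.hexdigest() != digest:
            raise IOError("silent corruption detected in " + dst)
    return digest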

Re: Secure Copy using CRC, prevent Silent data corruption

Posted: 2023-01-02, 02:16 UTC
by petermad
isidro wrote:An essential missing function is the ability to copy and generate the checksum simultaneously
Have you tried enabling the "Verify" option in the Copy dialog? - https://madsenworld.dk/tcmd/verifycopy.png

Help wrote:With the option Verify enabled, Total Commander reads the copied file again after copying finishes, and compares its MD5 checksum with the original. The disk cache will be bypassed.
MD5 is probably slower than BLAKE3, but you should be able to do it in one pass.
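
If you want to check that speed claim on your own machine, here is a rough timing of MD5 against BLAKE3 (third-party blake3 package) over the same in-memory buffer:

Code:
import hashlib, time
from blake3 import blake3

buf = b"\x00" * (256 * 1024 * 1024)   # 256 MiB of zeros

for name, hasher in (("md5", hashlib.md5), ("blake3", blake3)):
    t0 = time.perf_counter()
    hasher(buf).hexdigest()
    dt = time.perf_counter() - t0
    print(f"{name}: {len(buf) / dt / 2**20:.0f} MiB/s")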

Also notice:
History.txt wrote:24.03.17 Fixed: Verify after copy: retry up to 3 times to open target file if it fails (e.g. because writing is still not done) (32/64)

Re: Secure Copy using CRC, prevent Silent data corruption

Posted: 2023-01-02, 08:30 UTC
by abouchez
Some remarks:

1) To find a bit flip, CRC32C is enough, and it can be computed at huge speed - detecting such errors is exactly what CRC algorithms were designed for. There is no need for a cryptographic hash (see the first sketch after this list).

2) Don't hash the whole file at once; hash it in chunks (e.g. 32 MB) during the read process itself: write a chunk, then read it back and CRC it.

3) Bypassing the disk cache is not easy. Even if you bypass it at the OS level, it is very likely that the hardware will return the data from its own cache rather than from the actual medium. The only way I can think of is to write the whole huge destination file with FILE_FLAG_WRITE_THROUGH, then close the file, then reopen it to check the CRC.

4) So I would hash the file in chunks, computing each chunk's CRC and keeping it in memory or on disk; then, once the whole file is copied, read the chunks back and verify each CRC. If one fails, replace only that chunk, not the whole file. It is much more time efficient (see the second sketch after this list).

5) My guess is that for such huge files a "recover" function may be needed. That is, if the copy is interrupted, a new copy should resume from where it was aborted (i.e. from the last complete chunk).

6) There are already tools doing this very efficiently, e.g. rsync - so is it worth reinventing the wheel?
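
Point 1, sketched with the third-party crc32c package (pip install crc32c); the file name is made up. zlib.crc32 from the standard library would demonstrate the same idea with the plain CRC32 polynomial:

Code:
import crc32c

crc = 0
with open("archive.bin", "rb") as f:        # hypothetical file name
    while chunk := f.read(1 << 20):         # 1 MiB at a time
        crc = crc32c.crc32c(chunk, crc)     # incremental update
print(f"CRC32C: {crc:08x}")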
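
And a second sketch combining points 2, 4 and 5: copy in 32 MB chunks, keep each chunk's CRC32C in memory, verify chunk by chunk after the copy, and rewrite only the chunks that fail. The function names are mine, os.O_SYNC merely stands in for the FILE_FLAG_WRITE_THROUGH of point 3 (a real Windows build would open the target through CreateFileW), and the crc32c package is assumed again:

Code:
import os
import crc32c

CHUNK = 32 * 1024 * 1024  # 32 MB chunks, as in point 2

def copy_with_chunk_crcs(src, dst, resume_from=0):
    """Copy src to dst chunk by chunk; return the per-chunk CRC32C list.
    resume_from is a chunk index for the "recover" case of point 5 (the
    CRCs of already-copied chunks would be reloaded from disk)."""
    crcs = []
    flags = (os.O_WRONLY | os.O_CREAT
             | getattr(os, "O_SYNC", 0)      # write-through where available
             | getattr(os, "O_BINARY", 0))   # no-op outside Windows
    fd = os.open(dst, flags)
    try:
        with open(src, "rb") as fin:
            fin.seek(resume_from * CHUNK)
            os.lseek(fd, resume_from * CHUNK, os.SEEK_SET)
            while chunk := fin.read(CHUNK):
                os.write(fd, chunk)
                crcs.append(crc32c.crc32c(chunk))  # CRC during the copy
    finally:
        os.close(fd)
    return crcs

def verify_and_repair(src, dst, crcs):
    """Re-read dst chunk by chunk; on a CRC mismatch, rewrite only that
    chunk from the source instead of restarting the whole copy."""
    bad = []
    with open(dst, "rb") as f:
        for i, want in enumerate(crcs):
            if crc32c.crc32c(f.read(CHUNK)) != want:
                bad.append(i)
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        for i in bad:
            fin.seek(i * CHUNK)
            fout.seek(i * CHUNK)
            fout.write(fin.read(CHUNK))      # replace only the failed chunk
    return bad

Only the failed chunks get rewritten, so a single flipped bit costs one 32 MB rewrite instead of a multi-hour restart.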