Verify after copy

English support forum

Moderators: white, Hacker, petermad, Stefan2

gigaman
Member
Posts: 131
Joined: 2003-02-14, 11:28 UTC

Post by *gigaman »

knnknn wrote: How in the world can anyone be AGAINST VERIFYING?
Because the few who would use it would think "great, now everything is verified", while it's not; it guarantees nothing.

HolgerK was right when he asked about caching; the target file would certainly have to be opened and written to without caching, and I'm not sure even that is a sufficient condition. Once you write to a file with caching, there's no way to tell Windows to "forget" this data.
You may flush the cache, but that only flushes the write cache; when you read from the file again, the read will be served from the cache (memory) anyway, not from the real file (and there's also the hardware cache here, isn't there?)

Sure, if you copy a few gigabytes and then read them again, they will most likely not be in the cache because it's not that big... but it makes the whole operation inconsistent/unreliable.
Fuzbolero
Junior Member
Posts: 20
Joined: 2007-06-08, 12:42 UTC
Location: Europe

Post by *Fuzbolero »

Regardless of moving or copying, my assumption is that the first checksum can be calculated on the initial read operation with "sufficient" reliability.

Not sure if I am missing something here, but it seems that some here indicate that it is not sufficient to base the first checksum for a verification on the initial file read?

What is wrong with that?
Can't the verification be done against a checksum calculated during the initial read operation? The source file will not change during this operation, so why read it again for the verification?

Some of the arguments here seem to be mostly about reliability and avoiding a false sense of security. Isn't it at least "better", a bit more secure, with a verification than without one?

Even if it were possible, we probably don't need 100% security for copy operations, just something better than nothing that is as convenient and time-saving as possible.

Needs for highly secure solutions can be covered with existing professional tools. TC could cover the more basic needs without claiming to be more than that.
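
A minimal sketch of the single-pass idea, written in Python purely for illustration: the source checksum is computed on the same read that feeds the copy, and only the destination is read a second time. The function names, the 1 MiB chunk size and the choice of SHA-1 are assumptions made for this sketch; it does not describe how TC or any other tool works internally.

```python
import hashlib

CHUNK = 1024 * 1024  # 1 MiB per read; an arbitrary choice for this sketch


def copy_with_source_hash(src, dst, algorithm="sha1"):
    """Copy src to dst and hash the source bytes as they are first read."""
    digest = hashlib.new(algorithm)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            digest.update(chunk)   # checksum from the initial read
            fout.write(chunk)      # same bytes go to the target
    return digest.hexdigest()


def file_hash(path, algorithm="sha1"):
    """Hash a file by reading it back in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Return True if the re-read destination matches the source hash."""
    expected = copy_with_source_hash(src, dst)
    return expected == file_hash(dst)
```

Note that the re-read in `file_hash(dst)` goes through the normal OS cache, so the caching objection raised earlier in the thread still applies to this sketch.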
Twitter.com/@FuzboleroXV
ado
Senior Member
Posts: 445
Joined: 2003-02-18, 13:22 UTC
Location: Slovakia, Pezinok

Post by *ado »

Fuzbolero,
there is no problem with calculating the checksum of a file while you read it. The problem is how to make sure that the file you wrote to the target location has the same checksum. On that side the file can be in the cache, so after the copy process is completed and you read the target file back, you may get it from the cache and not from the physical media.

ado
Fuzbolero
Junior Member
Posts: 20
Joined: 2007-06-08, 12:42 UTC
Location: Europe

Post by *Fuzbolero »

Yes, I understand that part regarding the cache on the target drive.

Some here in this thread claim that the source must be re-read _after_ the copy process for the checksum calculation. I have also seen, in another thread, something about it being "impossible" to copy and generate a checksum simultaneously on the initial source file read, and I wonder what kind of limitation that may be. If I remember correctly, it seemed to be a side effect of the development tool used to make TC, but I'm not sure about that. I'm also not sure whether the two are related; maybe it is because of such a development tool limitation that some claim the source must be re-read. I'm just curious to understand why, as it seems strange if one cannot work around it somehow.
Twitter.com/@FuzboleroXV
JohnFredC
Power Member
Posts: 886
Joined: 2003-03-14, 13:37 UTC
Location: Sarasota Florida

Post by *JohnFredC »

These issues are similar to those addressed by the "Delay Tolerant Networking" initiative. Perhaps a paradigm for assured copying can be found in that effort.

Also, see here.
Licensed, Mouse-Centric, moving (slowly) toward Touch-centric
hlloyge
Member
Posts: 131
Joined: 2006-11-02, 23:14 UTC

Post by *hlloyge »

I just beg you all not to go rediscovering hot water, and to think about the basics of TCP/IP and of Reed-Solomon coding in HDDs at a low level.

Here is an explanation of how it basically works (very simplified and easy to understand):
http://www.pcguide.com/ref/hdd/geom/error_ECC.htm

And as I said before, each and every error during copying files WILL be noted, and they occur due to hardware malfunction. No checksum checking will save you from that.
solid
Power Member
Posts: 747
Joined: 2004-08-09, 11:20 UTC

Post by *solid »

Maybe this tool can help. It has verify with MD5 or SHA-1.
knnknn
Junior Member
Posts: 60
Joined: 2007-07-20, 08:04 UTC

Post by *knnknn »

hlloyge wrote:And as I said before, each and every error during copying files WILL be noted
Nonsense.
hlloyge wrote:and they occur due to hardware malfunction. No checksum checking will save you from that.
CRC checking CAN help you notice hardware malfunctions, e.g. hard drives that start to break, cables that start to corrode, USB sticks that don't sit well etc.

No one claims that CRC checking is perfect. But please don't claim that additional security is unnecessary.

Moreover, Total Commander already _HAS_ CRC functionality. It just needs to apply it automatically (optionally, of course) after copying/moving.
byblo
Senior Member
Posts: 270
Joined: 2005-02-20, 21:13 UTC

Post by *byblo »

Thank you knnknn. I have been meaning to request the same feature for TC for a long time. (Or maybe I did? I don't remember...)

I am astonished by all the negative reactions to it...

I am also dreaming of such a feature for TC, but since ghisler has not yet taken part in this topic, I have a bad feeling about it.


In my view, a good procedure for secure copying would be the following (theoretical only at this point; I'm not yet talking about technical specs and limitations such as whether ignoring the cache is feasible):

- The user clicks a single checkbox to enable the secured copy method.

- The file is copied normally, using the regular method (cache or not, OS copy, whatever...).

- IF a crc32/md5 for the copied files is already declared somewhere (sfv/md5 file or a tag in the name), read the declared value for later use and use it to decide whether further checks will be done with crc32 or md5 (crc32 by default since it's faster).

- Calculate a crc32 OR md5 from the source file WITHOUT using the cache when possible.

- If a declared value exists, compare the declared crc against the current source crc, and alert the user if they do not match.

- Calculate a crc32 OR md5 from the destination file WITHOUT using the cache when possible.

- Alert the user if the crcs from source and destination do not match perfectly.

I think this should be the minimum...
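
The steps above can be sketched roughly as follows, in Python for illustration only. The function names are hypothetical, CRC32 is used as the default the list suggests, and the reads here go through the normal OS cache: bypassing it is exactly the open technical question and is NOT attempted in this sketch.

```python
import shutil
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def secured_copy(src, dst, declared_crc=None):
    """Copy src to dst and return a list of problems (empty if all checks pass).

    declared_crc is an optional, previously recorded CRC32 for the source
    (for example read from an .sfv file); how it is obtained is left out.
    """
    problems = []
    shutil.copyfile(src, dst)                  # regular copy
    src_crc = crc32_of(src)                    # checksum of the source
    if declared_crc is not None and declared_crc != src_crc:
        problems.append("source does not match declared CRC")
    dst_crc = crc32_of(dst)                    # checksum of the destination
    if dst_crc != src_crc:
        problems.append("destination does not match source")
    return problems
```

A caller would alert the user whenever the returned list is not empty.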



And why such a feature? Let me share my (bad) experience:

Months ago, I was copying files from a backup USB device that was going to be formatted a bit later (a 40GB HD, if I remember correctly), and decided to check their crc32 after copying the whole drive.
For some years now, after finding that some of my files had been corrupted for reasons that remain unknown, I have tried to always add a crc32 to my files, and after the copy I noticed that one file no longer matched its declared crc32 value.
TC did not report any copy error! (For the record, my new MB wasn't powering the USB port correctly, and I needed to add a second USB plug for it to work. Whatever.)

That incident made me really nervous, since I avoided losing that file ONLY because I wasn't too lazy or impatient to test the files byte by byte after copying the drive's content.


Also, as you know, modern HDs can reach 2TB, which means that TC's sync tool is totally useless for, say, monthly incremental backups:
how do you check byte by byte, let's say, 1000 files just added to a backup HD that is already 85% full, without checking the whole drive each time?


It seems that FastCopy offers something in that spirit:
Verify written files data by MD5(or SHA-1. If you want to use SHA-1, write [main] Using_MD5=0 in fastcopy.ini)
... Action detail: Read(Src) -> MD5(Src) -> Write(Dst) -> Read(Dst) -> MD5(Dst) -> Compare MD5(Src/Dst) (Of course, all actions are processed in parallel as much as possible)
source: http://www.ipmsg.org/tools/fastcopy.html.en

But as you can see, it lacks some of what I need, and that "(Of course, all actions are processed in parallel as much as possible)" line makes me nervous again, since it is ambiguous (and I'm too lazy to check further :p).


Also, I would prefer to see that feature directly in TC, since I don't use FastCopy or TC's external copying programs.

It should not be hard to implement, and it would easily give TC a new USEFUL feature: copying more safely when needed :)
knnknn
Junior Member
Posts: 60
Joined: 2007-07-20, 08:04 UTC

Post by *knnknn »

byblo wrote: - IF a crc32/md5 for the copied files is already declared somewhere (sfv/md5 file or a tag in the name), read the declared value for later use and use it to decide whether further checks will be done with crc32 or md5 (crc32 by default since it's faster).
Yes, a centrally stored md5 file would be ideal.

Even better: a centrally stored .ecc file, so corrupt files could be repaired (like with QuickPar).
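
To illustrate what repair data adds over a plain checksum, here is a deliberately simplified Python sketch: per-block CRC32 values locate a damaged block, and a single XOR parity block rebuilds it. This is NOT how QuickPar works (it uses far stronger Reed-Solomon codes); the function names and the equal-length block layout are assumptions for this example, which can repair only one damaged block.

```python
import zlib


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_recovery(blocks):
    """Return (crcs, parity) for a list of equal-length data blocks."""
    parity = bytes(len(blocks[0]))          # all zero bytes to start
    for block in blocks:
        parity = xor_bytes(parity, block)
    crcs = [zlib.crc32(block) for block in blocks]
    return crcs, parity


def repair(blocks, crcs, parity):
    """Return the blocks with at most one damaged block rebuilt."""
    bad = [i for i, block in enumerate(blocks) if zlib.crc32(block) != crcs[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("XOR parity can rebuild only one damaged block")
    rebuilt = parity
    for i, block in enumerate(blocks):
        if i != bad[0]:
            rebuilt = xor_bytes(rebuilt, block)   # parity minus the good blocks
    fixed = list(blocks)
    fixed[bad[0]] = rebuilt
    return fixed
```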

But first things first: I would be happy if TC simply started by implementing copy verification.
hlloyge
Member
Posts: 131
Joined: 2006-11-02, 23:14 UTC

Post by *hlloyge »

knnknn wrote:Nonsense.
Yeah-sense. When dealing with hundreds of gigabytes, and going to tens of terabytes each month, you soon find out what really happens during the copy process. Files don't get lost while copying EXCEPT when you copy with tools that skip files that are unreadable for some reason (bad sector, NTFS or other FS rights). TC doesn't do that by default.
knnknn wrote: CRC checking CAN help you notice hardware malfunctions, e.g. hard drives that start to break, cables that start to corrode, USB sticks that don't sit well etc.
True, but they can't help you while copying. You need to make CRC files at the source directory, copy those files with CRC files, and then, afterwards, check their integrity if you want, once a week, once a month, whenever you want - but they won't help you copy a file.
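
The workflow described here (create CRC files in the source directory, copy them along with the data, check integrity whenever you like) could look roughly like this in Python. It is a sketch under stated assumptions: the manifest name `checksums.sfv` and the function names are made up for the example, it uses the simple "filename CRC32" line layout of SFV files, and it handles a single directory without recursing.

```python
import os
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def write_manifest(directory, manifest_name="checksums.sfv"):
    """Record 'name CRC32' for every regular file in the directory."""
    lines = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name != manifest_name:
            lines.append(f"{name} {crc32_of(path):08X}")
    with open(os.path.join(directory, manifest_name), "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")


def check_manifest(directory, manifest_name="checksums.sfv"):
    """Return the names of files that are missing or no longer match."""
    bad = []
    with open(os.path.join(directory, manifest_name), encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith(";"):   # blank line or SFV comment
                continue
            name, expected = line.rsplit(" ", 1)
            path = os.path.join(directory, name)
            if not os.path.isfile(path) or crc32_of(path) != int(expected, 16):
                bad.append(name)
    return bad
```

Run `write_manifest` once on the source directory, copy the directory including the manifest, then run `check_manifest` on the copy now and again later.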
knnknn wrote: No one claims that CRC checking is perfect. But please don't claim that additional security is unnecessary.
While copying, yes, it is very unnecessary. I already wrote why, and I guess you didn't understand why that is. I am sorry, but I don't really have the time to explain to you how TCP/IP works and how Reed-Solomon coding works, and why I am right and you are not. If you are really interested in that, I've posted a link to a very good site explaining how things in computers work.

Moreover, if you copy data from source to destination, and you don't have CRC checksums created when the data was created, no one can assure that the data is intact; you can create CRC data while copying those files, but they can be corrupt from the start. How on earth would copying and creating checksums help you? It would only slow down the copy process, without any gain.

Checksums are already sent in TCP packets over the network. They are more than enough to ensure the transfer of data. You can always make checksum data for files (I make them for my CD backups), so you can check at any time if the data is intact.
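
For reference, the checksum carried in TCP segments is the 16-bit ones' complement "Internet checksum" described in RFC 1071. A minimal Python sketch of that calculation follows, assuming it is run over a raw byte string rather than a real TCP segment (the real one also covers a pseudo-header); the function name and the sample data are illustrative. The last lines show one kind of change this sum does not see, two swapped 16-bit words, which CRC32 does detect.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = bytes.fromhex("12345678")
swapped = bytes.fromhex("56781234")          # same 16-bit words, other order

# The sum is order-independent, so both messages get the same checksum.
same_sum = internet_checksum(original) == internet_checksum(swapped)

# CRC32 depends on position, so the two messages get different values.
different_crc = zlib.crc32(original) != zlib.crc32(swapped)
```

The TCP checksum also only covers the data while it travels between the two network stacks, not what happens between memory and disk on either side.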
knnknn
Junior Member
Posts: 60
Joined: 2007-07-20, 08:04 UTC

Post by *knnknn »

hlloyge wrote: While copying, yes, it is very unnecessary. I already wrote why, and I guess you didn't understand why that is. I am sorry, but I don't really have the time to explain to you how TCP/IP works and how Reed-Solomon coding works, and why I am right and you are not.
Please spare us your funny theories about why you think that copy errors are always caught.

You yourself obviously don't actually USE VERIFICATION when copying over network, because then you wouldn't state such dangerous nonsense.

Just last week I discovered that an overloaded PCI bus caused the LAN card to act strangely. The files DID ARRIVE CORRUPT at the destination harddrive. I had to put the LAN card into another PCI slot to fix this behavior.

And from my _ACTUAL EXPERIENCE_ with "always-verify-after-copy" (as opposed to your _THEORIES_) I know that the likelihood of CRC errors increases

* the older (or lower quality) the cables are
* the more connectors you add (e.g. connect 2 USB cables with 1 connector)
* the more adapters you use (e.g. IDE HDD -> IDE-to-SATA-adapter -> SATA-to-USB-adapter)

I am glad that FastCopy (which automatically checks for CRC errors) caught the corrupting LAN card. And, no, Total Commander would not have noticed, because the copying itself went through quietly, without any error. Only the re-reading from the destination hard drive (which FastCopy does automatically) caught it.
hlloyge
Member
Posts: 131
Joined: 2006-11-02, 23:14 UTC

Post by *hlloyge »

I am sorry you have hardware problems. In the real world, the probability of something like this happening is extremely low, and I have to say you have had a lot of bad luck.
Are you running your PC overclocked?
sqgl
Junior Member
Posts: 9
Joined: 2010-03-30, 21:40 UTC
Location: Australia

Post by *sqgl »

knnknn wrote:
hlloyge wrote:And as I said before, each and every error during copying files WILL be noted
Nonsense.
hlloyge wrote:and they occur due to hardware malfunction. No checksum checking will save you from that.
CRC checking CAN help you notice hardware malfunctions, e.g. hard drives that start to break, cables that start to corrode, USB sticks that don't sit well etc.

No one claims that CRC checking is perfect. But please don't claim that additional security is unnecessary.

Moreover, Total Commander already _HAS_ CRC functionality. It just needs to apply it automatically (optionally, of course) after copying/moving.
I'm with knnknn, the others are just adding noise in a pissing contest so as to show how they are soooo professional and only work with professional equipment. Stop dissing my work environment!

TC already has the synchronise function. Why do you suppose that is? It is because knnknn (and Ghisler!) are right.

For now I recommend that knnknn memorise the hotkey sequence (ALT-CYCY, pretty easy to remember, right?). All we are asking for (in effect) is a similar sync function that deletes the originals if the copy is successful, and one that does not even require a long shortcut key sequence. Auto verifying could be selected in the configuration options.
HolgerK
Power Member
Posts: 5406
Joined: 2006-01-26, 22:15 UTC
Location: Europe, Aachen

Post by *HolgerK »

sqgl wrote:TC already has the synchronise function. Why do you suppose that is?
Because an edited file can have the same size but different content, and timestamps you can't rely on?

Regards
Holger