Sync Directories - compare by file size and content
Posted: 2023-01-07, 11:11 UTC
I often use Synchronize Directories with the [by content] and [ignore date] options selected. It works well in most cases, but not when the file names in one or both of the two directories have been altered. Such alterations could have arisen through deliberate tweaking over a long period, for example when I am comparing an old backup with the current version of the folder. They also arise when I am comparing a copy of a camera card (new or old) with files that were renamed by Lightroom or Downloader Pro while importing the image files onto my computer.
Whatever the cause of the name differences, files can only ever be an exact match if they have exactly the same size. So if we had an [ignore name] option alongside the existing [ignore date] option in Synchronize Directories, TC could compare each file on the left with all files of equal size on the right (and vice versa) and then indicate any matches or mismatches.
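To make the proposed [ignore name] comparison concrete, here is a rough Python sketch of the idea (not how TC works internally). It groups files by size and only compares contents within a size group. The function names are hypothetical, and for simplicity it only looks at files directly inside each folder, not subfolders.

```python
import filecmp
import os
from collections import defaultdict


def files_by_size(folder):
    """Map file size -> list of paths for the regular files directly in folder."""
    groups = defaultdict(list)
    for entry in os.scandir(folder):
        if entry.is_file():
            groups[entry.stat().st_size].append(entry.path)
    return groups


def match_ignoring_names(left, right):
    """For each file on the left, list the right-hand files with identical content.

    Only files of equal size are compared, because files of different sizes
    can never be byte-for-byte identical. An empty list means no match.
    """
    right_groups = files_by_size(right)
    matches = {}
    for size, left_paths in files_by_size(left).items():
        candidates = right_groups.get(size, [])
        for path in left_paths:
            # shallow=False forces a real byte-by-byte content comparison
            matches[path] = [c for c in candidates
                             if filecmp.cmp(path, c, shallow=False)]
    return matches
```

Calling it a second time with the folders swapped gives the "vice versa" direction, i.e. which right-hand files have no counterpart on the left.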
There could be multiple files of the same size even if none have been renamed, so any file on either side could exactly match the content of zero, one, or more files on the other side. It would be very unlikely for many files on one side to match many on the other. The results list could show which files match, which would require multiple lines for some files. Or it could be less specific and show a substitute "fake" entry on the other side, such as "<<<multiple files>>>". I prefer the multiple-line approach. However, either way would be better than having to use File Compare many times per folder, especially when a camera card can hold hundreds of image files.
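The two display styles described above can be sketched like this. It assumes a hypothetical result dictionary mapping each left-hand file to a list of matching right-hand files; the " == " and "no match" notation is made up for illustration and is not TC's actual output format.

```python
def format_results(matches, multi_line=True):
    """Render {left_file: [matching right files]} as report lines.

    multi_line=True  -> one line per matching pair, so a left file can appear
                        on several lines
    multi_line=False -> one line per left file, with a "<<<multiple files>>>"
                        placeholder when there is more than one match
    """
    lines = []
    for left in sorted(matches):
        rights = sorted(matches[left])
        if not rights:
            lines.append(f"{left}  --  no match")
        elif multi_line:
            lines.extend(f"{left}  ==  {r}" for r in rights)
        elif len(rights) == 1:
            lines.append(f"{left}  ==  {rights[0]}")
        else:
            lines.append(f"{left}  ==  <<<multiple files>>>")
    return lines


matches = {"IMG_0002.JPG": ["renamed.jpg", "copy.jpg"], "IMG_0003.JPG": []}
print("\n".join(format_results(matches)))
print("\n".join(format_results(matches, multi_line=False)))
```

The multi-line form keeps every pairing visible, while the placeholder form keeps the list one row per file at the cost of hiding which files matched.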
A possible alternative would be for TC to calculate md5 checksums for the files and display an additional column for the md5, so that we can sort and compare by the checksums without losing sight of the filenames. If it worked this way, it would need a [by md5] option that overrides the other selection options. A downside of this alternative is the time taken to calculate md5s for files that cannot possibly match because they have different sizes.
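The time cost mentioned above could be reduced by only hashing files whose size also occurs on the other side. Below is a minimal Python sketch of such an md5 column under that assumption; `md5_of` and `md5_table` are hypothetical names, and like the earlier sketch it ignores subfolders.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def md5_table(left, right):
    """Return (side, name, size, md5) rows for both folders.

    The MD5 is computed only when a file's size also occurs on the other side;
    otherwise the file cannot possibly match and its md5 column is None.
    """
    sizes = {}
    for side, folder in (("L", left), ("R", right)):
        sizes[side] = {e.name: e.stat().st_size
                       for e in os.scandir(folder) if e.is_file()}
    left_sizes = set(sizes["L"].values())
    right_sizes = set(sizes["R"].values())
    rows = []
    for side, folder, other in (("L", left, right_sizes), ("R", right, left_sizes)):
        for name, size in sizes[side].items():
            digest = md5_of(os.path.join(folder, name)) if size in other else None
            rows.append((side, name, size, digest))
    # Sorting by checksum puts identical files next to each other while the
    # name column stays visible; unhashed files go to the end.
    rows.sort(key=lambda r: (r[3] is None, r[3] or "", r[0], r[1]))
    return rows
```

MD5 is fine for spotting accidental duplicates like this; the size check in front of it is what avoids hashing files that can never match.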