Synchronise directories - does not detect files with identical time stamp

elkeverb
Junior Member
Posts: 2
Joined: 2020-11-21, 01:18 UTC

Post by *elkeverb »

Good day,

For years I have used "synchronise directories" with great pleasure. Lately, though, it no longer does what I need it to do (and what it is otherwise so powerful at): TC can't single out files (MS Office files) that have an identical name and an identical time stamp but a different file size. I understand that these files are different from TC's perspective, but to the user they are exactly the same. Is there an option I haven't found yet that makes TC detect these files? If not, would it be possible to build this in?

Some explanation: Our data have been migrated to SharePoint. The data on my local hard drive were copied to SharePoint, and from there I synced them back to my local drive. When I compare the original folder on my local hard drive with the synced folder on my local drive, MS Office files (but not my GIS files such as shapes and tifs, nor PDFs, txt, csv etc.) are reported as different. Apparently some meta information or similar is added to each file (they are consistently a bit larger than the originals), but content-wise the files haven't changed. Even the time stamp remains the same. If I could do a folder comparison with an option to treat files with identical timestamps as equal, that would save a lot of time. The folders might also contain files that actually did change, so just deselecting "by content" wouldn't solve my problem.

If this can't be built in, or would take a long time to be incorporated, is there a simple workaround for detecting these files?

Thanks a lot!
Elke
gdpr deleted 6
Power Member
Posts: 872
Joined: 2013-09-04, 14:07 UTC

Re: Synchronise directories - does not detect files with identical time stamp

Post by *gdpr deleted 6 »

TC's synchronize feature is still 100% able to do what it is meant to do, and will correctly identify files being the same or different (with regard to TC's rules of "sameness"; any possible as-yet-undiscovered bugs notwithstanding). It's just not what you want/need right now. But you realized as much already. TC's synchronize dirs feature doesn't analyse file content to see whether the contents of two different files represent the same document (based on whatever individual definition of "sameness"), [EDIT: The following is an example of stupid things people say on the internet; and with "people" i mean me. The comparison function of TC's directory synchronization feature can of course be expanded/altered by plug-ins] [struck out: and unfortunately this feature cannot be expanded with plug-ins either] :(

I am afraid there is no simple workaround other than to manually check the documents or use some other software tailor-made for this task. There should be commercial tools out there that do what you need, at least to some meaningful degree, but i am unfortunately unable to give concrete pointers. I recall that many years ago, in the office where i was working, we had a similar problem of having to match different Word document files to find out whether they were the same document or different revisions of the same document. I don't know the name of the software that was used, as i wasn't directly involved in that operation. But i did see the audit log this software generated, which we used as input for a script that organized the document files.

I don't know either whether there is free software of this kind out there. A (rather) cursory Google search did not reveal anything of value. If you don't find such software either, or you are unwilling/unable to shell out for commercial software, the only other thing i can suggest is to write a comparison tool yourself. This of course depends entirely on you being able to program in some programming/scripting language, or having someone available who can do it for you. It's likely that this is not really a suitable suggestion, but if the stars align just right, it might just be... (And if you are a company, then writing, testing and possibly maintaining a program/script on company time might well be more expensive than purchasing commercial software to begin with.)

If you are certain that only specific metadata items have been added to, altered in, or removed from the document files, and the documents are otherwise unaltered, this should be a relatively easy programming task in a programming/scripting environment that can decompress ZIP files and read XML files [***]. Contemporary MS Office document files use the "Office Open XML" file format. This is nothing more than a ZIP archive with a bunch of XML files (and other files for any non-text stuff, for example any images in a document). Knowing which metadata to look for, it should be possible to figure out which of the XML files carries that metadata. And once you know this, comparing for "document sameness" boils down to simply comparing all the XML files contained within the Office document file except the one(s) with the metadata.
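
To make the outline above a bit more concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the names `METADATA_PARTS`, `part_digests` and `same_document` are made up for this example, and the list of "metadata" parts is a guess at the usual document-property parts, not a statement about what SharePoint actually changes. That has to be verified on a few real file pairs first.

```python
import hashlib
import zipfile

# ASSUMPTION: parts treated as "metadata only". Inspect a few real file
# pairs and adjust this set before trusting any result.
METADATA_PARTS = frozenset({
    "docProps/core.xml",
    "docProps/app.xml",
    "docProps/custom.xml",
})


def part_digests(path, ignored=METADATA_PARTS):
    """Map every non-ignored part inside the archive to a SHA-256 digest."""
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: hashlib.sha256(zf.read(info)).hexdigest()
            for info in zf.infolist()
            if not info.is_dir() and info.filename not in ignored
        }


def same_document(left, right, ignored=METADATA_PARTS):
    """True if both archives hold the same parts with identical bytes,
    apart from the ignored ones."""
    return part_digests(left, ignored) == part_digests(right, ignored)
```

Calling `same_document("original.docx", "synced.docx")` (hypothetical file names) would then answer the "sameness" question for one pair. Note that this compares the uncompressed bytes of each part, so a different compression level alone does not count as a difference, but any change to another part, such as an added part or an updated relationship or content-type entry, does.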



[***] My suggestion above applies only if it is indeed really, really, 100000% certain that nothing else in the internal document structure has been changed. If that assumption turns out to be wrong, the outline i gave of this programming task won't apply, and it could potentially change from a relatively easy to a rather colossally complicated programming project (a.k.a. practically impossible without investing rather serious time, skills, and funds). Note that two document files appearing the same to the eye when opened in the respective Office application does not necessarily mean their internal file structure is identical, too...
Last edited by gdpr deleted 6 on 2020-11-21, 14:15 UTC, edited 1 time in total.
elkeverb
Junior Member
Posts: 2
Joined: 2020-11-21, 01:18 UTC

Re: Synchronise directories - does not detect files with identical time stamp

Post by *elkeverb »

Thanks a lot for your extensive answer. I completely get your answer from a bits-and-bytes perspective. It is just very frustrating to compare two directories, have over 500 files reported as different, and then have to determine manually whether the time stamps are identical. It seemed so easy to have a button that allows me to "use only time stamps + file name (ignore file differences)". I wouldn't want to suggest it as an option in the way TC performs the comparison, but as a change to the way I can manipulate the results that TC reports.

I didn't come across other tools that can do the job. The workaround to avoid manual verification (which I consider too error-prone): compare dirs > show only unequal files > copy results to Excel > select files with identical time stamp via formula > save as txt/csv > open in Notepad++ or similar > search and replace characters to get a concatenated string of file names that can be pasted into TC's search dialogue > delete those files.
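
For illustration only, the selection step of this workaround (same name, same time stamp, different size) could also be scripted outside TC. The Python sketch below is a rough outline under assumptions, not a tested tool: `index_tree`, `same_stamp_different_size` and the `tolerance` parameter are names invented for this example, and it only lists candidates without deleting anything.

```python
import os


def index_tree(root):
    """Map relative path -> (size in bytes, mtime in whole seconds)."""
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root)
            entries[rel] = (st.st_size, int(st.st_mtime))
    return entries


def same_stamp_different_size(left_root, right_root, tolerance=0):
    """List relative paths present on both sides whose modification times
    match (within `tolerance` seconds) while their sizes differ."""
    left = index_tree(left_root)
    right = index_tree(right_root)
    hits = []
    for rel in sorted(left.keys() & right.keys()):
        (lsize, ltime), (rsize, rtime) = left[rel], right[rel]
        if lsize != rsize and abs(ltime - rtime) <= tolerance:
            hits.append(rel)
    return hits
```

The returned list could be written to a text file and reviewed before anything is deleted. `tolerance` is there in case the two locations store time stamps with different precision; leave it at 0 to require an exact match to the second.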

Those are quite a few extra steps for a result that TC already shows me on screen (plus, I haven't tested the character limit in the search dialogue; it might not fit). Instead of the option to consider equal time stamps as equal files (in the results window, not as an alternative compare functionality!), I would also be helped if the Synchronize directories window allowed sorting by combined date (instead of by one column only). That is, an option that lists the reported files by the combined timestamp columns (left + right timestamps): first the files with identical timestamps, then the files with different timestamps.
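
To pin down what "sorting by combined date" would mean, here is a small Python sketch of that ordering. It is purely illustrative: `DiffRow` and `sort_identical_stamps_first` are hypothetical names, and nothing here reflects how TC stores or sorts its result list internally.

```python
from typing import NamedTuple


class DiffRow(NamedTuple):
    """One reported file pair: name plus left/right modification time."""
    name: str
    left_mtime: int
    right_mtime: int


def sort_identical_stamps_first(rows):
    """Rows with equal left/right time stamps come first, then the rest;
    within each group rows are ordered by name, ignoring case."""
    return sorted(
        rows,
        key=lambda r: (r.left_mtime != r.right_mtime, r.name.lower()),
    )
```

The sort key relies on `False` ordering before `True`, so pairs whose time stamps match end up in one block at the top of the list, where they could be selected and handled together.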

As I said, I fully understand it goes beyond the comparison functionality of Synchronise directories. But TC already reports the information I need to make the decision (namely, the time stamp). Its reporting functionality just misses something that (to me, as a non-developer!) seems relatively easy to build in. I mentioned in my first post that it is only MS Office files coming from synced SharePoint folders that show this behaviour. Since we work more and more in the cloud, I suspect that more users will notice this in future as well.

Am I correct that you understood my request in a more complicated way than necessary (as a different file comparison option, instead of as a different filter/sort method for the results)?

Thanks,
Elke
gdpr deleted 6
Power Member
Posts: 872
Joined: 2013-09-04, 14:07 UTC

Re: Synchronise directories - does not detect files with identical time stamp

Post by *gdpr deleted 6 »

By the way, before i address your further explanation, let me please correct myself: I misspoke about TC's dir synchronization feature not being expandable by plug-ins. Contrary to my claim, the file comparison part of this feature is actually expandable by plug-ins. Not that i know of a plug-in that would be helpful/applicable in your situation, though. :(


From your additional explanation, i figure the following:

- You have file (document) pairs of different file sizes that are truly different document versions, i suppose indicated by different modification timestamps
- And you have file (document) pairs of different file sizes that are actually the same document version, i suppose indicated by identical modification timestamps

It seems you don't need anything that has to determine whether the contents of two files represent the same document. Sameness of the document is established entirely and only by the file names and the file modification timestamps being equal (which would not require "looking into" the file/document). Is my understanding correct?