Compare directories (on diff. drives) by content

georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-25, 21:10 UTC But how? I am unable to picture / imagine how those duplicates would be displayed. Say there are 4000 files left and 5000 files right, where every 4 files on the left are duplicates of 5 files on the right, so we have 1000 groups of duplicates, every file in a different folder. How would this be presented to the user?
Well, with 9000 duplicates in total it gets ultra-tedious to make individual decisions anyway. But even in this rather extreme situation I would still prefer the directory-oriented display of the "Synchronize Dirs" result panel, for both sides ("locally unique left" and "locally unique right"), over the even more endless list of 1:1 pairs of duplicates we get now.

What I've suggested so far should be quite feasible for our "grandmaster" Christian to implement. In an ideal world, of course, Christian could refine the process even further by somehow indexing the groups of duplicates found. Then, when narrowing the results display down to duplicate files only no longer produces a well-arranged view for individual decisions (as with a sheer magnitude of 9000 duplicates), the user could jump from one single group of 9 identical duplicates to the next, across a total of 1000 groups.

But if I may reduce the number of groups of identical duplicates in your example from 1000 to 3, resulting in a total of 27 matches, I think you can clearly see the point. The goal is always to have, as far as possible, all of them clearly arranged together with their parent dirs (by directory left/right, as the "Synchronize Dirs" result panel currently does) within a single screen and at a glance, so that the user/administrator can make individual decisions for every single file or for arbitrary groups of them!
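
To make the "indexing" idea a bit more concrete, here is a minimal Python sketch. It is only an illustration and says nothing about how Total Commander works internally: the function name `index_duplicate_groups` and the sample paths are made up, and a SHA-256 hash stands in for a real byte-by-byte comparison.

```python
import hashlib
from collections import defaultdict


def index_duplicate_groups(files):
    """Group (side, path, content) records by the SHA-256 of their content.

    Returns a list of groups; each group is a list of (side, path) pairs
    sorted by path. Files whose content occurs only once are left out.
    """
    by_digest = defaultdict(list)
    for side, path, content in files:
        by_digest[hashlib.sha256(content).hexdigest()].append((side, path))

    groups = [
        sorted(members, key=lambda m: m[1])
        for members in by_digest.values()
        if len(members) > 1
    ]
    # Order the groups by their first path so group numbers stay stable.
    groups.sort(key=lambda g: g[0][1])
    return groups


# Hypothetical data: 3 groups, each with 4 copies left and 5 copies right,
# plus one file without any duplicate.
sample = []
for g in range(3):
    content = f"payload-{g}".encode()
    sample += [("L", f"Dir{g}{i}/file{g}.bin", content) for i in range(4)]
    sample += [("R", f"Dir{g}{i}/file{g}.bin", content) for i in range(4, 9)]
sample.append(("L", "Misc/unique.bin", b"only once"))

groups = index_duplicate_groups(sample)
print(len(groups), sum(len(g) for g in groups))
```

With the reduced example this reports 3 groups and 27 files in total; the file without a twin never shows up in any group.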
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
I still would prefer the directory-oriented display of the "Synchronize Dirs"-result-panel for both sides "locally unique left" and "locally unique right"
So then if you have one file, how do you find its 8 duplicates amongst the other 8999 files?
What I've suggested so far should be quite feasible for our "grandmaster" Christian to implement.
A feature has the best chance of being implemented when a clear general solution is provided that can also handle extreme cases.
if I may reduce the number of found identical groups of duplicates to 3 instead of 1000 in your example - resulting in a total of 27 matches - I think you can clearly see the point
You may, but if you only consider the easiest and best case, it will never get implemented. TC's functionality must work and make practical sense for all cases.

Roman
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect you drop dead.
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-26, 13:08 UTC So then if you have one file, how do you find its 8 duplicates amongst the other 8999 files? ...
The best chance of implementing is when a clear general solution is provided which can also handle extreme cases.
Well, for a case as extreme as this one it would probably only work, as I have already indicated, with the additional implementation of an even more complex indexing solution that allows narrowing down the displayed files even further. If masking everything but the actual duplicates still leaves 9000 files behind, the only feasible remedy I can think of is an additional mechanism that drastically limits the display to one particular group of identical binary duplicates at a time, with the ability to "jump" from one group to the next, back and forth. At any given point only a single group of identical binary duplicates would remain on screen, offering each of its members at a glance for individual decisions.

If, on the other hand, the total number of duplicates doesn't exceed, say, 30-50, it would perhaps make sense to leave them all on display together. Once a single file is selected, all of its identical twin duplicates could instantly switch to inverted colors or start blinking to express their direct correlation.
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
at a given point in time only one single group of identical binary duplicates would remain on display on the screen
How would that work? Would all other groups be hidden? How do you manage the case when there are 100 pages of files (90 files on each page) and files from one duplicate group are to be found on pages 5, 16, 20, 47, 54, 68, 71, 82 and 88? Jump between them like jumping between differences in Compare by Contents? Hide all other duplicate groups? What is your concept / idea?

Roman
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-26, 17:40 UTC How would that work? Would all other groups be hidden? How do you manage the case when there are 100 pages of files (90 files on each page) and files from one duplicate group are to be found on pages 5, 16, 20, 47, 54, 68, 71, 82 and 88? Jump between them like jumping between differences in Compare by Contents? Hide all other duplicate groups? What is your concept / idea?
Yes, all other groups would then be (temporarily) hidden, leaving only the 9 identical "twin" duplicates of group 1 visible in this first step. The whole purpose of my concept is to avoid, at all costs, exactly the situation described above, where those 9 identical "twin" duplicates of group 1 appear scattered over pages "5, 16, 20, 47, 54, 68, 71, 82 and 88". For decision-making and further individual processing it is absolutely essential that they appear all together, with their folder structure visible at a glance, within a single screen (or on 2 or 3 consecutive screens if there were really that many).

And after having individually processed this 1st group we can now cheerfully jump to all the "twin"-duplicates in group 2, again displayed all at a glance in (ideally) one single screen and proceed accordingly. Two down, 998 groups to go. :mrgreen:
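
The "one group at a time" stepping could be sketched roughly like this in Python. It is a toy model, not a proposal for actual TC code: `GroupNavigator` and its methods are hypothetical names, the groups are assumed to have been found already, and the list is assumed to be non-empty.

```python
class GroupNavigator:
    """Keeps exactly one duplicate group visible; all others stay hidden."""

    def __init__(self, groups):
        self.groups = list(groups)
        self.index = 0  # position of the group currently on screen

    def current(self):
        return self.groups[self.index]

    def next(self):
        # Stop at the last group instead of wrapping around.
        self.index = min(self.index + 1, len(self.groups) - 1)
        return self.current()

    def previous(self):
        # Stop at the first group instead of wrapping around.
        self.index = max(self.index - 1, 0)
        return self.current()

    def remaining(self):
        """Number of groups that come after the one on screen."""
        return len(self.groups) - self.index - 1


# Hypothetical data: 1000 groups of 9 files each, as in the 9000-file example.
groups = [[f"group{n}-file{k}" for k in range(9)] for n in range(1000)]
nav = GroupNavigator(groups)
nav.next()  # group 1 is done, group 2 is now on screen
print(len(nav.current()), nav.remaining())
```

With group 2 on screen, 9 files are visible and 998 groups remain, which is the "two down, 998 to go" situation.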
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
OK, so the idea would be to group the files by duplicates, not by folder structure, as I assumed in my first post here, OK. That would require a reworked Sync dirs window then, since it would not be sorted by paths but by duplicate groups.

Roman
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-26, 22:04 UTC OK, so the idea would be to group the files by duplicates, not by folder structure, as I assumed in my first post here, OK. That would require a reworked Sync dirs window then, since it would not be sorted by paths but by duplicate groups.
Not quite! And I'm afraid I'm running out of options to explain my proposal any more clearly. The Sync dirs window would remain structured exactly as it is; each of the identical "twin" duplicates would appear below and together with its parent folder.

All results would still be sorted by path. Only the displayed results would be restricted (as a last resort, if such an extreme number of duplicates is found) to one single group of identical "twin" duplicates at a time. With so few items displayed at any point in time, all identical "twin" duplicates of that group would be presented together on (hopefully) one single screen at a glance, still sorted by path. That avoids the situation you worried about in your recent post, where the detected "twin" duplicates appear scattered across multiple pages.

So in the end the identical "twin"-duplicates would be presented BOTH structured by paths AND together in one single spot!
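
As a rough illustration of "structured by paths AND in one spot", the following Python sketch takes a single duplicate group and lists its members under their parent folders, sorted by side and path. The function name `folder_view`, the `[L]`/`[R]` markers and the sample paths are all invented for this example.

```python
from collections import defaultdict
from pathlib import PureWindowsPath


def folder_view(group):
    """Turn one duplicate group into lines: folder header, then its files.

    `group` is an iterable of (side, path) pairs, side being "L" or "R".
    """
    by_folder = defaultdict(list)
    for side, path in group:
        p = PureWindowsPath(path)
        by_folder[(side, str(p.parent))].append(p.name)

    lines = []
    for side, folder in sorted(by_folder):
        lines.append(f"[{side}] {folder}")  # parent-folder header
        for name in sorted(by_folder[(side, folder)]):
            lines.append(f"    {name}")  # duplicate shown below its folder
    return lines


# Hypothetical group of four identical files spread over three folders.
group = [
    ("R", r"Temp\old\a.jpg"),
    ("L", r"Photos\2021\a.jpg"),
    ("L", r"Photos\2021\a (copy).jpg"),
    ("R", r"Backup\a.jpg"),
]
lines = folder_view(group)
print("\n".join(lines))
```

Because only one group is passed in, the output stays short enough for one screen while each file still sits under its own folder header.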
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
Yes, if I understand you correctly, that is what I meant.
Then there would need to be some system / way to switch between duplicate groups or maybe some way to separate each duplicate group.

Roman
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-27, 11:27 UTC Yes, if I understand you correctly, that is what I meant.
Then there would need to be some system / way to switch between duplicate groups or maybe some way to separate each duplicate group
Hurrah, yes, I guess we're finally "on the same page" by now.

Another possible approach that has just crossed my mind, for less extreme everyday cases than your 9000 example, would be to leave all duplicates found visible on display (not only the identical ones) and then handle them in a 2-step process:

1. If any one of the (<<9000) duplicates (within both categories "Locally Unique Left/Right") is selected with the mouse/cursor, a small window would immediately show the total number of its identical "twin" duplicates, and all those other "twins" would instantly switch to inverted colors in the list.

2. If there are too many "twins" to handle practically from there, or if they appear scattered too far across multiple pages, then a right-click option ("group-window") on the file currently under the cursor would pop up another, identically structured "Synchronize Dirs" result window, now showing only that narrow group of identical "twin" duplicates, thereby re-uniting all the identical duplicates in a single spot for decision-making.
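
Step 1 boils down to a reverse lookup from a file to its group. A small Python sketch of that lookup, with hypothetical names (`build_twin_index`, `twins_of`) and made-up sample paths, assuming the duplicate groups are already known:

```python
def build_twin_index(groups):
    """Map every (side, path) entry to the number of its duplicate group."""
    return {entry: gid for gid, group in enumerate(groups) for entry in group}


def twins_of(selected, groups, index):
    """Return the other members of the selected entry's group.

    An entry that belongs to no group has no twins, so the result is empty.
    """
    gid = index.get(selected)
    if gid is None:
        return []
    return [entry for entry in groups[gid] if entry != selected]


# Hypothetical result of a duplicate search: two groups.
groups = [
    [("L", r"Docs\report.doc"), ("R", r"Temp\report.doc"), ("R", r"Old\report.doc")],
    [("L", r"Pics\cat.jpg"), ("R", r"Temp\cat.jpg")],
]
index = build_twin_index(groups)

selected = ("R", r"Temp\report.doc")
twins = twins_of(selected, groups, index)
print(len(twins), twins)  # the entries to show in inverted colors
```

The list returned by `twins_of` is also exactly what the "group-window" of step 2 would display, together with the selected file itself.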
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
Wouldn't it be easier like this?

-------------------------------------
 Dir05\Dup1.txt   |
 Dir16\Dup2.txt   |
 Dir20\Dup3.txt   |
 Dir47\Dup4.txt   |
                  |   Dir54\Dup5.txt
                  |   Dir68\Dup6.txt
                  |   Dir71\Dup7.txt
                  |   Dir82\Dup8.txt
                  |   Dir88\Dup9.txt
-------------------------------------
 [ next       ]   |
 [ group      ]   |
                  |   [ of         ]
                  |   [ duplicates ]
-------------------------------------
                  |   [ another    ]
                  |   [ group      ]
 [ of         ]   |
 [ duplicates ]   |
-------------------------------------
Roman
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-27, 16:40 UTC georgeb,
Wouldn't it be easier like this?
Not sure if I get this right. What does "Dir05\Dup1.txt" in your example actually mean? Putting all identical groups found into .txt-output files wouldn't help as no further file-handling would be possible from there.

Or do those entries symbolize one group of identical twin-duplicates which just happen to be all text-files? In this case I would have to ask what "Dir88\Dup9.txt" would mean?

It looks a lot as if those listed duplicates would no longer be primarily sorted by their parent-directory. If so that would contradict my intention. Because if a folder was initially created as an auxiliary-folder and some files have (temporarily) been copied or moved in there I would expect that folder to contain multiple different duplicates. With that clue - and maybe some recollection of past temporary operations - this would help me to identify that folder as garbage as opposed to some desired backup-copy.

The strength of the "Synchronize Dirs" process, as opposed to the single-duplicate-files-oriented list of "FileFind-Find-Duplicates", is exactly its grouping by folder structure. Only in the "Synchronize Dirs" result window can I not only select single duplicate files but also, once a folder has been identified as garbage because it contains multiple duplicates, switch back to full folder display (Unique Left/Right + different + identical) and then right-click the folder header to select the entire folder, be it for deletion or for moving it somewhere else as a whole (that is, not only the duplicates).

So while sorting by duplicates may have its merits, the SyncDir method of sorting by folders first, together with those flexible categories, is what makes it a more complex and thereby more powerful and more versatile tool in the end.
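
The "garbage folder" clue can be expressed as a simple count of duplicates per folder. A minimal Python sketch, where the function name `duplicates_per_folder` and all paths are hypothetical and the duplicate groups are assumed to be known already:

```python
from collections import Counter
from pathlib import PureWindowsPath


def duplicates_per_folder(groups):
    """Count how many duplicate files each (side, folder) contains.

    Returns (key, count) pairs, folders with the most duplicates first.
    """
    counts = Counter()
    for group in groups:
        for side, path in group:
            counts[(side, str(PureWindowsPath(path).parent))] += 1
    return counts.most_common()


# Hypothetical search result: three different files, each with one twin
# sitting in the same temporary folder on the right side.
groups = [
    [("L", r"Docs\a.txt"), ("R", r"Temp\old\a.txt")],
    [("L", r"Docs\b.txt"), ("R", r"Temp\old\b.txt")],
    [("L", r"Pics\c.jpg"), ("R", r"Temp\old\c.jpg")],
]
ranking = duplicates_per_folder(groups)
for (side, folder), count in ranking:
    print(side, folder, count)
```

A folder that ends up at the top of such a ranking is only a candidate for cleanup; whether it is garbage or a wanted backup copy remains the user's decision.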
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
What does "Dir05\Dup1.txt" in your example actually mean? Putting all identical groups found into .txt-output files wouldn't help as no further file-handling would be possible from there.
It's a mockup of the Synchronize dirs window showing groups of duplicate files. DupXX.txt are duplicate files.
do those entries symbolize one group of identical twin-duplicates which just happen to be all text-files?
Exactly.
what "Dir88\Dup9.txt" would mean?
It's the 9th duplicate file in a group of duplicates. It is located in the folder called "Dir88".
It looks a lot as if those listed duplicates would no longer be primarily sorted by their parent-directory.
What would that look like, compared to my mockup? Can you create a mockup of your idea?

It seems what you are talking about would only really be usable in a scenario where most duplicates are located inside one folder (not spread across many folders).

Roman
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-27, 18:53 UTC It seems what you are talking about would only really be usable in a scenario where most duplicates are located inside one folder (not spread across many folders).
Well, I guess it's hard to foretell how duplicates might be distributed before you actually see it. But yes, I often stumble upon duplicates where many, or at least quite a few, are located in the same (garbage?) folder. And I definitely wouldn't want to miss the opportunity of selecting this whole folder at once and proceeding with it as a unit, instead of only being able to make decisions for single files, be they duplicates or not.

A realistic mockup is hard for me to produce, as the desired features are not yet implemented, so I cannot e.g. take a screenshot of a real situation that would illustrate my concept. Having said that, such a mockup would look exactly like a present "Synchronize Dirs" result window, where one, some or many of the duplicates found are shown in a group below (and sorted by) their parent folder. Depending on the situation they could then be treated either one by one BUT ALSO AS A CORRELATED GROUP: right-clicking on the superordinate parent-folder header selects the entire folder, including other files in there that don't necessarily have duplicates elsewhere, and the entire folder can then be treated as a unit, perhaps deleting 20 different duplicates at once by deleting the folder. This way files CAN be processed individually, one by one, BUT COULD ALSO BE PROCESSED AS ONE COHERENT BLOCK, perhaps deleting/moving hundreds of them in a single step by processing their entire parent folder at once.
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-27, 18:53 UTC What would that look like, compared to my mockup? Can you create a mockup of your idea?
Well, I've tried to illustrate my idea. So here comes your mockup, sort of.
Image: https://www.dropbox.com/s/bhm1e9echzwdxub/mockup-1.jpg?dl=0
Of course with the new enhancement those 4 duplicates would now be recognized as duplicates as opposed to the present situation where they are only detected as "Unique Left" and "Unique Right".

The upper/right folder would now be identified as containing 4 binary duplicates and thereby, if they are there for no good reason, also as a candidate for a garbage folder. We can then switch back to full folder display (including Unique Left/Right + Identical + Different) as shown in image mockup-2 and, by right-clicking on the parent-folder header, select the entire folder, e.g. for possible deletion.
Image: https://www.dropbox.com/s/o82033mmcpdnws6/mockup-2.jpg?dl=0
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
OK, so, basically the same, but the primary sort would be by duplicate group and secondary by folder. Now only to decide how to separate each duplicate group.

Roman