Compare directories (on diff. drives) by content

georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-28, 19:39 UTC OK, so, basically the same, but the primary sort would be by duplicate group and secondary by folder. Now only to decide how to separate each duplicate group.
So perhaps by that secondary pop-up window (with identical appearance and sorting structure; "Show Only Identical Dupes"), opened by a right mouse-click on any selected file in the primary results window (the file under the cursor), which would then show only the identical "twin" duplicates of that file - as I've already suggested.
Hacker
Moderator
Posts: 13052
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
Well, that's not really a solution for how to adapt the current window without involving any other windows.
Also, the other window, as you say, would be identical, only hiding other duplicate groups. It seems unnecessary to open another window just to hide other duplicate groups. You could easily hide them in the original window; however, I don't see a reason why one would want to do that, assuming each duplicate group is clearly separated from the next.

Roman
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect, you drop dead.
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-28, 22:37 UTC You could easily hide them in the original window, however, I don't see a reason why one would want to do that, assuming each duplicate group is clearly separated from the next.
A more than valid reason would be to give the administrator an instant overview of each group of identical "twin" duplicates whenever they are scattered too far across multiple pages of the superordinate group of all "only Locally Unique" files Left/Right (those with duplicates somewhere else), as you were rightly worried about in your 9k example.

The other approach - displaying only single groups of identical "twin" duplicates from the start and then cycling through them one by one - has clear shortcomings, too. If there really were 1k different groups of mutually identical dupes (as in your 9k example), the user would immediately get lost in the weeds of dozens if not hundreds of groups of identical dupes, which would completely contradict our primary goal of providing a succinct, constraint-exposing overview.
Hacker
Moderator
Posts: 13052
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
valid reason would be providing the administrator with an instant overview over each group of identical "twin"-duplicates should they be found too far scattered throughout multiple pages
But we have already solved this?
Hacker wrote: the primary sort would be by duplicate group and secondary by folder
Is there something which does not work for you with this solution? I even illustrated it above in my previous text mockup.
Do you perhaps expect there to be duplicates from several duplicate groups in one folder?

Roman
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-29, 12:32 UTC But we have already solved this?
I'm not sure about that, or whether we have found the optimal solution yet. Displaying only one group of ident-dupes after the other seems "too deep in the weeds" for the general overview, although it is necessary from time to time to identify those ident-dupes in a very large sample. For me the primary view for dupes should still be the collective display of all files with dupes found, as represented by the (new) categories "Locally Unique Left/Right" (with dupes somewhere else). If a need then arises to isolate single groups of ident-dupes, the best solution for me would still be right-clicking any file of interest and selecting an option "show only ident-dupes". Whether this means popping up a new window or just temporarily hiding all the non-ident-dupes within the same window (as you seemingly prefer) is a secondary technicality to me.
Hacker wrote: the primary sort would be by duplicate group and secondary by folder

Is there something which does not work for you with this solution? I even illustrated it above in my previous text mockup.
Do you perhaps expect there to be duplicates from several duplicate groups in one folder?
I still don't think that in the present "Synchronize Dirs" implementation the primary sort is by duplicates. The primary sort rather seems to be by folders, which make up the header entries above the single-file entries. And as depicted in my sort-of mockup illustration, that should stay exactly as it is. As my mockup also clearly shows, I most certainly do expect several different dupes to show up in one folder. This is in fact quite a common case on my system. And yes, since those files in one (moved?) folder would be duplicates of files somewhere else and not mutually identical, each of them (duplicate1 to duplicate4 in my mockup example) would necessarily belong to a different group of identical duplicates.

Perhaps I haven't emphasized this strongly enough so far, but one of the main goals of my effort is to help identify moved folders, even ones with slight modifications. These currently remain undetected: the present "Synchronize Dirs" misleadingly characterizes their files as "Unique Left" and "Unique Right", while in reality the folders may only have been moved around, with both versions then altered a little and thereby becoming "asynchronous". The true challenge then is to reconcile such rogue (data?) folders and to re-synchronize them not only formally but logically.
Hacker
Moderator
Posts: 13052
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
temporarily hiding all the non-ident-dupes within the same window
There must be some misunderstanding; I never expressed any positive preference for this solution. I don't see any benefits to it.
I still don't think that in the present "Synchronize Dirs-implementation" the primary sort is by duplicates.
No, of course it is not; that's why I created a mockup of how it could be done.
And as my mockup-illustration also clearly shows I most certainly do expect several (if not multiple) different dupes to show up in one folder. This is in fact a quite common case on my system. And yes, since those files in one (moved?) folder would be duplicates from somewhere else and not be mutually identical each of them (duplicate1 to duplicate4 in my mockup-example) would necessarily belong to a different group of identical duplicates.
Hmm, OK, in this case we have two opposing needs - sort by duplicate groups and then by folders, or the other way round - sort by folders and then by duplicate groups. I assumed the first one would be more useful.
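For illustration only: the two orderings amount to two different sort keys. Below is a minimal Python sketch, assuming hypothetical result rows of the form (folder, file name, duplicate-group id). The row layout and the sample data are made up for this example and do not reflect Total Commander's actual code.

```python
from operator import itemgetter

# Hypothetical result rows: (folder, file name, duplicate-group id).
rows = [
    ("DirB", "b.txt", 2),
    ("DirA", "a.txt", 1),
    ("DirB", "a_copy.txt", 1),
    ("DirA", "b_old.txt", 2),
]

# Ordering 1: primary sort by duplicate group, secondary by folder.
by_group_then_folder = sorted(rows, key=itemgetter(2, 0))

# Ordering 2: primary sort by folder, secondary by duplicate group.
by_folder_then_group = sorted(rows, key=itemgetter(0, 2))
```

With the first key, the members of one duplicate group sit next to each other even when they live in different folders; with the second, each folder stays intact and its files may belong to several groups.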
one of the main goals of my effort would be to help identify moved folders, even with slight modifications in them which currently remain undetected as in present "Synchronize Dirs" they are only misleadingly characterized as "Unique Left" and "Unique Right" while in reality they may only have been moved around and both versions then having been altered a little bit thereby becoming "asynchronous". And the true challenge then becomes to reconcile and not only formally but logically re-synchronize especially such rogue-(data?)-folders.
I see. Wouldn't this, instead of a Sync dirs functionality, then better be a job for some kind of "Find duplicate folders" function, which would show folder groups which contain the most duplicate files? Similar to the current implementation of finding duplicate files within "Find Files", but focusing on folders and grouping by folders? So, basically, find duplicate files, and show the groups of duplicate files sorted by "most duplicates within the fewest folders": 20 duplicates within one folder would be at the top of the list, 11 duplicates in one folder and 5 duplicates in another folder would be shown second, then 6 in folder A, 4 in folder B and 2 in folder C would be shown third, and so on, until we'd have a group of one duplicate file per folder, which would be sorted last.
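One possible reading of "most duplicates within the fewest folders" is sketched below in Python: each folder group is reduced to its per-folder duplicate counts, largest first, and the groups are compared element by element. The data layout, the folder names and the function name `rank_key` are assumptions made for this illustration; the numbers are the ones from the example above.

```python
# Hypothetical input: each folder group maps folder name -> number of
# duplicate files found in that folder.
folder_groups = [
    {"A": 6, "B": 4, "C": 2},
    {"X": 1, "Y": 1},
    {"Only": 20},
    {"P": 11, "Q": 5},
]


def rank_key(group):
    # Per-folder counts, largest first; tuples compare element by element.
    return tuple(sorted(group.values(), reverse=True))


# Highest-ranked group first.
ranked = sorted(folder_groups, key=rank_key, reverse=True)
```

Other keys (for example the average number of duplicates per folder) would give the same order for these four groups but could differ on other data, so the choice of key is a design decision rather than something the description fixes.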

Roman
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-29, 14:36 UTC No, of course it is not, that's why I created a mockup of how it could be done.
Sad to say, I never agreed that your mockup would be a good idea.
Hacker wrote: 2022-12-29, 14:36 UTC Hmm, OK, in this case we have two opposing needs - sort by duplicate groups and then by folders, or the other way round - sort by folders and then by duplicate groups. I assumed the first one would be more useful.
Objection, your honor! I see clear advantages for the "other way round"-approach.

Hacker wrote: 2022-12-29, 14:36 UTC I see. Wouldn't this, instead of a Sync dirs functionality, then better be a job for some kind of "Find duplicate folders" function, which would show folder groups which contain the most duplicate files? Similar to the current implementation of finding duplicate files within "Find Files", but focusing on folders and grouping by folders? So, basically, find duplicate files, and show the groups of duplicate files sorted by "most duplicates within the fewest folders": 20 duplicates within one folder would be at the top of the list, 11 duplicates in one folder and 5 duplicates in another folder would be shown second, then 6 in folder A, 4 in folder B and 2 in folder C would be shown third, and so on, until we'd have a group of one duplicate file per folder, which would be sorted last.
I really don't think so. The basic concept of the "Synchronize Dirs" process still seems unsurpassed to me, apart from its present inability to identify moved folders/files within two comparable data structures (which results in exactly those duplicates found at different locations). Identifying such folders is one thing. By the way, ranking such folders by the number of duplicates found seems pretty much irrelevant and wouldn't help to identify them as moved in any particular way (for instance, the folder with the highest number of duplicates may very well turn out to be an intentional backup copy that requires no reconciliation at all). Ultimately they can only be identified as moved through visual inspection by a trained operator, not by a piece of software or any automated process.

But ONCE THEY ARE IDENTIFIED as moved/rogue folders in the other (comparable) data structure under investigation, the process of reconciliation needs to set in immediately. And it will surely need the full versatility and flexibility that the "Synchronize Dirs" process has to offer for such cases. That is exactly why a solution for this whole problem would best be located within an enhanced "Synchronize Dirs" process: all the options needed for the subsequent data reconciliation are already there - if, yes, if only the "Synchronize Dirs" process had an intrinsic capability to detect and identify moved files within the other/parallel data structure as duplicate files now located in different places.
Hacker
Moderator
Posts: 13052
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
The basic concept of the "Synchronize Dirs"-process still seems unsurpassed to me
Let me give you another example:

 DirA\Dup1.txt    |   DirB\Dup1.txt
                  |   DirB\Dup2.txt
                  |   DirB\Dup3.txt
                  |   DirB\Dup4.txt
                  |
                  |   DirC\Dup5.txt
Seeing this, I'd like to delete Dup1 from DirA (easy), and move Dup5 from DirC to DirB (impossible). How would this be done within the functionality offered by Sync dirs? There are only ways to copy between Left and Right, not between dirs on the same side. How could this be solved?

To me it seems the implementation you are suggesting would only be beneficial in a very specific use case - the one you are describing.
A big change, such as the one you are requesting, is usually implemented when there is an obvious benefit to a majority of users.

Roman
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-30, 13:15 UTC Seeing this, I'd like to delete Dup1 from DirA (easy), and move Dup5 from DirC to DirB (impossible). How would this be done within the functionality offered by Sync dirs? There are only ways to copy between Left and Right, not between dirs on the same side. How could this be solved?
Oh, wow, did you perhaps miss something here? Who says this couldn't be done? As a precaution we'd right-click the "DirB" header and remove all selections (if any); then we'd select "Dup5.txt" found in DirC\ for copying right-to-left and click "Synchronize". In the "Synchronize" window that now pops up, a check mark is already set for the category "Right to Left", and the field below shows the corresponding path in the left-side data structure. But who says we have to accept that? We can paste in there any arbitrary path accessible; there's even a button to browse for any desired directory (in our case the path to the right-side DirB, of course) and - voila - we press OK. I'm using that feature all the time, and it is one of the main components that make "Sync Dirs" so flexible.

One current shortcoming, though, is that this copy dialog lacks the checkbox often found in similar dialogs that would offer to move the file(s), i.e. delete the original(s) after copying. So we would now have to select "DirC\Dup5.txt" again and choose "Delete right" from the menu.
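The two-step workaround (copy to a manually chosen target, then delete the original) amounts to a move. As a rough illustration of that sequence outside of Total Commander, here is a minimal Python sketch for plain local files. The function name `move_by_copy` and the verification step are assumptions of this example, not a description of how "Sync Dirs" works internally.

```python
import filecmp
import shutil
from pathlib import Path


def move_by_copy(source: Path, target_dir: Path) -> Path:
    """Copy `source` into `target_dir`, verify the copy, then delete the original."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    if target.exists():
        raise FileExistsError(f"refusing to overwrite {target}")
    shutil.copy2(source, target)  # copies content and timestamps
    # Compare the two files byte by byte before removing the original.
    if not filecmp.cmp(source, target, shallow=False):
        target.unlink()
        raise OSError(f"copy of {source} does not match the original")
    source.unlink()
    return target
```

Deleting only after a successful comparison mirrors the cautious manual order: copy first, check, then delete.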
Hacker wrote: 2022-12-30, 13:15 UTC To me it seems the implementation you are suggesting would only be beneficial in a very specific use case - the one you are describing.
A big change, such as the one you are requesting, is usually implemented when there is an obvious benefit to a majority of users.
In particular, the option to subsequently redirect the "synchronization path" to any arbitrary path of our liking already makes this tool beneficial for a very broad range of applications with quite different degrees of specificity.

But I would agree 100% that any big change for the "Sync Dirs"-subsystem should include a move-option for the "Synchronize"-window to make that process even more versatile and considerably increase its flexibility in order to benefit an even broader community of (power-) users.
Hacker
Moderator
Posts: 13052
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
paste in there any arbitrary path accessible
You are right, it's not impossible. However, is this really a way you would call convenient? Sync dirs lives off of the left - right paradigm. If we want to select the path to target folders manually, we don't really need any left - right convention; a simple duplicate search, as already exists within Find files, is enough.

Roman
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-30, 15:27 UTC However, is this really a way you would call convenient? Sync dirs lives off of the left - right paradigm. If we want to select the path to target folders manually, we don't really need any left - right convention, just a simple duplicate search as already exists within Find files is enough.
No, as it stands now, "convenient" wouldn't be the first association to cross my mind here. And of course I'm open to any further improvements, not limited to the ones I've suggested. In particular, the move option needed in your example would be a first step in that direction and could probably be implemented most easily.

As for the fundamental necessity of the Sync Dirs tool, though, I beg to differ. It seems to have been implemented simply to merge folders left and right so that they become equal on both sides. Now you can say - as you seemingly do - that this is no big deal and the whole tool would be expendable in toto.

Or you can see "synchronization" - as I prefer to do - as a much more complex process of data reconciliation that does NOT necessarily end in two equal folders but rather tries to extract the "best of both worlds", resulting in a (perhaps third) most current, constructively merged version.

And, yes, I agree that this could be done more conveniently with some minor (and perhaps my proposed major) improvement(s). But the left/right structure totally makes sense as a starting point, since any advanced form of data reconciliation needs at least two (possibly even more) sources or strands of data to compare initially.

In the end it all depends on the concept of "synchronization" you want to apply. If you see it as a more primitive form of formal assimilation, the current Sync tool will do; but if you have advanced logical reconciliation in mind, it would need to be expanded and made considerably more versatile.
Hacker
Moderator
Posts: 13052
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Compare directories (on diff. drives) by content

Post by *Hacker »

georgeb,
Well, I agree a general "consolidate files from many folders into one" feature would definitely be interesting. I don't think Sync dirs, being limited to a Left directory structure and a Right directory structure, is the ideal starting point, though. I do not have any better ideas than already expressed, so while I support the idea in general, I don't have any usable concept in my mind. If anyone finds this idea interesting as well, feel free to jump in and perhaps share some input and ideas towards a workable general solution, if so inclined.

Roman
georgeb
Senior Member
Posts: 250
Joined: 2021-04-30, 13:25 UTC

Re: Compare directories (on diff. drives) by content

Post by *georgeb »

Hacker wrote: 2022-12-30, 20:01 UTC Well, I agree a general "consolidate files from many folders into one" feature would definitely be interesting. I don't think Sync dirs, being limited to a Left directory structure and a Right directory structure, is the ideal starting point, though.
So thanks a lot for your commitment in this elaborate debate so far! Since I still see "SyncDirs" (in spite of all its present shortcomings) as a most valuable tool for achieving such extended goals in the near future, with very high potential for dramatically increased versatility, it would now be most welcome for our "grandmaster", @ghisler(Author), to chime in with his view of the situation and maybe even superior concepts for implementing such broadened capabilities.
algol
Senior Member
Posts: 448
Joined: 2007-07-31, 14:45 UTC

Re: Compare directories (on diff. drives) by content

Post by *algol »

Well, I'm not "@ghisler(Author)" and have no final say here. And although I'm late to the party in this discussion I have to say that I find the proposed concept of a more generalized and enhanced version of SyncDirs a most intriguing proposal.

I could use such a versatile feature, with the ability to consolidate data from different folders, twice a week or even more often. In fact I've found it somewhat disturbing for years now that "Unique left" and "Unique right" don't really mean unique at all. Both categories may well contain copied or moved duplicates by the hundreds if the compared data structures happen to be large enough, and SyncDirs unfortunately won't recognize them as such. On the other hand, if FindFile is used for that purpose, I'll find the duplicates listed one by one and won't have the flexibility for further treatment that SyncDirs has to offer.

In particular, the resulting subsets of (not really) "Unique left" and (not really) "Unique right" cannot be searched for duplicates at all; only complete parent dirs can be. And there is no way to feed findings from a separate duplicate search back into the SyncDirs process so that the subset of files with duplicates could be treated differently from the truly unique files (left or right).

Btw. a Happy New Year to you all!
algol
Senior Member
Posts: 448
Joined: 2007-07-31, 14:45 UTC

Re: Compare directories (on diff. drives) by content

Post by *algol »

georgeb wrote: 2023-01-09, 11:25 UTC
mmm wrote: 2023-01-09, 08:24 UTC I requested the same Sync Dirs enhancement here:
viewtopic.php?p=417266&hilit=mmm+ghisler#p417266
Excellent! I wasn't aware of your request so far. So I would also be interested in your opinion on my recent proposal of how to remedy this problem. As you can see following the link above cited by @petermad the core of my proposal would be to perform a duplicate-by-content-search (ignoring filenames) between the current groups "Unique Left" and "Unique Right" - which currently (and wrongfully) do contain moved/renamed duplicates in (perhaps multiple) different locations - all that from within "Sync Dirs" in a second optional run/pass.

After that - and depending on the duplicates found - those (so far pseudo-)unique groups would be further split into "Truly Unique Left/Right" and only "Locally Unique Left/Right" (with duplicates somewhere else). Those newly introduced groups of "Locally Unique Left/Right" would then be represented by a different color and also made separately selectable by split Left/Right-buttons.

Finally if any file in those duplicate-groups would be selected by cursor a new right-mouse-click-option would need to be implemented offering to only show all identical binary duplicates of that file currently selected.

With that concept any perceived necessity of a 1:1 relationship between the duplicates displayed would be rendered obsolete.

So what do you all think of my solution? Feel free to comment on my linked proposal in the TC-English section.
As summarized in the abstract above that would appear to me as the most mature, elaborate and technically feasible/realizable concept that I've seen so far - completely capable of remedying a long-standing issue in TC as discussed in the thread above.
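The second pass summarized in the quoted proposal can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not Total Commander's implementation: SHA-256 equality stands in for a true binary comparison, a file counts as "Locally Unique" only when the same content appears on the other side, and all names (`content_digest`, `second_pass`, `identical_twins`, the result keys) are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_digest(path: Path) -> str:
    """SHA-256 of the file content, read in chunks; the file name is ignored."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def second_pass(unique_left, unique_right):
    """Split both 'unique' lists by whether the content also exists on the other side."""
    by_digest = defaultdict(lambda: {"left": [], "right": []})
    for side, paths in (("left", unique_left), ("right", unique_right)):
        for path in paths:
            by_digest[content_digest(path)][side].append(path)

    result = {"truly_left": [], "truly_right": [], "local_left": [], "local_right": []}
    for sides in by_digest.values():
        has_twin = bool(sides["left"]) and bool(sides["right"])
        prefix = "local" if has_twin else "truly"
        result[f"{prefix}_left"].extend(sides["left"])
        result[f"{prefix}_right"].extend(sides["right"])
    return result, by_digest


def identical_twins(selected, by_digest):
    """All other files with the same content as `selected` ('show only identical duplicates')."""
    sides = by_digest[content_digest(selected)]
    return [p for p in sides["left"] + sides["right"] if p != selected]
```

Because the grouping is keyed on content rather than on name or position, a file may have any number of twins on either side, which is why no 1:1 relationship between the displayed duplicates is needed.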

So, please, perhaps "Mr.ghisler(Author)" might want to chime in here and tell us about his opinion/thoughts/ideas on that matter.