Compare directories (on diff. drives) by content

georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by *georgeb »

Although a somewhat similar problem was discussed here recently, mine is still a bit different. To illustrate the details for a broader audience, I would like to discuss it by means of a music-file example, although the actual problem is not music-related at all.

So let's assume I have downloaded a CD from an online-store in flac-format. Those titles usually begin with numbers, then comes the artist's name and finally the song-title itself.

To fit my naming convention I have removed those leading numbers (with the name usually starting at position [N7-] in the Multi-Rename Tool, MRT) and copied those files to my music-archive on another drive.
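As a rough illustration of that renaming step: assuming the MRT placeholder [N7-] means "the name from character 7 onward" (1-based, extension excluded), the same effect can be sketched in Python. The function name `name_from_position` and the sample file name are made up for this example.

```python
import os


def name_from_position(filename, start):
    """Keep the base name from 1-based position `start` on; keep the extension."""
    base, ext = os.path.splitext(filename)
    return base[start - 1:] + ext


# Hypothetical store naming: three digits, " - ", then artist and title.
print(name_from_position("001 - Artist - Song.flac", 7))  # Artist - Song.flac
```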

After some time I have now re-downloaded the CD from the store because the initial download contained some errors. So I would now like to compare the new download by content with the original files. And let's further assume (for reasons rooted in the actual problem) that I cannot rename the newly downloaded files in the same way as the original ones before the comparison, so the possibly identical files have different names in the two directories under consideration.

The question now is: can I - and if so, how can I - compare those two directories (the new download on drive D: and the original music archive on drive E: ) in TC by content?

My approaches so far have been unsuccessful. First I've looked at "Files-Compare by Content". But this menu-option only seems to work for single files on each side, one by one.

So the more elegant approach would be to apply "Commands-Synchronize Dirs". Yet unfortunately this mechanism seems to mandatorily rely on the filename first and cannot be tuned to disregard the filename and compare by binary content only.

As a result "Commands-Synchronize Dirs" will find the whole number of songs as "Unique Left" and "Unique Right" respectively - even if all the song-files are identical.

Lastly there would be the option of "Find Files-Find Duplicates". This option would seemingly do the trick - except it doesn't appear to support searching and comparing by content only (disregarding the filename) 2 directories on 2 different drives.

Until now I have only found a pretty clumsy workaround: temporarily copy one directory as a renamed sub-directory into the other and then run "Find Files-Find Duplicates" from the parent. If both Dirs are truly identical, I would find as many duplicates as there are files in each Dir.

But I guess there should be a better, less clumsy way to do this, preferably within the "Commands-Synchronize Dirs"-process. But any solutions to this problem would be most welcome.
Dalai
Power Member
Posts: 9383
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai »

georgeb wrote: 2022-12-24, 14:06 UTC Lastly there would be the option of "Find Files-Find Duplicates". This option would seemingly do the trick - except it doesn't appear to support searching and comparing by content only (disregarding the filename) 2 directories on 2 different drives.
Oh, but it can. Add the directories to search to the "Search in" field, separated by semicolons. This is also documented in the TC help (press F1 while the Find files dialog is open). Then switch to the Advanced tab and select [X] Find duplicates and [X] same contents, and deselect [ ] same name.

After the search is finished, you can feed the results to the listbox. After doing so, TC allows selecting duplicates based on certain parameters by pressing the Num+ key (which is usually used to expand a selection).
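For readers who want to script the same kind of check outside TC, here is a minimal Python sketch of a "same contents, ignore name" duplicate search across several directories. It is not how TC implements the search (the thread does not say); it assumes that grouping by size first and then by SHA-256 digest is an acceptable stand-in for a byte-by-byte comparison. The names `file_digest` and `find_duplicates` are invented for this sketch.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Group files under all roots by content, ignoring their names."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_content[(size, file_digest(path))].append(path)

    # Only groups with at least two members are duplicates.
    return [sorted(group) for group in by_content.values() if len(group) > 1]
```

A call such as `find_duplicates([r"D:\download", r"E:\archive"])` (both paths are placeholders) would return one list of paths per group of identical files, regardless of what the files are called.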

Regards
Dalai
#101164 Personal licence
Ryzen 5 2600, 16 GiB RAM, ASUS Prime X370-A, Win7 x64

Plugins: Services2, Startups, CertificateInfo, SignatureInfo, LineBreakInfo - Download-Mirror
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by *georgeb »

Thanks @Dalai for pointing this out. Actually I knew about that possibility and thought I had tried it out. But somehow "Find Duplicates" didn't find any duplicates in that trial run. Well, now it did! No clue what I did wrong during the first attempt. Is it perhaps important that the path on each side closes with a backslash after the Directories' name - or not?

Thanks again for pointing this out.
Dalai
Power Member
Posts: 9383
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai »

georgeb wrote: 2022-12-24, 15:32 UTC Is it perhaps important that the path on each side closes with a backslash after the Directories' name - or not?
If you mean in the "Search in" field: no, that's not necessary.
Thanks again for pointing this out.
You're welcome.

Regards
Dalai
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by *georgeb »

Dalai wrote: 2022-12-24, 16:11 UTC You're welcome.
So that info eases my problem somewhat. But from the resulting file-panel with the binary duplicates found it still can be quite arduous to finally revise the data-structure, in particular when it is quite large with - maybe - thousands of files.

So for ease of subsequent data-administration the "Synchronize Dirs"-dialogue would IMHO be much more powerful and clearly laid out for that purpose.

Wouldn't it be a nice future TC-improvement if that duplicate-files-search could be integrated into the more comprehensive "Synchronize Dirs" process, so that the categories "Unique Left" and "Unique Right" could be rendered more precisely? Because as it stands now "Unique Left" (for example) doesn't mean truly unique at all! It only means that the files listed in this category are locally unique in this place, while exact copies of those very files could exist as pseudo-"Unique Right" in multiple locations of the file structure.

So only a "duplicates-by-content"-search between the "Left"- and "Right"-sections from within the "Synchronize Dirs"-process could (quite easily) determine if the files in these categories are truly unique or if they are only locally unique with duplicates/moved-Dirs somewhere else.

As the current characterization of those files' "unique-ness" can be quite misleading a split of those categories into "truly unique" and only "locally unique" (left or right) with the ability to select and process those new categories separately thereafter would IMHO greatly improve the depth & power and also versatility of the whole "Synchronize Dirs"-process in general.
Dalai
Power Member
Posts: 9383
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai »

georgeb wrote: 2022-12-24, 19:05 UTC But from the resulting file-panel with the binary duplicates found it still can be quite arduous to finally revise the data-structure, in particular when it is quite large with - maybe - thousands of files.
That's why I mentioned the Num+ key. Pressing this key opens a new dialog that allows the selection of duplicates based on various criteria, which makes removing the duplicates much easier.
Because as it stands now "Unique Left" (for example) doesn't mean truly unique at all!
We're talking about synchronizing directories here. The purpose is to synchronize both directory trees, meaning that afterwards both trees contain the same files. "Unique" in this context means "exists (in this directory) only on one side but not on the other side". If you know a better term then go for it. TC's "Synchronize directories" bases its comparison on file name first, and also file size and date if "by content" and "ignore date" are disabled. You can compare by contents here as well, but only if file names match on both sides.
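To make the name-first pairing concrete, here is a small Python sketch of that kind of classification. It only illustrates the principle described above and is not TC's code; the function names are invented, and it matches relative paths case-sensitively, which is a simplification for Windows file systems.

```python
import os


def relative_files(root):
    """Return {relative path: absolute path} for every file under root."""
    found = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            found[os.path.relpath(full, root)] = full
    return found


def classify_by_name(left_root, right_root):
    """Pair files by relative name only; content is not looked at here."""
    left = relative_files(left_root)
    right = relative_files(right_root)
    return {
        "unique_left": sorted(left.keys() - right.keys()),
        "unique_right": sorted(right.keys() - left.keys()),
        "paired": sorted(left.keys() & right.keys()),
    }
```

With a renamed but otherwise identical file on each side, both copies land in "unique_left" and "unique_right" respectively, because no name pair exists that a content comparison could then be applied to.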

Regards
Dalai
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by *georgeb »

Dalai wrote: 2022-12-24, 19:24 UTC You can compare by contents here as well, but only if file names match on both sides.
And exactly that is the problem. In the general find-duplicate-files-dialogue there is, for a very good reason, an option to ignore the filename and only compare (for possible duplicates) by size and content. Without that option (I use it practically all of the time when looking for duplicates) the whole mechanism wouldn't be half as powerful: the characterization of files (regardless of their names) as true binary duplicates would break down immediately, rendering the whole "duplicate"-declaration pretty much worthless.

So if that capability could be integrated into the much more complex "Synchronize Dirs"-process, with its (already implemented) option to separately select and subsequently process whole sub-trees and directories in a different fashion, the "magnitude-of-power" of the whole "Synchronize Dirs"-process could in turn be enormously increased. And the good thing is - all the necessary prerequisites to achieve that goal are already there in TC!
petermad
Power Member
Posts: 14787
Joined: 2003-02-05, 20:24 UTC
Location: Denmark

Post by *petermad »

2georgeb

Did you ever try what Dalai suggested: After searching for duplicates, click "Feed to listbox" and then press the Num+ key (or use the command cm_SpreadSelection in case you don't have a numeric keypad)? A new dialog with several options for refining the search pops up.
License #524 (1994)
Danish Total Commander Translator
TC 11.03 32+64bit on Win XP 32bit & Win 7, 8.1 & 10 (22H2) 64bit, 'Everything' 1.5.0.1371a
TC 3.50 on Android 6 & 13
Try: TC Extended Menus | TC Languagebar | TC Dark Help | PHSM-Calendar
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by *georgeb »

petermad wrote: 2022-12-25, 00:13 UTC 2georgeb

Did you ever try what Dalai suggested: After searching for duplicates, click "Feed to listbox" and then press the Num+ key (or use the command cm_SpreadSelection in case you don't have a numeric keypad)? A new dialog with several options for refining the search pops up.
Of course I've tried that; feeding all search-results to a listbox is what I do after practically every search, even adding more refined secondary searches to that results-panel after the search is complete. Btw, IMHO <Shift>"Feed to listbox" (to a new panel) should be the standard-behavior of that button, without having to press <Shift> together with it. And don't worry, I always have a numeric keypad as I wouldn't buy computers without one.

But the fact remains that the "Synchronize Dirs"-process, in particular its result-panel, is so much more versatile in making interactive choices (selections) of groups of (intentional or redundant) duplicates or even whole sub-trees of possibly moved files and with all the options of what to do next with them - if only true binary duplicates could be detected and selected separately from within there.

Sure, via <Num+> one can select files by common criteria - if there are any - but the forced paired grouping of true duplicates makes it much more arduous to inspect and decide on them one by one. Unlike the "Synchronize Dirs"-result-panel, this type of paired grouping does not and cannot allow one to "recognize" and handle/select multiple files (perhaps hundreds of them), or even whole sub-trees of possibly moved duplicates, in a single step.

Isn't it a pity that the "Synchronize Dirs"-result-panel offers a much more structured view with far more versatile options of what to do next (copying/moving to different locations or group-wise deletion) with individual (large) groups of files - and then all of a sudden falls completely short of its full potential by not being able to differentiate between "truly unique" and merely "locally unique" files, present them in different colors and make them separately selectable?

To avoid further complicating the "Synchronize Dirs"-process for everyone what I have in mind is an optional two-step-process. First step - all remains as it is. Only then in an optional second step the "Unique Left" and "Unique Right" categories would again be searched for true binary duplicates disregarding different file-names and then split those categories once more into "Truly Unique" and only "Locally Unique", possibly colored in light/dark green and light/dark blue and making them separately selectable (only one type or both together) by two more dedicated category-buttons (in total reaching 6 categories instead of only 4 as of now).
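A minimal Python sketch of what such an optional second step could look like, purely as an illustration of the proposal and not of anything TC does today. It assumes the first, name-based pass has already produced two lists of full paths ("unique left" and "unique right"), and it treats a file as "locally unique" when identical content exists on the opposite side; duplicates within the same side are not considered. Comparing SHA-256 digests instead of raw bytes is also an assumption, and all names are invented.

```python
import hashlib


def digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def split_unique(unique_left, unique_right):
    """Second pass: split name-unique files into truly and locally unique."""
    left = {p: digest(p) for p in unique_left}
    right = {p: digest(p) for p in unique_right}
    left_hashes = set(left.values())
    right_hashes = set(right.values())
    return {
        "truly_unique_left": sorted(p for p, d in left.items() if d not in right_hashes),
        "locally_unique_left": sorted(p for p, d in left.items() if d in right_hashes),
        "truly_unique_right": sorted(p for p, d in right.items() if d not in left_hashes),
        "locally_unique_right": sorted(p for p, d in right.items() if d in left_hashes),
    }
```

The four result lists correspond to the four colors suggested above; together with "identical" and "different" they would make up the six categories.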
Hacker
Moderator
Posts: 13061
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker »

georgeb,
First step - all remains as it is. Only then in an optional second step the "Unique Left" and "Unique Right" categories would again be searched for true binary duplicates disregarding different file-names
Sounds intriguing. How would you visually represent such groups of files? Say, four files in four different folders on the left side, and five files in five different folders on the right side? Grouping by folders would not work then, I guess? The files would have to be grouped by duplicates, and that would require a whole different view? What is your idea?

Roman
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect, you drop dead.
Dalai
Power Member
Posts: 9383
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai »

Well, the visual representation is one thing. The more general problem is that the files wouldn't have a 1:1 relation anymore (like it is now) but it could be anything from 1:1 to n:n in the worst case. That makes it more difficult to decide which file(s) to keep, which to copy to the opposite side and which to delete. This is especially true for asymmetric mode where TC automatically makes a decision (customizable by the user).
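The loss of the 1:1 relation can be sketched in a few lines of Python. The sketch assumes that a content digest per file is already known (how it is computed does not matter here) and groups the paths of both sides by that digest; `group_by_content` and the sample paths are hypothetical.

```python
from collections import defaultdict


def group_by_content(left_digests, right_digests):
    """Return {digest: (left paths, right paths)} for content present on both sides.

    Both arguments map a file path to its content digest.
    """
    groups = defaultdict(lambda: ([], []))
    for path, d in left_digests.items():
        groups[d][0].append(path)
    for path, d in right_digests.items():
        groups[d][1].append(path)
    return {
        d: (sorted(left_paths), sorted(right_paths))
        for d, (left_paths, right_paths) in groups.items()
        if left_paths and right_paths
    }


# Four copies on the left and five on the right, each in a different folder.
left = {f"L{i}/song.flac": "abc" for i in range(4)}
right = {f"R{i}/song.flac": "abc" for i in range(5)}
for d, (l_paths, r_paths) in group_by_content(left, right).items():
    print(d, f"{len(l_paths)}:{len(r_paths)}")  # abc 4:5
```

Each group is an m:n relation rather than a pair, so there is no single obvious answer to which file is "the" counterpart of another.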

Regards
Dalai
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by *georgeb »

Hacker wrote: 2022-12-25, 14:46 UTC Sounds intriguing. How would you visually represent such groups of files? Say, four files in four different folders on the left side, and five files in five different folders on the right side? Grouping by folders would not work then, I guess? The files would have to be grouped by duplicates, and that would require a whole different view? What is your idea?
To begin with you will have to start admitting that for (in your example) 9 binary duplicates in different folder locations, 4 on the left and 5 on the right, possibly some of them (partly) renamed, it is impossible for a "dumb" software-dialogue - like <Num+> (as @Dalai and @petermad have suggested) - to determine which ones are either the original, a desirable copy under a different naming-convention, a backup to keep or simply some erroneously generated garbage from the past like temporarily moved and then forgotten "auxiliary" folder-copies. Only the owner and/or an intelligent administrator can decide in the end which ones to keep and which ones to copy/move to the desired location or to finally get rid of by deletion.

With <Num+> I can select all (e.g.) .jpg-files or all files older than 3 years, but I cannot instruct this dialogue to select all undesirably redundant duplicates for deletion. Only the user can decide that, one by one, in an interactive fashion.

And that is exactly where the "Synchronize Dirs"-result-window would come into play. After the optional second run all "truly unique"-files left/right would be colored in - say - dark-green and dark-blue, whereas all merely "locally unique" files with binary duplicates somewhere else (like our exemplary group of nine) would be colored in lighter green (4 on the left) and lighter blue (5 on the right) respectively. Now the newly split category-buttons come into play. Besides de-selecting all identical and different files (the latter would have to be taken care of separately) I would now additionally de-select all the (dark-green/blue) "Truly Unique" files, thereby reducing the cluttered display considerably and leaving only the binary duplicates to be displayed. If we're lucky and there aren't too many other duplicates found - all 9 of our exemplary duplicates should now be visible altogether in the same single screen (or 2 consecutive screens if we're not THAT lucky), each of them together with and under its parent-folder. With that grouped structure of display (unlike the present 1:1 display after FileFind) it should now become a breeze to individually and interactively select (check the box of) which of the duplicates to keep, copy/move to the desired location or finally delete as redundant garbage.

But the real bargain of this display-structure comes into play when there aren't only 9 individual file-duplicates found but entire moved/renamed duplicate-folders, as they could then be de-/selected folder-wise by a single check-box, unlike the present situation with the 1:1 single-file display after FileFind.

@Dalai If you know of a proper method to select all the duplicates found within one particular folder by the present FileFind-process with <Num+> at once I would be eager to know.
Dalai
Power Member
Posts: 9383
Joined: 2005-01-28, 22:17 UTC
Location: Meiningen (Südthüringen)

Post by *Dalai »

2georgeb
What you describe doesn't fit the Synchronize directories dialog (IMO). You want to identify and delete duplicates, not synchronize directories, i.e. copy files to the opposite side.
georgeb wrote: 2022-12-25, 20:05 UTC @Dalai If you know of a proper method to select all the duplicates found within one particular folder by the present FileFind-process with <Num+> at once I would be eager to know.
Looks like you missed the second tab "Select by folder" in the Num+ (Select duplicate files) dialog.
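What a folder-based selection over duplicate groups might look like can be sketched in Python. This illustrates the idea georgeb asked about, not the actual behavior of TC's "Select by folder" tab, which is not described in this thread. It assumes the duplicate groups are given as lists of paths, and it adds one safety rule of its own: if every member of a group lies in the chosen folder, the first member stays unselected so that one copy always survives.

```python
import os


def in_folder(path, folder):
    """True if path lies inside folder (or a subfolder of it)."""
    path = os.path.normcase(os.path.abspath(path))
    folder = os.path.normcase(os.path.abspath(folder))
    try:
        return os.path.commonpath([path, folder]) == folder
    except ValueError:  # e.g. paths on different drives on Windows
        return False


def select_in_folder(groups, folder):
    """Pick, from each duplicate group, the members located in folder."""
    selected = []
    for group in groups:
        hits = [p for p in group if in_folder(p, folder)]
        if len(hits) == len(group):
            hits = hits[1:]  # keep one copy unselected
        selected.extend(hits)
    return selected
```

The returned paths could then be reviewed and deleted, moved or copied in one go instead of being ticked off pair by pair.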

Regards
Dalai
Hacker
Moderator
Posts: 13061
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker »

georgeb,
If we're lucky and there aren't too many other duplicates found - all 9 of our exemplary duplicates should now be visible altogether in the same single screen (or 2 consecutive screens if we're not THAT lucky)
But, how? I am unable to picture / imagine how those duplicates would be displayed. Say, there are 4000 files left and 5000 files right, where each 4 files left are duplicates of 5 files right, so we have 1000 groups of duplicates, every file in a different folder. How would this be presented to the user?

Roman
georgeb
Senior Member
Posts: 253
Joined: 2021-04-30, 13:25 UTC

Post by *georgeb »

Dalai wrote: 2022-12-25, 20:24 UTC What you describe doesn't fit the Synchronize directories dialog (IMO). You want to identify and delete duplicates, not synchronize directories, i.e. copy files to the opposite side.
Not true. I MAY want to delete some but might want to copy/move others to another location as well. I simply want to be able to individually and interactively decide for and de-/select each and every duplicate pending arbitrary further operation. So at least partly - true synchronization is to follow next. And there the possibilities for further file-handling like copying/moving - and yes, deleting - in groups are much more versatile in the "Synchronize Dirs"-result-panel as compared to an endless list of 1:1-pairs of duplicates as is now the case with FileFind.
Dalai wrote: 2022-12-25, 20:24 UTC Looks like you missed the second tab "Select by folder" in the Num+ (Select duplicate files) dialog.
Well, not really but thanks again for pointing this feature out. I sure will have a deeper look into that once again and I think it might at least help to ease my current problem somewhat by the means already currently available.
Last edited by georgeb on 2022-12-26, 01:08 UTC, edited 1 time in total.