Synchronize Dirs: Option to ignore extension

HAF
Junior Member
Posts: 6
Joined: 2008-09-29, 13:42 UTC

Synchronize Dirs: Option to ignore extension

Post by *HAF »

I often have to compare folders that contain the same filenames but with different extensions, and I have to make sure that every folder contains the same filenames.
Right now I have to manually change the extension (ext2 -> ext1), compare, and then change it back (ext1 -> ext2). It would be nice to have a new option to ignore the file extension.
Since this option can be problematic with file operations (move, copy), the Synchronize command could be deactivated, so the comparison would be read-only.

If one folder contains more than one file with the same name but different extensions, a new Count column should show how many there are.


Example

Folder 1
ABC.ext1
ABC.extn
XYZ.ext1


Folder 2
ABC.ext2
XYZ.ext2

Filename (left)   Count (left)   Filename (right)   Count (right)
ABC               2              ABC                1
XYZ               1              XYZ                1
Regards

HAF
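
The requested read-only comparison can be approximated outside TC. Below is a minimal Python sketch, not anything TC provides: the function names `stem_counts` and `compare_ignoring_extension` are made up for illustration, and "extension" is assumed to mean everything from the last dot onward.

```python
from collections import Counter
from pathlib import Path


def stem_counts(folder):
    """Count how many files in `folder` share each name once the extension is dropped."""
    return Counter(p.stem for p in Path(folder).iterdir() if p.is_file())


def compare_ignoring_extension(left, right):
    """Return rows of (name, count on the left, count on the right), sorted by name."""
    left_counts, right_counts = stem_counts(left), stem_counts(right)
    names = sorted(set(left_counts) | set(right_counts))
    # A Counter returns 0 for names that are missing on one side.
    return [(name, left_counts[name], right_counts[name]) for name in names]
```

For the two example folders above this would give the rows ("ABC", 2, 1) and ("XYZ", 1, 1). Nothing is copied, moved or renamed.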
arko
Junior Member
Posts: 85
Joined: 2020-04-05, 06:41 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *arko »

bump, I have precisely the same issue to solve.
JardaSX
Junior Member
Posts: 27
Joined: 2020-04-04, 23:27 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *JardaSX »

I don't know if you really understand this, but the "extension" of a file is part of the name; the whole thing is a single field. Windows merely hides it from users so that they don't rename .exe to .jpg or something.
Implementing that feature would be the same as "ignore all characters after the last occurrence of a certain character, for the purposes of comparison".
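
As a rough illustration of that rule, here is a small Python sketch that relies on the standard library's `os.path.splitext`. The helper name `name_without_extension` is hypothetical, and how TC would define "extension" is not known here.

```python
import os.path


def name_without_extension(filename):
    """Drop everything from the last dot onward, as os.path.splitext defines it."""
    root, _ext = os.path.splitext(filename)
    return root
```

Note the edge cases of this particular definition: only the last suffix is removed ("archive.tar.gz" becomes "archive.tar"), and a leading dot does not count as the start of an extension (".bashrc" stays ".bashrc").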
arko
Junior Member
Posts: 85
Joined: 2020-04-05, 06:41 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *arko »

JardaSX wrote: 2020-04-05, 16:15 UTC I don't know if you really understand this, but the "extension" of a file is part of the name, it's a single field the whole thing, what happens is that Windows hides that for users so don't rename .exe to .jpg or something.
Implementing that feature would be the same as "ignore all characters after last certain another, in terms of comparison".
Yeah, and a filename is just an attribute in the MFT... The content of your message is correct; however, a regular user would still essentially think, "Is there a tickbox to exclude the extension from comparison?"
JardaSX
Junior Member
Posts: 27
Joined: 2020-04-04, 23:27 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *JardaSX »

arko wrote: 2020-04-06, 12:52 UTC in essence regular user would still think "Is there a tickbox to exclude an extension from comparison?"
Of course I understand that. But what happens if there are two files with the same "name" but different extensions? TC would have to add another routine to detect that and display it somehow, even more so if you are going to sync in write mode.
arko
Junior Member
Posts: 85
Joined: 2020-04-05, 06:41 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *arko »

JardaSX wrote: 2020-04-07, 14:53 UTC
arko wrote: 2020-04-06, 12:52 UTC in essence regular user would still think "Is there a tickbox to exclude an extension from comparison?"
Of course I understand that. But what happens if there are two files with same "name" but different extension? TC would have to add another routine to detect that and display it someway, even more if you are going to sync in write mode.
Hence the "ignore": TC would overwrite them, maybe showing a warning pop-up before acting. For example:

1. User selects "Ignore extension" checkbox
2. TC shows a warning: "Files with the same names will be overwritten. Do you want to proceed?"
3. If yes, proceed.
Hacker
Moderator
Posts: 13061
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Synchronize Dirs: Option to ignore extension

Post by *Hacker »

arko,
Hence the "ignore", TC would overwrite them.
We are still talking about comparison. Should TC compare left panel's ABC.ext1, ABC.ext2, ABC.ext3 or ABC.extn to right panel's ABC.ext2? If ABC.ext1 is newer but ABC.ext3 is older than ABC.ext2, which copy direction should be suggested as default? In case of an asymmetric synchronization, should all ABC.ext1 up to .extn be deleted, with the exception of .ext2? Let's start with a good solid concept that solves these basic questions and we can move forward from that.

Roman
Suppose you press Ctrl+F, select the FTP connection (with a saved password), but instead of clicking Connect, you drop dead.
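
One possible starting point for such a concept, sketched in Python purely as an assumption and not as a description of how TC works: pair names only when the match is one-to-one, and flag everything else as ambiguous instead of guessing a copy direction. The function names and the four labels are invented for this sketch.

```python
from collections import defaultdict
from pathlib import Path


def files_by_stem(folder):
    """Map each extension-less name to the sorted list of files carrying it."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            groups[path.stem].append(path.name)
    return groups


def classify(left, right):
    """Label every name as 'left only', 'right only', 'pair' or 'ambiguous'."""
    left_groups, right_groups = files_by_stem(left), files_by_stem(right)
    result = {}
    for stem in sorted(set(left_groups) | set(right_groups)):
        lefts = left_groups.get(stem, [])
        rights = right_groups.get(stem, [])
        if not rights:
            result[stem] = "left only"
        elif not lefts:
            result[stem] = "right only"
        elif len(lefts) == 1 and len(rights) == 1:
            result[stem] = "pair"
        else:
            # Several candidates on one side: no default copy direction is suggested.
            result[stem] = "ambiguous"
    return result
```

With ABC.ext1 and ABC.ext3 on the left and ABC.ext2 on the right, "ABC" would be reported as ambiguous. That leaves the questions about dates and asymmetric deletion open rather than answering them.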
JardaSX
Junior Member
Posts: 27
Joined: 2020-04-04, 23:27 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *JardaSX »

What I suggest is to use hard links, then use Robocopy to mirror one folder to another folder that contains the hard links. A hard link is just another name for the same file data, and Robocopy writes through an existing hard link rather than replacing it (see the run below). Shell shortcuts have the extension .lnk, so use hard links instead, which can have any extension you like, to stand in for the destination file:
  • You have your files which you modify regularly in S:\origin\
  • You copy them to S:\destination\ with the extension changed (or you already have them there)
  • You create the folder S:\intermediate\
  • You make hard links in "S:\intermediate\" that point to the files in "S:\destination\", choosing the extensions to match those in S:\origin\
  • Each time you run Robocopy, it writes through the hard link, so the data ends up in the destination file, under the name and extension that file has in S:\destination\.
Scheme:

[Original folder] FILE.TXT --------> [Intermediate folder] HARDLINK NAMED FILE.TXT BUT POINTS TO FILE.PDF ON ANOTHER FOLDER
                                               |------------> [Destination folder] FILE.PDF WITH SAME CONTENT AS FILE.TXT (=CHECKSUM)
Example with one file

Command Prompt (use it to generate all hardlinks one by one):

md S:\intermediate\
C:\>mklink  /h  "S:\intermediate\mynote.txt"  "S:\destination\mynote.jpg"
Hardlink created for S:\intermediate\mynote.txt <<===>> S:\destination\mynote.jpg
You can also use Link Shell Extension or a similar GUI to generate the hard links from Windows Explorer.
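
Creating the links one by one gets tedious for many files. Here is a hedged Python sketch that does the same as `mklink /h` in a loop, using the standard `os.link` call. The function name `create_intermediate_links` and its parameters are hypothetical, and it assumes both folders are on the same volume, since hard links cannot span volumes.

```python
import os
from pathlib import Path


def create_intermediate_links(destination, intermediate, extension):
    """Hard-link every file in `destination` into `intermediate` under a new extension.

    `extension` is the suffix used in the origin folder, e.g. ".txt".
    Returns the list of link paths that were created.
    """
    destination, intermediate = Path(destination), Path(intermediate)
    intermediate.mkdir(parents=True, exist_ok=True)
    created = []
    for target in sorted(destination.iterdir()):
        if not target.is_file():
            continue
        link = intermediate / (target.stem + extension)
        if link.exists():
            continue  # leave existing entries alone
        os.link(target, link)  # same effect as: mklink /h <link> <target>
        created.append(link)
    return created
```

A call such as `create_intermediate_links(r"S:\destination", r"S:\intermediate", ".txt")` would mirror the single-file example above. If two destination files share a name apart from the extension, only the first gets a link.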

Copy example with PowerShell (use *.* instead of mynote.txt when you have hard links for all files):

PS S:\> Robocopy.exe  S:\origin\  S:\intermediate\  mynote.txt

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows
-------------------------------------------------------------------------------

  Started : 
   Source : S:\origin\
     Dest : S:\intermediate\

    Files : mynote.txt

  Options : /DCOPY:DA /COPY:DAT /R:1000000 /W:30

------------------------------------------------------------------------------

                           1    S:\origin\
100%        Newer                     40        mynote.txt

------------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         0         1         0         0         0
   Files :         1         1         0         0         0         0
   Bytes :        40        40         0         0         0         0
   Times :   0:00:00   0:00:00                       0:00:00   0:00:00


   Speed :                 625 Bytes/sec.
   Speed :               0.035 MegaBytes/min.
   Ended : 
   
Then you just use Total Commander with "S:\origin\" in the left panel and "S:\intermediate\" in the right panel; it should work like Robocopy, because a hard link is just another name for the same file.
arko
Junior Member
Posts: 85
Joined: 2020-04-05, 06:41 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *arko »

Hacker wrote: 2020-04-07, 23:31 UTC arko,
Hence the "ignore", TC would overwrite them.
We are still talking about comparison. Should TC compare left panel's ABC.ext1, ABC.ext2, ABC.ext3 or ABC.extn to right panel's ABC.ext2? If ABC.ext1 is newer but ABC.ext3 is older than ABC.ext2, which copy direction should be suggested as default? In case of an asymmetric synchronization, should all ABC.ext1 up to .extn be deleted, with the exception of .ext2? Let's start with a good solid concept that solves these basic questions and we can move forward from that.

Roman
Good questions! I was referring to unidirectional copying only. Let me provide a real-world example:
  1. There is a complex folder structure with disparate image formats
  2. When processing the above, the majority of available image comparison applications either refused to read the files or crashed
  3. Therefore, we convert everything (using ImageMagick, ocioconvert, nconvert...) to sRGB JPEGs preserving folder structure
  4. After the comparison is done on JPEGs generated in step #3, remove duplicates
  5. As the last step, we compare "clean" folder tree to the "dirty" one while ignoring the file extension. Only files present in "clean" are preserved.
Makes sense? :wink:
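
Step 5 above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the names `relative_stems` and `files_to_remove` are invented, "extension" means the last suffix only, and the function merely lists candidates so that nothing is deleted without review.

```python
from pathlib import Path


def relative_stems(root):
    """Collect every file below `root` as a relative path with the extension removed."""
    root = Path(root)
    return {p.relative_to(root).with_suffix("") for p in root.rglob("*") if p.is_file()}


def files_to_remove(clean_root, dirty_root):
    """List files in the dirty tree that have no same-named counterpart in the clean tree."""
    keep = relative_stems(clean_root)
    dirty_root = Path(dirty_root)
    return sorted(
        p for p in dirty_root.rglob("*")
        if p.is_file() and p.relative_to(dirty_root).with_suffix("") not in keep
    )
```

Because the folder structure is preserved during conversion, comparing relative paths without their suffix is enough; file contents are never looked at.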
Hacker
Moderator
Posts: 13061
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Synchronize Dirs: Option to ignore extension

Post by *Hacker »

arko,
Makes sense?
No, not really. This sounds more like a job for a simple command line utility / script than for TC's synchronize dirs.
1. Convert all images to an image data only format, if you want to compare the images only and ignore the metadata.
Alternatively, calculate an image hash using something like ImageMagick's "magick.exe identify -verbose -moments image.jpg" (the "Channel perceptual hash" part).
2. Compare the images / image hashes.
3. Remove the duplicates.

Roman
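
Steps 2 and 3 of such a script might look like the Python sketch below. As a stand-in it hashes the raw file bytes with SHA-256, which only finds byte-identical copies and is not the image-data hash suggested above; the `hash_func` parameter is a hypothetical hook where a hash of the decoded image data could be plugged in instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path):
    """Hash the raw bytes of a file; identifies exact copies only."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def duplicate_groups(paths, hash_func=sha256_of_file):
    """Group paths by hash value and return only groups with more than one member."""
    groups = defaultdict(list)
    for path in paths:
        groups[hash_func(path)].append(Path(path))
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Each returned group holds files considered equal by the chosen hash; which member to keep is left to the caller.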
JardaSX
Junior Member
Posts: 27
Joined: 2020-04-04, 23:27 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *JardaSX »

If you have one picture in PNG and another in WEBP, and you convert both to JPG, what makes you think the results will have the same checksum?
arko
Junior Member
Posts: 85
Joined: 2020-04-05, 06:41 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *arko »

Hacker wrote: 2020-04-10, 12:40 UTC arko,
Makes sense?
No, not really. This sounds more like a job for a simple command line utility / script than for TC's synchronize dirs.
OK, got your point. Python to the rescue.
JardaSX wrote: 2020-04-10, 14:17 UTC If you have one picture in PNG and another in WEBP, and you convert both to JPG, what makes you think the results will have the same checksum?
That is not what I wrote. Please refer to https://www.ghisler.ch/board/viewtopic.php?p=383045#p383045



...Sounds like this topic is exhausted and shall be closed?
Hacker
Moderator
Posts: 13061
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Synchronize Dirs: Option to ignore extension

Post by *Hacker »

arko,
Python to the rescue.
Oh, if you know Python, there are some libraries to easily calculate an image hash - https://github.com/JohannesBuchner/imagehash .
Please refer to viewtopic.php?p=383045#p383045
Well, in the quoted post you mention converting everything to JPEG and comparing those. If you are looking for absolutely exact duplicates, that is probably fine, but for finding similar images a lossy format is perhaps not the ideal comparison source.

Roman
arko
Junior Member
Posts: 85
Joined: 2020-04-05, 06:41 UTC

Re: Synchronize Dirs: Option to ignore extension

Post by *arko »

Hacker wrote: 2020-04-12, 13:21 UTC arko,
Python to the rescue.
Oh, if you know Python, there are some libraries to easily calculate an image hash - https://github.com/JohannesBuchner/imagehash .
Please refer to viewtopic.php?p=383045#p383045
Well, in the quoted post you mention to convert everything to JPEG and compare those. If you are looking for absolutely exact duplicates, that is probably fine, but for finding similar images a lossy format is perhaps not the ideal comparison source.

Roman
I have clearly failed to explain my idea; file hashes are irrelevant here. I shall go and work on my writing skills.
Hacker
Moderator
Posts: 13061
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Re: Synchronize Dirs: Option to ignore extension

Post by *Hacker »

arko,
Hmm, have I misunderstood? Are you not looking for a solution to remove duplicate images regardless of image type (gif / jpg / png / etc.)?

Roman