finding exact image duplicates
Hi folks.
Does anyone know a way of finding "100% exact image duplicates" (preferably within TC, e.g. by using an addon; otherwise an external program)?
"Exact image duplicates" means that each pixel in image A has the same colour as the corresponding pixel in image B.
Note that images can be 100% identical but have different file sizes and hash values (e.g. think of .jpeg images with comments or added/deleted EXIF data).
I have tested several "image duplicate finder" programs, and while they work extremely well for finding similar images, none of them offers the possibility to check if images are exactly identical. Some of them even display a "similarity percentage value", but - alas - 100% does not really mean 100% identical, but rather "a very close match" (which sometimes turns out to be 100% identical, but sometimes not). Bad terminology.
I also have command line programs which can compare two images and tell me whether or not they are 100% identical. But they are not capable of comparing multiple images (one file against a file list, or all files within a file list, or one file list against another one).
So how can I find 100% exact image duplicates ?
I could find only a single related thread in the forums here, but none of the three programs mentioned there
* Image Dupeless
* Image Dupe
* Dupe Detector
is capable of correctly identifying 100% exact duplicates.
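A minimal Python sketch of the definition above, and of the "all files within a file list" case. It assumes some image library has already decoded each file into its width, height, channel mode and raw pixel bytes (Pillow's `Image.size`, `Image.mode` and `Image.tobytes()` would be one way to get these). The function names are made up for illustration.

```python
import hashlib
from collections import defaultdict


def pixel_key(width, height, mode, raw_pixels):
    """Digest of decoded image content: geometry, channel layout, pixel bytes."""
    digest = hashlib.sha256()
    # Hash size and mode too, so a 2x1 and a 1x2 image with equal bytes differ.
    digest.update(f"{mode}:{width}x{height}:".encode("ascii"))
    digest.update(raw_pixels)
    return digest.hexdigest()


def group_exact_duplicates(images):
    """Map {name: (width, height, mode, raw_pixels)} to lists of identical images."""
    groups = defaultdict(list)
    for name, decoded in images.items():
        groups[pixel_key(*decoded)].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Because the key is computed from decoded pixels only, two files with different sizes or file hashes (comments, EXIF changes) still land in the same group as long as they decode to the same pixels.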
- Balderstrom
- Power Member
- Posts: 2148
- Joined: 2005-10-11, 10:10 UTC
I use prismatic's DupDetector and VisiPics. I find both of their interfaces somewhat lacking, so I created an AHK script to make DupDetector easier to use. VisiPics' interface is really clunky/awkward to me, but sometimes it's useful.
There are a few ideas I've had to improve its usage, but it works well enough for me atm.
Usage of the script below would require 2 of my default library files: LB.ahk and AHK_Extra.ahk.
It's also configured for a multi-button mouse that has shift on a Thumb button, and an XButton4 XButton5 (Logitech's MX518).
I can put those required Library files up for download if anyone wants to play with the script below. It would likely require some tweaking as far as hotkeys go for one's own usage.
Primary usage:
1) When 2 matches are displayed press Shift+RButton, which launches 2 instances of IrfanView. When done comparing with IrfanView, press Shift+RButton again. They will close and DupDetector is reactivated.
2) Pressing MButton will ask if you want to delete File1.
--- If it's the second file you want to delete, Scroll Down once first.
3A) XButton2: Presents an option to COPY File1 to the other path.
3B) Shift+XButton2: Presents an option to MOVE File1 to the other.
---> ScrollUp/Down moves the cursor in DupDetector's fileList. <---
File1 is the first file (of the two), unless the cursor is moved down to the first file --- Then they are effectively swapped. File1 is the second file.
Note: If the Shift key is on the keyboard, the "~RButton::" below would likely need to be "~*RButton::" and "XButton2::" would need to be "*XButton2::".
The script is a little bit RAW, but it takes two to three times as much time and work to make a script foolproof. I just deal with the quirks, as there's always an InputBox/Question before it will MOVE/COPY or Delete a file.
Code:
;;
;; Balderstrom, Feb.01 2011
;; -- DupDet.ahk
;;
;; Automation Script for Prismatic's DupeDetector.
;;
DD_GetText(byRef winID, byRef imgNam1, byRef imgNum1, byRef imgNam2="", byRef imgNum2="")
{
    checkMatchNo:=winID
    LB_ControlGetID( cID, "ListBox1", winID:=WinExist("Dup Detector ahk_class #32770"))
    cTxt:=LB_QueryText( cID, LB_QueryCursor(cID))
    if( SubStr(cTxt, 1, 12) == "; Match no: " )
    {
        if( checkMatchNo <> -1 )
        {
            MsgBox,,,% "Match#: " SubStr(cTxt, 13), 1
            return
        }
        ControlSend, ListBox1, {Down}, ahk_id %winID%
        cTxt:=LB_QueryText( cID, LB_QueryCursor(cID))
    }
    RegExMatch(cTxt, "^Image (\d) path: (.*)$", dTmp)
    imgNum1:=dTmp1, imgNam1:=dTmp2
    ;MsgBox, imgNum1: %imgNum1%
    ControlFocus, ListBox1, ahk_id %winID%
    Send, % (imgNum1 == 2) ? "{Up}" : "{Down}"
    cTxt:=LB_QueryText(cID, LB_QueryCursor(cID))
    RegExMatch(cTxt, "^Image (\d) path: (.*)$", dTmp)
    imgNum2:=dTmp1, imgNam2:=dTmp2
    ; MsgBox, % "(" imgNum1 ") :: " imgNam1 "`n(" imgNum2 ") :: " imgNam2
    return (imgNum1)
}
#ifWinActive, Dup Detector ahk_class #32770
MButton::
{
    if( imgNum:=DD_GetText(winID, imgNam1, imgNum1, imgNam2, imgNum2))
    {
        MsgBox, 0x4,, % "Delete (" imgNum ")`n " (imgNum==1 ? "Left" : "Right") " Image?`n" ,4
        ifMsgBox, NO
            return
        ControlFocus, % "Button" (3 + imgNum), ahk_id %winID%
        Send, {Enter}
    }
    return
}
#l::
{
    if( imgNum:=DD_GetText(winID, imgNam1, imgNum1, imgNam2, imgNum2))
    {
        ControlFocus, % "Button" (3 + imgNum), ahk_id %winID%
        ;; Button4 == Delete Image 1
        ;; Button5 == Delete Image 2
    }
    return
}
$LButton::
{
    if( MouseInWindow( mControl, aWin:=WinActive("Dup Detector ahk_class #32770")))
        ControlFocus, Static12, ahk_id %aWin%
    Send, % GetKeyState("LButton", "P") ? "{LButton Down}" : ""
}
$LButton UP::
{
    if(GetKeystate("LButton", "P"))
        return
    if( MouseInWindow( mControl, aWin:=WinActive("Dup Detector ahk_class #32770")))
    {
        ControlGetFocus, aFocus, A
        MouseGetPos,,,,mCtrl, 2
        ControlGet( aCtrl:="HWND#", aFocus, "A" )
        ControlGetText( cTxt, aCtrl )
        Send, {LButton Up}
        if( mCtrl <> aCtrl )
            return
        if( cTxt == "Back" || cTxt == "Next" )
            ControlSend, ListBox1, {Down}, ahk_id %aWin%
        return
    }
    Send, {LButton Up}
    return
}
WheelDown::
WheelUp::
{
    if(GetKeyState("Shift"))
        ControlSend, ListBox1, % (A_ThisHotkey == "WheelUp" ? "{Up}" : "{Down}"), A
    else
        Send, {%A_ThisHotkey%}
    return
}
XButton2::
{
    if( imgNum:=DD_GetText(winID:=-1, imgNam1, imgNum1, imgNam2, imgNum2))
    {
        SplitPath(imgNam1, file1, path1)
        SplitPath(imgNam2, file2, path2)
        if( GetKeyState("Shift") && copyFile:=1 )
            MsgBox, 0x4,, % "COPY IMG (" imgNum ") TO (" (mod(imgNum, 2) + 1) ") ??`n " (mod(imgNum,2) ? "--->>" : "<<---")
        else
            MsgBox, 0x4,, % "Move IMG (" imgNum ") TO (" (mod(imgNum, 2) + 1) ") ??`n " (mod(imgNum,2) ? "--->>" : "<<---")
        ifMsgBox, NO
            return
        if( copyFile && !(copyFile:=0))
        {
            FileCopy, %imgNam1%, %path2%
            ControlFocus, % "Button" (3 + imgNum), ahk_id %winID%
        }
        else
        {
            FileMove, %imgNam1%, %imgNam2%, 1
            ControlFocus, Button7, ahk_id %winID%
        }
        Send, {Enter}
        Sleep, 250
        ControlSend, ListBox1, {Down}, ahk_id %aWin%
    }
    return
}
~RButton::
{
    if(!GetKeyState("Shift"))
        return
    if( closeIV && !(closeIV:=0))
    {
        WinClose, ahk_pid %ivPID1%
        WinClose, ahk_pid %ivPID2%
        return
    }
    if(!imgNum:=DD_GetText(winID:=-1, imgNam1, imgNum1, imgNam2, imgNum2))
        return
    EnvGet, gImageView, gImageView
    Run, %gImageView%\IrfanView\i_view32.exe %imgNam1%,,,ivPID1
    Run, %gImageView%\IrfanView\i_view32.exe %imgNam2%,,,ivPID2
    closeIV:=1
}
return
#ifWinActive ahk_class IrfanView
{
    ~RButton::
    if(!GetKeyState("Shift"))
        return
    if( closeIV && !(closeIV:=0))
    {
        WinClose, ahk_pid %ivPID1%
        WinClose, ahk_pid %ivPID2%
        ifWinActive, IrfanView ahk_class #32770
            Send, {Tab}{Tab}{Enter}
    }
    WinActivate, Dup Detector ahk_class #32770
    MouseMove, 229, 465
    KeyWait, RButton
    ifWinExist, ahk_class #32768
        WinClose, ahk_class #32768
    return
    return
}
Re: finding exact image duplicates
chrizoo wrote: Does anyone know a way of finding "100% exact image duplicates" (preferably within TC, e.g. by using an addon; otherwise an external program) ?
Not possible with computers - computers don't have eyes - if you don't get it done with the programs you have listed, you have to do it yourself...
@Balderstrom
wow, I fear this is beyond my intellectual horizon. Can you tell me - in a nutshell - what all of this is in fact doing?
Sir_SiLvA wrote: Not possible with computers - computers don't have eyes - if you don't get it done with the programs you have listed, you have to do it yourself...
Silva, if you are not familiar with a specific IT branch (which is no shame) you shouldn't post guesswork ("it is not possible") as though it were facts.
Computers don't have ears either, so why is it that they can find similar and identical audio files? See.
As a matter of fact, I even explained that
chrizoo wrote: I also have command line programs which can compare two images and tell me whether or not they are 100% identical.
Plus: see Balderstrom's previous posting. So why are you telling users here that it is impossible?
- Balderstrom
Pressing RButton (mouse) launches IrfanView so I can see the images at the same time. Once I have made a decision one way or the other, I press RButton again, which closes both IrfanView windows and reactivates DupDetector.
If I wanted to Copy the first file to the second's path: I would press XButton2 (mouse).
If I wanted to MOVE the first file to the second (overwrite): I would press Shift+XButton2.
If I wanted to Delete the first file, I would press MButton.
In all cases if I wanted the Source file (file to keep) to be the second file, I would scroll down once first.
Like most of my longer scripts it uses Library files, as that enables me to re-use functions and not have multiple versions of those functions scattered across numerous scripts --- which would make maintaining any of these scripts a nightmare...
As well, outside of all of this, my main AHK system script has an XButton2 definition for IrfanView (and IV is set to display the full path in its title bar) --- which opens Total Commander's right panel with that file selected.
Thanks.
OK, the way I understand it is that you have to check manually (i.e. visually, in IrfanView), image by image in your result list.
That's not what I seek to do.
I want a list of 100% identical images.
Not a list of possibly 100% identical images, which I have to verify one-by-one (not to mention that a visual check is quite impossible for high resolution images).
Is this what you suggested, or did I misread your explanation?
- Balderstrom
Correct, I do matches for ~95-98% similarity, as the images may be resized, flipped, or color adjusted.
Sir_SiLvA wrote: Not possible with computers - computers don't have eyes - if you don't get it done with the programs you have listed, you have to do it yourself...
Not correct, Sir_SiLvA...
For 100% matches you could just do a CRC or MD5 match without utilizing either of those.
E.g. SubDir branch view, select files, and use TC to create a single MD5 file.
Then run something like:
Code:
@ECHO OFF
CLS
SETLOCAL
::
:: xmd5 v1.32
::
::SET MOVE=ECHO MOVE::
SET MOVE=MOVE
SET MD5="%~1"
mkdir __DUPE__ 1>NUL 2>&1
FOR /F "usebackq tokens=1* delims=*" %%A IN (%MD5%) DO (
    IF NOT EXIST %%A (
        %MOVE% "%%B" %%A
    )
)
FOR /F "usebackq tokens=1* delims=*" %%A IN (%MD5%) DO (
    IF EXIST "%%B" (
        %MOVE% "%%B" __DUPE__
    ) ELSE (
        %MOVE% %%A "%%B"
    )
)
DIR __DUPE__\*.jp* 1>NUL 2>&1
IF %ERRORLEVEL% == 1 ECHO RMDIR __DUPE__
Usage: xmd5 <Input Md5 fileName>
The script renames files to their MD5 string. Sort the files by date (or whatever you prefer) before creating the MD5 file, since the first file listed for each MD5 is the one that is kept.
It moves/renames a file to its MD5 string if a file with that name doesn't exist yet.
Then it moves all remaining files to the folder: __DUPE__
Then it renames all MD5 strings back to their original filenames.
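The same keep-first, move-the-rest logic can be sketched in Python without touching any files. This assumes the checksum file uses lines of the form `<hash> *<filename>`, which is what the `delims=*` parsing in the batch file implies. `split_duplicates` is a hypothetical name, not part of xmd5.

```python
def split_duplicates(md5_lines):
    """Return (keep, dupes) from lines of the form '<hash> *<filename>'.

    The first file seen for each hash is kept; later files with the
    same hash are reported as duplicates, mirroring the xmd5 steps.
    """
    seen = set()
    keep, dupes = [], []
    for line in md5_lines:
        line = line.rstrip("\r\n")
        if "*" not in line:
            continue  # skip blank or malformed lines
        digest, name = line.split("*", 1)
        digest = digest.strip().lower()
        if digest in seen:
            dupes.append(name)
        else:
            seen.add(digest)
            keep.append(name)
    return keep, dupes
```

Like the batch file, this only detects byte-identical files: two JPEGs that differ in a comment or EXIF field get different MD5 values and are not reported.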
2chrizoo
Try this:
http://ghisler.ch/board/viewtopic.php?t=25768
It's not really advanced but it's well integrated into sync dirs.
- Balderstrom
Cool stuff, though the problem with SyncDirs is it requires exact filenames.
So no way to match:
IMG001.JPG to:
Copy of IMG001.JPG, or IMG001(1).JPG, etc.
As well, sync dirs requires the same directory structure. No way to compare exactly named files that exist in a subfolder:
H:\FOO\IMG001.JPG vs H:\BAR\IMG001.JPG
--- without manually changing paths for each panel for each possible comparison. Not really feasible for me.
Although both of those issues are longstanding wishes of mine for further development of the SyncTool: compare by MD5, and the ability to ignore paths when matching filenames. E.g. the ability to flatfile two directories/panels (subDirBranch view) and compare the panels by filename or MD5 only. Similar to the CompareDirs command, except that it has no options for which set of files you want to select (left/right) and what criteria you want to select by.
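As a rough illustration of that wish (this is not a Total Commander feature), a Python sketch can index two directory trees by content hash and report matches regardless of filename or subfolder. The function names are hypothetical, and SHA-256 over the whole file is an assumption; MD5 would work the same way.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to keep memory use small."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_tree(root):
    """Map content digest -> list of paths, walking all subfolders of root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            index.setdefault(file_digest(full_path), []).append(full_path)
    return index


def matches_between(left_root, right_root):
    """Return {digest: (left_paths, right_paths)} for content present on both sides."""
    left, right = index_tree(left_root), index_tree(right_root)
    return {d: (sorted(left[d]), sorted(right[d])) for d in left.keys() & right.keys()}
```

With this, H:\FOO\IMG001.JPG would match H:\BAR\sub\Copy of IMG001.JPG as long as the two files are byte-identical, since neither names nor paths take part in the comparison.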
- Balderstrom
chrizoo wrote: Thanks. OK, the way I understand it is that you have to check manually (i.e. visually, in IrfanView), image by image in your result list. That's not what I seek to do. I want a list of 100% identical images. Not a list of possibly 100% identical images, which I have to verify one-by-one (not to mention that a visual check is quite impossible for high resolution images). Is this what you suggested, or did I misread your explanation?
If you set DupDetector to 100% matches, you will get a ListBox "List" of image-exact matches. And if you configure its settings to automatic, it can auto-delete as well.
- Balderstrom
Lefteous wrote: The problem for my plugin is how to create such a hash for an identified image? This is not so easy.
I can think of a few different ways that don't use actual image heuristics. But they aren't necessarily date-preserving and are possibly process intensive.
E.g. an MD5 on the data of an image only, skipping EXIF and/or IPTC. Not sure how one would go about that without purging that info from the files (keeping a backup) and restoring it afterwards.
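One way to hash "the data of an image only" without modifying the files is to walk the JPEG marker segments in memory and leave the metadata segments out of the digest. The Python sketch below is a simplified illustration, not how any plugin mentioned here works. It assumes a well-formed JPEG in which every segment before the first SOS marker carries a length field and no fill bytes occur. It skips only APP1 (EXIF/XMP), APP13 (IPTC) and COM (comment) segments, and uses SHA-256 rather than MD5.

```python
import hashlib

# Marker codes (the byte after 0xFF) of segments left out of the digest.
SKIPPED_MARKERS = {0xE1, 0xED, 0xFE}  # APP1 (EXIF/XMP), APP13 (IPTC), COM
START_OF_SCAN = 0xDA


def jpeg_content_digest(data):
    """Hash a JPEG byte string while ignoring EXIF, IPTC and comment segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker, not a JPEG stream")
    digest = hashlib.sha256(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == START_OF_SCAN:
            # Compressed image data follows: hash everything up to end of file.
            digest.update(data[pos:])
            return digest.hexdigest()
        # The 2-byte big-endian length counts itself but not the marker.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        end = pos + 2 + length
        if marker not in SKIPPED_MARKERS:
            digest.update(data[pos:end])
        pos = end
    raise ValueError("no SOS marker found")
```

Two files with the same digest contain the same compressed image data, so they decode to the same pixels. The reverse does not hold: files that were re-encoded but happen to decode to identical pixels get different digests, and catching those would require hashing the decoded pixels instead.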
Sir_SiLvA wrote: Not possible with computers - computers don't have eyes - if you don't get it done with the programs you have listed, you have to do it yourself...
Balderstrom wrote: Not correct, Sir_SiLvA... For 100% matches you could just do a CRC or MD5 match without utilizing either of those.
Correct Balderstrom, that's the way to go.
However, a more robust hash, say 256 bits or more, would be useful to avoid collisions. For this reason CRC is definitely not an option here!
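To put rough numbers on the collision concern, the standard birthday-bound approximation can be evaluated in Python. This is a sketch under the assumption that hash values are uniformly distributed and the files are not deliberately crafted to collide; the function name is made up for illustration.

```python
import math


def collision_probability(file_count, hash_bits):
    """Approximate chance that at least two of file_count hashes coincide.

    Uses the birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    pairs = file_count * (file_count - 1) / 2
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-pairs / 2.0 ** hash_bits)
```

For a collection of 100,000 files, `collision_probability(100_000, 32)` comes out at around 0.69, so a 32-bit CRC is more likely than not to report at least one false match. For 128-bit and 256-bit hashes the estimate is vanishingly small.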
Awesome!! That's a big step forward! I will look into it ASAP.
Why is it not listed here? http://www.totalcmd.net/authors/1284676.html
Although it is extremely useful, I run into the same limitations Balderstrom mentioned:
Balderstrom wrote: Cool stuff, though the problem with SyncDirs is it requires exact filenames. So no way to match IMG001.JPG to Copy of IMG001.JPG, or IMG001(1).JPG, etc. As well, sync dirs requires the same directory structure. No way to compare exactly named files that exist in a subfolder (H:\FOO\IMG001.JPG vs H:\BAR\IMG001.JPG) without manually changing paths for each panel for each possible comparison. Not really feasible for me.
But as I said, your plugin is still a big step forward, Lefteous!
Balderstrom wrote: For 100% matches you could just do a CRC or MD5 match without utilizing either of those.
Sir_SiLvA wrote: That only works for 100% identical files - not for 3 files showing all the same picture not being recognized by any prg as showing the same thing...
It's easy to see that Balderstrom was talking about hashing the image data only, not the entire file. So the files don't need to be "100% identical" contrary to what you said.
@chrizoo: Ditto.
Last edited by chrizoo on 2011-03-09, 17:45 UTC, edited 4 times in total.