finding exact image duplicates

Post by *chrizoo »

Hi folks.

Does anyone know a way of finding "100% exact image duplicates" (preferably within TC, e.g. by using an addon; otherwise with an external program)?

"Exact image duplicates" means that each pixel in image A has the same colour as the corresponding pixel in image B.

Note that images can be 100% identical yet differ in file size and hash value (think of .jpeg images with comments or with added/deleted EXIF data).

I have tested several "image duplicate finder" programs. They work extremely well for finding similar images, but none of them can check whether images are exactly identical. Some even display a "similarity percentage", but, alas, 100% does not really mean 100% identical; it means "a very close match" (which sometimes turns out to be 100% identical, and sometimes not). Bad terminology.

I also have command line programs that can compare two images and tell me whether or not they are 100% identical. But they cannot compare multiple images (one file against a file list, all files within a file list, or one file list against another).


So how can I find 100% exact image duplicates?
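
One way to lift a pairwise check to a whole file list is to compute a fingerprint per file and group the files whose fingerprints match. The sketch below is in Python, which nobody in this thread uses, and the names `file_sha256` and `group_duplicates` are made up for illustration. As written it fingerprints whole files, so it only finds byte-identical copies; for pixel-exact matching, the `fingerprint` argument would have to be a function that hashes the decoded pixel data instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_sha256(path):
    """Fingerprint a file by hashing all of its bytes (metadata included)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large images do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_duplicates(paths, fingerprint=file_sha256):
    """Return lists of paths that share a fingerprint (groups of 2 or more)."""
    groups = defaultdict(list)
    for path in paths:
        groups[fingerprint(path)].append(Path(path))
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Each file is read once, so the work grows with the number of files rather than with the number of pairs.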

Post by *chrizoo »

I could find only a single related thread in the forums here, but none of the three programs mentioned there

* Image Dupeless
* Image Dupe
* Dupe Detector

is capable of correctly identifying 100% exact duplicates.

Post by *Balderstrom »

I use Prismatic's DupDetector and VisiPics. I find both of their interfaces somewhat lacking, so I created an AHK script to make DupDetector easier to use. VisiPics' interface is really clunky/awkward to me, but sometimes it's useful.

The script below requires two of my default library files: LB.ahk and AHK_Extra.ahk.

It's also configured for a multi-button mouse that has Shift on a thumb button, plus an XButton4 and XButton5 (Logitech's MX518).

Primary usage:
1) When 2 matches are displayed, press Shift+RButton, which launches 2 instances of IrfanView. When done comparing in IrfanView, press Shift+RButton again: both instances close and DupDetector is reactivated.
2) Pressing MButton will ask if you want to delete File1.
--- If it's the second file you want to delete, scroll down once first.

3A) XButton2: presents an option to COPY File1 to the other path.
3B) Shift+XButton2: presents an option to MOVE File1 to the other.

---> ScrollUp/Down moves the cursor in DupDetector's file list. <---

File1 is the first file (of the two) unless the cursor has been moved down one line --- then they are effectively swapped and File1 is the second file.

Note: If the Shift key is on the keyboard, the "~RButton::" below would likely need to be "~*RButton::", and "XButton2::" would need to be "*XButton2::".
I can put the required library files up for download if anyone wants to play with the script below. It would likely need some tweaking of the hotkeys for one's own usage.


The script is a little bit raw, but making a script foolproof takes two to three times as much time and work. I just deal with the quirks, as there's always an InputBox/question before it will MOVE/COPY or delete a file.

Code:

;;
;;	Balderstrom, Feb.01 2011
;;	-- DupDet.ahk
;;	
;;	Automation Script for Prismatic's DupeDetector.
;;

DD_GetText(byRef winID, byRef imgNam1, byRef imgNum1, byRef imgNam2="", byRef imgNum2="")
{
	checkMatchNo:=winID

	LB_ControlGetID( cID, "ListBox1", winID:=WinExist("Dup Detector ahk_class #32770"))

	cTxt:=LB_QueryText( cID, LB_QueryCursor(cID))

	if( SubStr(cTxt, 1, 12) == "; Match no: " )
	{
		if( checkMatchNo <> -1 )
		{
			MsgBox,,,% "Match#: " SubStr(cTxt, 13), 1
			return
		}
		ControlSend, ListBox1, {Down}, ahk_id %winID%
		cTxt:=LB_QueryText( cID, LB_QueryCursor(cID))
	}

	RegExMatch(cTxt, "^Image (\d) path: (.*)$", dTmp)
	imgNum1:=dTmp1, imgNam1:=dTmp2

;MsgBox, imgNum1: %imgNum1%
	ControlFocus, ListBox1, ahk_id %winID%
	Send, % (imgNum1 == 2) ? "{Up}" : "{Down}"

	cTxt:=LB_QueryText(cID, LB_QueryCursor(cID))
	RegExMatch(cTxt, "^Image (\d) path: (.*)$", dTmp)
	imgNum2:=dTmp1, imgNam2:=dTmp2
;	MsgBox, % "(" imgNum1 ") :: " imgNam1 "`n(" imgNum2 ") :: " imgNam2
	return (imgNum1)
}

#ifWinActive, Dup Detector ahk_class #32770
	MButton::
	{
		if( imgNum:=DD_GetText(winID, imgNam1, imgNum1, imgNam2, imgNum2))
		{
			MsgBox, 0x4,, % "Delete (" imgNum ")`n  " (imgNum==1 ? "Left" : "Right") " Image?`n" ,4
			ifMsgBox, NO
				return
			ControlFocus, % "Button" (3 + imgNum), ahk_id %winID%
			Send, {Enter}
		}
	return
	}
	#l::
	{
		if( imgNum:=DD_GetText(winID, imgNam1, imgNum1, imgNam2, imgNum2))
		{
			ControlFocus, % "Button" (3 + imgNum), ahk_id %winID%
			;; Button4 == Delete Image 1
			;; Button5 == Delete Image 2
		}
	return
	}

	$LButton::
	{
		if( MouseInWindow( mControl, aWin:=WinActive("Dup Detector ahk_class #32770")))
			ControlFocus, Static12, ahk_id %aWin%
		Send, % GetKeyState("LButton", "P") ? "{LButton Down}" : ""
	}

	$LButton UP::
	{
		if(GetKeystate("LButton", "P"))
			return
		if( MouseInWindow( mControl, aWin:=WinActive("Dup Detector ahk_class #32770")))
		{
			ControlGetFocus, aFocus, A
			MouseGetPos,,,,mCtrl, 2
			ControlGet( aCtrl:="HWND#", aFocus, "A" )
			ControlGetText( cTxt, aCtrl )
			Send, {LButton Up}
			if( mCtrl <> aCtrl )
				return
			if( cTxt == "Back" || cTxt == "Next" )
				ControlSend, ListBox1, {Down}, ahk_id %aWin%
		return
		}
		Send, {LButton Up}
	return
	}

	WheelDown::
	WheelUp::
	{
		if(GetKeyState("Shift"))
			ControlSend, ListBox1, % (A_ThisHotkey == "WheelUp" ? "{Up}" : "{Down}"), A
		else
			Send, {%A_ThisHotkey%}
	return
	}

	XButton2::
	{
		if( imgNum:=DD_GetText(winID:=-1, imgNam1, imgNum1, imgNam2, imgNum2))
		{
			SplitPath(imgNam1, file1, path1)
			SplitPath(imgNam2, file2, path2)
			if( GetKeyState("Shift") && copyFile:=1 )
				MsgBox, 0x4,, % "COPY IMG (" imgNum ") TO (" (mod(imgNum, 2) + 1) ") ??`n   " (mod(imgNum,2) ? "--->>" : "<<---")
			else
				MsgBox, 0x4,, % "Move IMG (" imgNum ") TO (" (mod(imgNum, 2) + 1) ") ??`n   " (mod(imgNum,2) ? "--->>" : "<<---")
			ifMsgBox, NO
				return
			if( copyFile && !(copyFile:=0))
			{
				FileCopy, %imgNam1%, %path2%
				ControlFocus, % "Button" (3 + imgNum), ahk_id %winID%
			}
			else
			{
				FileMove, %imgNam1%, %imgNam2%, 1
				ControlFocus, Button7, ahk_id %winID%
			}
			Send, {Enter}
			Sleep, 250
			;; winID is set by DD_GetText above; aWin only exists if an LButton hotkey ran earlier
			ControlSend, ListBox1, {Down}, ahk_id %winID%
		}
	return
	}

	~RButton::
	{
		if(!GetKeyState("Shift"))
			return
		if( closeIV && !(closeIV:=0))
		{
			WinClose, ahk_pid %ivPID1%
			WinClose, ahk_pid %ivPID2%
		return
		}
		if(!imgNum:=DD_GetText(winID:=-1, imgNam1, imgNum1, imgNam2, imgNum2))
			return
		EnvGet, gImageView, gImageView
		Run, %gImageView%\IrfanView\i_view32.exe %imgNam1%,,,ivPID1
		Run, %gImageView%\IrfanView\i_view32.exe %imgNam2%,,,ivPID2
		closeIV:=1
	}

return

#ifWinActive ahk_class IrfanView
{
	~RButton::
		if(!GetKeyState("Shift"))
			return
		if( closeIV && !(closeIV:=0))
		{
			WinClose, ahk_pid %ivPID1%
			WinClose, ahk_pid %ivPID2%
			ifWinActive, IrfanView ahk_class #32770
				Send, {Tab}{Tab}{Enter}
		}

		WinActivate, Dup Detector ahk_class #32770
		MouseMove, 229, 465
		KeyWait, RButton
		ifWinExist, ahk_class #32768
			WinClose, ahk_class #32768
	return
return
}
There are a few ideas I've had to improve its usage, but it works well enough for me at the moment.

Re: finding exact image duplicates

Post by *Sir_SiLvA »

chrizoo wrote: :?: Does anyone know a way of finding "100% exact image duplicates" (preferably within TC, e.g. by using an addon; otherwise an external program) ?
Not possible with computers. Computers don't have eyes. If you can't get it done with the programs you have listed, you have to do it yourself...
Hoecker, you're out!

Post by *chrizoo »

-
@Balderstrom
Wow, I fear this is beyond my intellectual horizon. Can you tell me, in a nutshell, what all of this actually does?

Sir_SiLvA wrote:
chrizoo wrote: :?: Does anyone know a way of finding "100% exact image duplicates" (preferably within TC, e.g. by using an addon; otherwise an external program) ?
Not possible with computers - computers dont have eyes - if you dont get it done with the prgs you have listed you have to do it yourself...
Silva, if you are not familiar with a specific branch of IT (which is no shame), you shouldn't post guesswork ("it is not possible") as though it were fact.

Computers don't have ears either, so why is it that they can find similar and identical audio files? See.

As a matter of fact, I even explained that
chrizoo wrote: :arrow: I also have command line programs which can compare two images and tell me whether or not they are 100% identical.
Plus: see Balderstrom's previous posting. So why are you telling users here that it is impossible?

Post by *Balderstrom »

Pressing Shift+RButton (mouse) launches IrfanView so I can see both images at the same time. Once I have made a decision one way or the other, I press Shift+RButton again, which closes both IrfanView windows and reactivates DupDetector.
If I wanted to copy the first file to the second's path, I would press XButton2 (mouse).
If I wanted to MOVE the first file to the second (overwrite), I would press Shift+XButton2.

If I wanted to delete the first file, I would press MButton.

In all cases, if I wanted the source file (the file to keep) to be the second file, I would scroll down once first.

Like most of my longer scripts, it uses library files. That lets me re-use functions instead of having multiple versions of them scattered across numerous scripts --- which would make maintaining any of these scripts a nightmare...

Apart from all of this, my main AHK system script has an XButton2 definition for IrfanView (with IV set to display the full path in its title bar) that opens Total Commander's right panel with that file selected.

Post by *chrizoo »

Thanks.
OK, the way I understand it is that you have to check manually (i.e. visually, in IrfanView), image by image in your result list.

That's not what I seek to do.

I want a list of 100% identical images.
Not a list of possibly 100% identical images, which I have to verify one-by-one (not to mention that a visual check is quite impossible for high resolution images).

Is this what you suggested, or did I misread your explanation?

Post by *Balderstrom »

Correct, I match at ~95-98% similarity, as the images may be resized, flipped, or color adjusted.

Sir_SiLvA wrote:
chrizoo wrote: :?: Does anyone know a way of finding "100% exact image duplicates" (preferably within TC, e.g. by using an addon; otherwise an external program) ?
Not possible with computers - computers dont have eyes - if you dont get it done with the prgs you have listed you have to do it yourself...
Not correct, Sir_SiLvA...
For 100% matches you could just do a CRC or MD5 match without using either of those programs.

E.g., in a subdirectory branch view, select the files and use TC to create a single MD5 file.

Then run something like:

Code:

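@ECHO OFF
CLS
SETLOCAL
::
::	xmd5 v1.32
::
:: Dry run: use the next line instead of "SET MOVE=MOVE" to print
:: the MOVE commands rather than execute them.
::SET MOVE=ECHO MOVE::
SET MOVE=MOVE
:: %1 is the MD5 list file. Each line is split at "*" into
:: the hash (%%A) and the file name (%%B).
SET MD5="%~1"

mkdir __DUPE__ 1>NUL 2>&1
:: Pass 1: rename each file to its hash, unless that name is already taken.
FOR /F "usebackq tokens=1* delims=*" %%A IN (%MD5%) DO (
	IF NOT EXIST %%A (
		%MOVE% "%%B" %%A
	)
)

:: Pass 2: a file still under its original name is a duplicate and goes
:: to __DUPE__; otherwise the hash-named file gets its original name back.
FOR /F "usebackq tokens=1* delims=*" %%A IN (%MD5%) DO (
	IF EXIST "%%B" (
		%MOVE% "%%B" __DUPE__
	) ELSE (
		%MOVE% %%A "%%B"
	)
)
:: If __DUPE__ holds no .jp* files, print (not run) the command to remove it.
DIR __DUPE__\*.jp* 1>NUL 2>&1
IF %ERRORLEVEL% == 1 ECHO RMDIR __DUPE__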
Usage: xmd5 <Input MD5 fileName>

The script temporarily renames files to their MD5 strings. You may want to sort the files by date, or whatever you prefer, before creating the MD5 file, since the first file listed for each hash is the one that is kept.
It moves/renames a file to its MD5 string if no file with that name exists yet.
Then it moves all remaining files (the duplicates) to the folder __DUPE__.
Then it renames all MD5-named files back to their original filenames.
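
The same duplicates can also be listed without renaming anything. Here is a rough Python sketch, assuming the checksum file uses the `hash *filename` line format that the batch file's `delims=*` implies; `find_dupes_in_md5_list` is a made-up name, and the format has not been checked against every TC version.

```python
from collections import defaultdict


def find_dupes_in_md5_list(lines):
    """Map each hash to its file names, keeping only hashes seen more than once.

    Each line is expected to look like '<hash> *<file name>'.
    """
    by_hash = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if "*" not in line:
            continue  # blank line or unexpected format: skip it
        digest, name = line.split("*", 1)
        by_hash[digest.strip().lower()].append(name)
    return {h: names for h, names in by_hash.items() if len(names) > 1}
```

Pass it an open checksum file or any list of lines. Like the batch file, this only detects byte-identical files.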

Post by *Lefteous »

2chrizoo
Try this:
http://ghisler.ch/board/viewtopic.php?t=25768

It's not really advanced but it's well integrated into sync dirs.

Post by *Balderstrom »

Cool stuff, though the problem with SyncDirs is it requires exact filenames.
So no way to match:
IMG001.JPG to:
Copy of IMG001.JPG, or IMG001(1).JPG, etc.
As well, sync dirs requires the same directory structure. No way to compare exactly named files that exist in a subfolder
H:\FOO\IMG001.JPG vs H:\BAR\IMG001.JPG
--- without manually changing paths for each panel for each possible comparison. Not really feasible for me.

Both of those issues are longstanding wishes of mine for further development of the sync tool, though: compare by MD5, and the ability to ignore paths when matching filenames. E.g. the ability to flat-file two directories/panels (subdirectory branch view) and compare the panels by filename or MD5 only. That would be similar to the CompareDirs command, except that CompareDirs has no options for which set of files to select (left/right) or which criteria to select by.
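
Outside of TC, the "ignore paths, compare by hash only" idea can be sketched in a few lines of Python. This is only an illustration of the wish, not an existing TC feature; `hash_tree` and `match_across` are hypothetical names, and SHA-256 is used here although MD5 would serve the same purpose.

```python
import hashlib
import os


def hash_tree(root):
    """Map content hash -> list of file paths found anywhere under root."""
    found = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Reads each file in one go; fine for a sketch, not for huge files.
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            found.setdefault(digest, []).append(path)
    return found


def match_across(left_root, right_root):
    """Pair up files with identical content, ignoring names and folders."""
    left, right = hash_tree(left_root), hash_tree(right_root)
    common = sorted(left.keys() & right.keys())
    return [(sorted(left[d]), sorted(right[d])) for d in common]
```

With this, H:\FOO\IMG001.JPG would be paired with a byte-identical "Copy of IMG001.JPG" anywhere under H:\BAR, regardless of name or subfolder.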

Post by *Lefteous »

The problem for my plugin is how to create such a hash for an identified image. That is not so easy.

Post by *Sir_SiLvA »

Balderstrom wrote:For 100% matches you could just do a CRC or MD5 match without utilizing either of those.
That only works for 100% identical files, not for 3 files that all show the same picture but are not recognized by any program as showing the same thing...

@chrizoo: Ditto.

Post by *Balderstrom »

chrizoo wrote:Thanks.
OK, the way I understand it is that you have to check manually (i.e. visually, in IrfanView), image by image in your result list.

That's not what I seek to do.

I want a list of 100% identical images.
Not a list of possibly 100% identical images, which I have to verify one-by-one (not to mention that a visual check is quite impossible for high resolution images).

Is this what you suggested, or did I misread your explanation?
If you set DupDetector to 100% matches, you will get a ListBox "list" of image-exact matches. And if you configure its settings to automatic, it can auto-delete as well.

Post by *Balderstrom »

Lefteous wrote:The problem for my plugin is how to create such a hash for an identified image? This is not so easy.
I can think of a few different ways that don't use actual image heuristics. But they aren't necessarily date-preserving, and they may be processing-intensive.

E.g. an MD5 of the image data only, skipping EXIF and/or IPTC. I'm not sure how one would go about that without purging that info from the files (keeping a backup) and restoring it afterwards.
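
For JPEG files, one possible way around purging and restoring is to read the file, skip the metadata segments in memory, and hash what is left, so nothing on disk is touched. The Python below is a minimal sketch of that idea under several assumptions: `jpeg_payload_sha256` and `METADATA_MARKERS` are invented names, only APP1 (EXIF/XMP), APP13 (IPTC/Photoshop) and COM segments are treated as metadata, and everything from the first SOS marker onward is hashed verbatim. Equal digests mean the compressed image data is the same; different digests do not prove the pixels differ, since a re-encoded file can decode to the same picture.

```python
import hashlib
import struct

# Segments treated as metadata in this sketch: APP1 (EXIF/XMP),
# APP13 (IPTC/Photoshop) and COM (comment).
METADATA_MARKERS = {0xE1, 0xED, 0xFE}


def jpeg_payload_sha256(data):
    """Hash a JPEG byte string with the metadata segments left out."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    digest = hashlib.sha256(b"\xff\xd8")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: step over it
            pos += 1
            continue
        if marker == 0xDA:  # SOS: the compressed image data starts here
            digest.update(data[pos:])
            return digest.hexdigest()
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        if marker not in METADATA_MARKERS:
            digest.update(data[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no SOS marker found")
```

Typical use would be `jpeg_payload_sha256(open(path, "rb").read())` for each file, then grouping files by the returned digest.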

Post by *chrizoo »

Balderstrom wrote:
Sir_SiLvA wrote:Not possible with computers - computers dont have eyes - if you dont get it done with the prgs you have listed you have to do it yourself...
Not correct, Sir_SiLvA...
For 100% matches you could just do a CRC or MD5 match without utilizing either of those.
Correct, Balderstrom, that's the way to go.
However, a more robust hash, say 256 bits or more, would be useful to avoid collisions. For this reason CRC is definitely not an option here!
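
To put rough numbers on the collision concern, the usual birthday-bound approximation can be evaluated in Python. This is a back-of-the-envelope sketch that assumes hash values behave like uniformly random numbers and only covers accidental collisions, not deliberately crafted ones; `collision_probability` is a name made up for this example.

```python
import math


def collision_probability(num_files, hash_bits):
    """Birthday-bound estimate that any two of num_files random hashes collide."""
    pairs = num_files * (num_files - 1) / 2
    # 1 - exp(-pairs / 2^bits); expm1 keeps precision for tiny probabilities.
    return -math.expm1(-pairs / 2.0 ** hash_bits)


for bits in (32, 128, 256):
    print(bits, collision_probability(100_000, bits))
```

For a collection of 100,000 files, a 32-bit CRC is more likely than not to produce at least one accidental collision, while the estimates for 128-bit and 256-bit hashes are vanishingly small.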

Lefteous wrote:2chrizoo
Try this:
http://ghisler.ch/board/viewtopic.php?t=25768
Awesome!! :D That's a big step forward! I will look into it ASAP.
Why is it not listed here? http://www.totalcmd.net/authors/1284676.html

Although extremely useful, I suffer from the same limitations Balderstrom mentioned:
Balderstrom wrote:Cool stuff, though the problem with SyncDirs is it requires exact filenames.
So no way to match:
IMG001.JPG to:
Copy of IMG001.JPG, or IMG001(1).JPG, etc.
As well, sync dirs requires the same directory structure. No way to compare exactly named files that exist in a subfolder
H:\FOO\IMG001.JPG vs H:\BAR\IMG001.JPG
--- without manually changing paths for each panel for each possible comparison. Not really feasible for me.
But as I said, your plugin is still a big step forward, Lefteous!


Sir_SiLvA wrote:
Balderstrom wrote:For 100% matches you could just do a CRC or MD5 match without utilizing either of those.
That only works for 100% identical files - not for 3 files showing all the same picture not being recognized by any prg as showing the same thing...

@chrizoo: DITO.
It's easy to see that Balderstrom was talking about hashing the image data only, not the entire file. So the files don't need to be "100% identical" contrary to what you said.
-
Last edited by chrizoo on 2011-03-09, 17:45 UTC, edited 4 times in total.