@ghisler(Author)
I have a small question about the duplicate-finding mechanism.
Suppose I tick size and content (content alone should also group by size first and then work within each group).
Do you group by size first and then compare the contents of all files in each group in parallel, using an incremental mechanism?
Duplicate finder question.
- ghisler(Author)
- Site Admin
- Posts: 50390
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
Re: Duplicate finder question.
When you tick content, TC groups by size even when size is not checked, because files cannot have the same content when their sizes differ.
If there are only 2 files, TC compares them directly. If there are more than 2 files, TC generates MD5 hashes of all files with equal size, and then compares the hashes. This is necessary because if there are 4 or more files, they may be matching pairwise or in multiple groups.
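The steps described above can be sketched roughly as follows. This is only an illustration in Python of the described behaviour, not Total Commander's actual code; the function names `md5_of` and `find_duplicates` and the 1 MiB chunk size are assumptions made for the example.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Return a list of groups (lists of paths) with identical content."""
    # Step 1: group by size, since files of different size cannot match.
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        if len(same_size) == 2:
            # Exactly two candidates: compare them directly, byte by byte.
            a, b = same_size
            if filecmp.cmp(a, b, shallow=False):
                groups.append([a, b])
            continue
        # More than two candidates: hash every file, then group by hash,
        # because the files may match pairwise or in several groups.
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[md5_of(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Note that in the hashing branch every file of the size group is read to the end, even if the files differ in their first bytes.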
Author of Total Commander
https://www.ghisler.com
Re: Duplicate finder question.
Many thanks for this explanation. I ran a comparison over the network, and it was very slow to determine that many (12) big files had the same size but different contents, even though they already differed within the first 100 bytes. This is why I was asking about a specific heuristic, such as opening all files of the same group and calculating a hash incrementally in small blocks, so that differing files are detected or split into new subgroups as soon as possible without reading the complete files. (Complete files are read in any case when the files are identical.)
The current way is OK on a fast drive or with small files, but over the network or with huge files it is faster to use fclones or jdupes (directly on the server).
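The block-wise heuristic I have in mind could look roughly like this. It is only a sketch of the idea in Python, not how TC, fclones or jdupes actually work; the name `split_by_content` and the 64 KiB block size are assumptions. For simplicity it uses the raw block as the grouping key instead of a per-block hash, and it opens all files of the group at once, which assumes the group is small enough for the open-file limit.

```python
from collections import defaultdict


def split_by_content(paths, block_size=64 * 1024):
    """Partition same-size files into groups with identical content.

    All files are read block by block in lockstep. A group is split as
    soon as one block differs, and a file that ends up alone is not read
    any further.
    """
    handles = {p: open(p, "rb") for p in paths}
    try:
        pending = [list(paths)]  # groups that are still identical so far
        result = []
        while pending:
            next_pending = []
            for group in pending:
                # Read the next block of each file and bucket by its content.
                buckets = defaultdict(list)
                for p in group:
                    buckets[handles[p].read(block_size)].append(p)
                for block, members in buckets.items():
                    if len(members) < 2:
                        continue  # unique from here on: stop reading it
                    if block == b"":
                        result.append(members)  # reached end of file together
                    else:
                        next_pending.append(members)
            pending = next_pending
        return result
    finally:
        for h in handles.values():
            h.close()
```

With 12 files that differ within the first 100 bytes, such a loop would stop after one block per file, while identical files are still read completely.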
