+[8.50b13] Search for dups with plugins causes crash

Bug reports will be moved here when the described bug has been fixed


MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation


Post by *MVV »

Start a search for duplicates using the plugin field [=tc.versionstring] (Alt+F7, Advanced, Find duplicate files, same plugin fields). Can be reproduced with both 32- and 64-bit TC since 8.50b10.

It can't be reproduced with just any directory, but I can reproduce it with a C:\test directory containing two copies of default.bar from the TC installation:

C:\test\0\default.bar
C:\test\1\default.bar
Exception texts for 32-bit and 64-bit TC:

---------------------------
Total Commander 8.50b13
---------------------------
Access violation at address 006B25FE. Read of address 00000010.
Access violation at address 006B25FE. Read of address 00000010
Windows 7 SP1 6.1 (Build 7601)

Please report this error to the Author, with a description
of what you were doing when this error occurred!

Windows exception: C0000005
Stack trace:
006B25FE
58D4EA  58D64E  58E1A6  590D93  58C943  446630
4364BE  4464F9  447AD6  44642F  44880A  4464F9
448456  >42587C  447A0B  42587C  447979  42587C
447AD6  448456  42587C  447A0B  42587C  42AF38
42AFD4  586EAA  54F687  66098E  6527D7  448F93
447AD6  448456  42587C  447A0B  42587C  42AF38
42AFD4  6FFD11  
Raw:
58D503  58D4EA  58D64E  4C0053  43F1A7  4654DE
4655CC  42587C  43EFC3  43F102  44892E  44ADBF
4464F9  448725  447AD6  448456  448479  42587C
447A0B  42587C  4134FC  42AE8A  6FCC7F  6FCD7B
6FCFC2  58E1A6  41EA9E  4C0053  402249  4033D0
4021C8  6FCC19  6FCA74  6B222B  4020A2  6B37D5
417AFE  4176DD  6B397C  6B39B0  43F1A7  4654DE
4655CC  4C0053  43EFC3  43F102  44892E  44ADBF
4464F9  448725  447AD6  448456  448479  4C0053
4C0053  447A0B  42587C  4C0053  42AE8A  4023EF

Press Ctrl+C to copy this report!

---------------------------
Total Commander 8.50b13
---------------------------
Access violation.
Access violation
Windows 7 SP1 6.1 (Build 7601)

Please report this error to the Author, with a description
of what you were doing when this error occurred!

Stack trace (x64):821918
5C7DA5 5C7FE3 5C953E 5CE130 5C68BF 7DE111 7F97F6 7F9F4D
7FA565 7F96E1 40F3FD 7DD0E0 7D1AB3 8C779F 80F810 8D0392

Press Ctrl+C to copy this report!
If I click No in 64-bit TC, instead of closing it starts switching focus between the Find dialog and the main window in an infinite loop, so I have to kill the process.
ghisler(Author)
Site Admin
Posts: 50541
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Confirmed, thanks. The problem is that [=tc.versionstring] returns a NULL pointer when a file doesn't contain version info.
Author of Total Commander
https://www.ghisler.com
white
Power Member
Posts: 5815
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by *white »

Confirmed, and I found the same problem with [=tc.versionnr].
ghisler(Author)
Site Admin
Posts: 50541
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Yes, it's the same problem (files without such fields). After fixing it, I found another problem: Files which do NOT contain such fields are now treated as the same. This isn't false, but probably not what the user expects...
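A minimal Python sketch of the situation described above, assuming a hypothetical lookup in which None plays the role of the NULL pointer returned for files without version info. Guarding against None avoids the crash, but if the missing value is simply mapped to an empty key, every file without the field lands in the same group:

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_by_version(versions: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group file names by a version field; None stands in for a NULL result.

    `versions` is a hypothetical mapping from file name to the value a plugin
    field such as [=tc.versionstring] would return.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, value in versions.items():
        # Using the value directly when it is NULL is what crashed;
        # here the missing value is replaced by an empty key instead.
        key = value if value is not None else ""
        groups[key].append(name)
    # Only groups with two or more members are reported as duplicates.
    return {key: members for key, members in groups.items() if len(members) > 1}


files = {
    "tc1.exe": "8.50",
    "tc2.exe": "8.50",
    "readme.txt": None,
    "notes.txt": None,
}
print(group_by_version(files))
# {'8.50': ['tc1.exe', 'tc2.exe'], '': ['readme.txt', 'notes.txt']}
```

The two text files end up reported as duplicates of each other purely because both lack the field, which is the side effect mentioned in the post.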
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

ghisler,
When we search for duplicates, we set a number of conditions that distinguish files. If one test can't detect that the files are different, it passes them on to the next test. E.g. if two files have the same size, comparing by size obviously doesn't help. Same thing with plugin fields: if both files lack the field, comparing by it doesn't help, and the user should specify more rules.
If you think users won't expect this, you could add a note to the help file.
white
Power Member
Posts: 5815
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by *white »

MVV wrote:When we search for duplicates, we set a number of conditions that distinguish files. If one test can't detect that the files are different, it passes them on to the next test. E.g. if two files have the same size, comparing by size obviously doesn't help. Same thing with plugin fields: if both files lack the field, comparing by it doesn't help, and the user should specify more rules.
I think it's more complicated than that. If a test can't determine whether files are different, then it can't be determined whether the files differ, no matter what the other tests say. E.g. if it can't be determined whether two files have the same size, they can't be considered equal if the user specified that files are equal when they have the same size, regardless of the other tests.

Suppose you search for photos with the same size and the same EXIF field, and you intend to delete the duplicates. Following your logic, all photos without EXIF information will be considered equal, and you may end up deleting photos you did not want to delete. You may also have selected files which are not photos at all, and you may end up deleting these as well.
ghisler(Author) wrote:Files which do NOT contain such fields are now treated as the same. This isn't false, but probably not what the user expects...
Actually, it is false. An undefined value does not equal another undefined value; whether the two are equal is itself undefined.
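The argument above amounts to a three-valued comparison. In this Python sketch (an illustration, not TC's code), None marks an undefined field value; comparing anything with an undefined value yields "unknown" rather than True, and a pair of files counts as duplicates only when every criterion is definitely True:

```python
from typing import Iterable, Optional


def field_equal(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Compare two field values; None means the value is undefined.

    Returns True or False when both values are known, and None ("unknown")
    when at least one of them is undefined, even if both are undefined.
    """
    if a is None or b is None:
        return None
    return a == b


def all_criteria_met(results: Iterable[Optional[bool]]) -> bool:
    """All criteria are ANDed: an unknown result cannot confirm a duplicate."""
    return all(result is True for result in results)


# Same size, but neither photo has the EXIF field:
size_match = True
exif_match = field_equal(None, None)
print(exif_match)                                  # None
print(all_criteria_met([size_match, exif_match]))  # False
```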

I think you should consider them different unless the user explicitly specified otherwise.
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

white,
If you search for duplicates by size, name and two plugin fields (A and B), and two files have the same name, size and value of field A but neither has field B (e.g. it is a version field and both files are text files), would you consider the files different?

In my opinion, both files returning "1.0.20.3" and both files returning ft_nosuchfield (or ft_fieldempty) mean the same thing: the files have equal values of the given attribute. Only if the field returns ft_fileerror for at least one file can we not say that the files are equal.
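A minimal Python sketch of the comparison rule proposed above. The `Result` enum is a hypothetical stand-in for the content-plugin result codes (names only, not their real numeric values). As an additional assumption, a file returning ft_nosuchfield and one returning ft_fieldempty are treated as different here:

```python
from enum import Enum, auto
from typing import Optional, Tuple


class Result(Enum):
    """Hypothetical stand-ins for content-plugin result codes (names only)."""
    VALUE = auto()          # a real value, e.g. "1.0.20.3"
    NO_SUCH_FIELD = auto()  # ft_nosuchfield
    FIELD_EMPTY = auto()    # ft_fieldempty
    FILE_ERROR = auto()     # ft_fileerror


Field = Tuple[Result, Optional[str]]


def same_attribute(a: Field, b: Field) -> Optional[bool]:
    """Compare one attribute of two files following the rule in the post.

    Returns None if the comparison cannot be made (a file error on either side),
    otherwise True when both the result code and the value match.
    """
    if a[0] is Result.FILE_ERROR or b[0] is Result.FILE_ERROR:
        return None
    return a == b


version = (Result.VALUE, "1.0.20.3")
missing = (Result.NO_SUCH_FIELD, None)
print(same_attribute(version, version))                    # True
print(same_attribute(missing, missing))                    # True
print(same_attribute(version, missing))                    # False
print(same_attribute(missing, (Result.FILE_ERROR, None)))  # None
```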

For the fastest comparison, we should divide files into classes (e.g. using a multimap-like structure, or just a map of arrays) by all selected attributes: first by name, then by size; then, for files in the same class, we get the plugin field (a value, ft_nosuchfield or ft_fieldempty) and divide them into smaller classes by that field (if the user has enabled comparison by contents, the file hash is the last attribute). When all fields have been checked, the resulting classes contain the duplicates. They may not be real (binary) duplicates, but they are duplicates according to the comparison conditions, and it is up to the user to decide what to do with them.


E.g. we have the following files (name, size, field A, field B):

Name  Size  Field A  Field B
A(1)  10    a        9
B     11    f        6
C     10    g        54
A(2)  10    a        9
A(3)  4     -        6
A(4)  4     -        7
(A(1) to A(4) all have the same name A; the counter is added only to distinguish them. '-' means ft_nosuchfield.)

Compare steps:
1. Compare names first and get classes QA={A(1); A(2); A(3); A(4)}, QB={B}, QC={C}. Only class QA has multiple items.
2. Compare sizes within class QA: QA10={A(1); A(2)}, QA4={A(3); A(4)}. Both classes go on to the next step.
3. Get field A for QA10 and QA4, then divide by field A: QA10a={A(1); A(2)}, QA4_={A(4)}, QA4w={A(3)}. Only class QA10a passes this step.
4. Get field B for QA10a, then divide by field B: QA10a9={A(1); A(2)}. So A(1) and A(2) are duplicates according to the search options. I repeat: the files may still have different contents, but the user has chosen this set of attributes to compare.

So we only need to get each field once, and only for the files that are still candidates, not for all of them. It would be best to sort the fields by the time needed to retrieve them, so that the fastest ones are fetched first.
Technically all steps work the same way, so it can be a loop: take the set of classes from the previous step and produce a new set of classes for the next step. The map key is always the same type: a binary array, compared by size first and then with memcmp.
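A minimal Python sketch of the class-splitting loop described above, run on the example table. `TABLE`, `column`, `refine` and `find_duplicates` are illustrative names, not TC code; Python's dict lookup on `bytes` keys stands in for the "size first, then memcmp" key comparison. Following the rule stated earlier in the thread, '-' (ft_nosuchfield) matches another '-' here, so A(3) and A(4) stay together after field A and are separated only by field B; the final class is the same as in the worked steps:

```python
from collections import defaultdict
from typing import Callable, Dict, List

# The example table: name, size, field A, field B ("-" means ft_nosuchfield).
TABLE = {
    "A(1)": ("A", 10, "a", "9"),
    "B": ("B", 11, "f", "6"),
    "C": ("C", 10, "g", "54"),
    "A(2)": ("A", 10, "a", "9"),
    "A(3)": ("A", 4, "-", "6"),
    "A(4)": ("A", 4, "-", "7"),
}

KeyFunc = Callable[[str], bytes]


def column(index: int) -> KeyFunc:
    """Hypothetical getter: returns one attribute of a file as a binary key."""
    return lambda f: str(TABLE[f][index]).encode()


def refine(classes: List[List[str]], key: KeyFunc) -> List[List[str]]:
    """Split each class by one attribute; keep only classes with 2+ files."""
    result: List[List[str]] = []
    for cls in classes:
        buckets: Dict[bytes, List[str]] = defaultdict(list)
        for f in cls:
            # The attribute is fetched only for files that are still candidates.
            buckets[key(f)].append(f)
        result.extend(b for b in buckets.values() if len(b) > 1)
    return result


def find_duplicates(files: List[str], keys: List[KeyFunc]) -> List[List[str]]:
    """Apply the attributes in order (ideally cheapest to fetch first)."""
    classes = [files]
    for key in keys:
        classes = refine(classes, key)
        if not classes:  # nothing left to compare
            break
    return classes


print(find_duplicates(list(TABLE), [column(i) for i in range(4)]))
# [['A(1)', 'A(2)']]
```

Because files B and C drop out after the name step, their size and plugin fields are never fetched, which is the saving the post describes.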
white
Power Member
Posts: 5815
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by *white »

MVV wrote:white,
If you search for duplicates by size, name and two plugin fields (A and B), and two files have the same name, size and value of field A but neither has field B (e.g. it is a version field and both files are text files), would you consider the files different?
I think it cannot be determined whether they are equal (duplicates). Since all criteria must be met (they are ANDed), the number of criteria used is irrelevant.
MVV wrote:In my opinion, both files returning "1.0.20.3" and both files returning ft_nosuchfield (or ft_fieldempty) mean the same thing: the files have equal values of the given attribute. Only if the field returns ft_fileerror for at least one file can we not say that the files are equal.
ft_nosuchfield (should not happen), ft_fieldempty (the name is misleading) and ft_fileerror all mean that the value cannot be determined, so in all these cases we can't say that the files are equal. In your example for fast comparison, you yourself don't consider them to be equal.
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

By your logic, two identical files without version info (or without an id3 tag) are not definitely duplicates if we check by this field...
white
Power Member
Posts: 5815
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Post by *white »

MVV wrote:By your logic, two identical files without version info (or without an id3 tag) are not definitely duplicates if we check by this field...
If you search for files with the same id3 tag, you are not searching for files without an id3 tag (for example non-audio files).

Also try a normal search using the plugin tab: search for tc.versionstring !contains "ridiculousness". This will not find files without a versionstring.
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

white wrote:If you search for files with the same id3 tag, you are not searching for files without an id3 tag (for example non-audio files).
Maybe there is something in that.

But then, if I need to search for duplicates among executables and want to speed up the process by using version info (or tags for audio files), I'm doing it the wrong way...
ghisler(Author)
Site Admin
Posts: 50541
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Actually, TC beta 13 already ignored files when ft_fieldempty was returned. The problem is when ft_string or ft_stringw is returned but the string is empty. I think I should ignore these too.
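A minimal Python sketch of the behaviour described above, not TC's actual code: files whose field is missing (None, standing in for ft_fieldempty) or an empty string (an ft_string/ft_stringw result with no text) are left out before grouping, so they can no longer match each other:

```python
from collections import defaultdict
from typing import Dict, List, Optional


def group_non_empty(values: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group files by a string field, skipping files without a usable value.

    None stands in for ft_fieldempty and "" for an ft_string/ft_stringw result
    that is an empty string; both are left out of the duplicate search.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, value in values.items():
        if not value:  # None or ""
            continue
        groups[value].append(name)
    return {v: names for v, names in groups.items() if len(names) > 1}


files = {
    "tc1.exe": "8.50",
    "tc2.exe": "8.50",
    "readme.txt": None,
    "notes.txt": None,
    "a.dll": "",
    "b.dll": "",
}
print(group_non_empty(files))  # {'8.50': ['tc1.exe', 'tc2.exe']}
```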
ghisler(Author)
Site Admin
Posts: 50541
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

OK, please try again with beta 14!
MVV
Power Member
Posts: 8711
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

Hm, it still crashes in b14... I checked twice, also with x64. The field is the same: [=tc.versionstring].
ghisler(Author)
Site Admin
Posts: 50541
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

No more crashes here, and I tested with thousands of files. Can you post the crash report, please?