Compare by content bug

The behaviour described in the bug report is either by design, or would be far too complex/time-consuming to be changed.

Moderators: petermad, Stefan2, white, Hacker

_m_
Junior Member
Posts: 4
Joined: 2025-01-18, 12:13 UTC

Compare by content bug

Post by _m_

1. Create 2 text files with the following contents:

file1.txt:

|     7 | ....  ... ..  .. .... ... | C15C  ... ..  .. .... ... | ....  ... ..  11 .... 8A6 | .... .. ... . |
file2.txt:

|     7 | ....  ... ..  .. .... ... | ....  ... ..  .. .... ... | ....  ... ..  11 .... 8A6 | .... .. ... . |
2. Select the files, open the File menu, choose "Compare By Content...".
3. Result: https://i.imgur.com/exo2Vo0.png

Total Commander incorrectly marks some characters as being different even though they're the same.

Tested in: Total Commander Version 11.50 64 bit (2025-01-02).
Horst.Epp
Power Member
Posts: 6867
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Re: Compare by content bug

Post by Horst.Epp

Confirmed.
Beyond Compare and WinMerge handle it better.
Windows 11 Home, Version 24H2 (OS Build 26100.3476)
TC 11.51 x64 / x86
Everything 1.5.0.1391a (x64), Everything Toolbar 1.5.2.0, Listary Pro 6.3.2.88
QAP 11.6.4.2.1 x64
chandragor
Member
Posts: 127
Joined: 2005-06-01, 10:10 UTC
Location: Italy

Re: Compare by content bug

Post by chandragor

Confirmed too.
TC 11.03 x64
Happy owner of license #12422 since 1997
ghisler(Author)
Site Admin
Posts: 50254
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: Compare by content bug

Post by ghisler(Author)

Not a bug. TC looks for inserted and deleted strings by default. If you want to look for differences at fixed positions, please click on the button "Sync: <number>" and switch to method 0: Compare at fixed positions (e.g. for tables).

Reason: Because of the repeated dots, TC sees C15C as an insertion and not as a replacement for the 4 dots in that location. If those 4 dots were something else, e.g. four dashes, it would see it as a replacement.
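For readers unfamiliar with method 0, here is a minimal sketch of what "compare at fixed positions" amounts to, assuming it simply checks column k of one line against column k of the other. The function name is hypothetical and this is not TC's source:

```python
def fixed_position_diffs(left: str, right: str) -> list[int]:
    """Return the columns at which two lines differ when compared at the same
    position; columns present in only the longer line also count."""
    diffs = []
    for col in range(max(len(left), len(right))):
        a = left[col] if col < len(left) else None   # None = line too short
        b = right[col] if col < len(right) else None
        if a != b:
            diffs.append(col)
    return diffs


# Shortened version of the two table rows from the bug report
row1 = "| ....  ... | C15C  ... |"
row2 = "| ....  ... | ....  ... |"
print(fixed_position_diffs(row1, row2))  # [14, 15, 16, 17]
```

On these shortened rows only the four C15C columns are reported, which is why switching to method 0 gives the expected result for table-like files.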
Author of Total Commander
https://www.ghisler.com
_m_

Re: Compare by content bug

Post by _m_

Thanks, I didn't know about the Sync option. Total Commander remembers this setting for subsequent file comparisons, which is perfect!

I imagine that the default method 1 could potentially make a better assessment of whether something is a string insertion or a replacement; after all, the line length is the same, and the replacement is followed by an identical sequence of characters. I don't know how feasible or worthwhile that would be, though.

To make up for reporting not-a-bug, I noticed that at https://www.ghisler.com/resellers.htm, the link to http://www.totalcmd.anysoft.pl/ seems to be dead, the current one is https://www.anysoft.pl/total-commander-v.11.x
ghisler(Author)

Re: Compare by content bug

Post by ghisler(Author)

TC uses a search for the smallest square distance to find the next match:
First it compares the next character of each line with each other. If they match, the distance is zero.
Then it compares the next character in the left string with the next+1 character in the right, and the next+1 character in the left string with the next character in the right. If there is a match, skip the one not matching character.
Then it compares the next character in the left string with the next+2 character in the right, the next+1 character in the left string with the next+1 character in the right, and the next+2 character in the left string with the next character in the right. If there is a match, use the one with the shortest distance.
This goes on until a match is found.

Since the 4 new characters replace dots, but there are also dots behind, TC eventually finds 2 matching dots at positions +4 and +0. This is detected as a 4 character insertion.

Unfortunately the current algorithm can only find insertions, and it can only find replacements when the characters behind the replacement are different from the replaced characters. Example:
C15C ...
.... ...
will match the red dots and will be detected as an insertion.
C15C ...
---- ...
will match the green dots and will be detected as a replacement.
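The search described above can be sketched roughly as follows. This is a reconstruction from the post, not TC's code: the names are hypothetical, and breaking ties within one distance level by the smaller di² + dj² is an assumed reading of "smallest square distance".

```python
def find_next_match(left: str, right: str, i: int, j: int):
    """Starting at left[i] and right[j], look for the nearest pair of equal
    characters, trying offsets (di, dj) in order of increasing di + dj.
    Within one level the pair with the smaller di*di + dj*dj wins (assumed
    tie-break). Returns (di, dj), or None when nothing matches."""
    for d in range((len(left) - i) + (len(right) - j) + 1):
        best = None
        for di in range(d + 1):
            dj = d - di
            if i + di >= len(left) or j + dj >= len(right):
                continue  # offset runs past the end of one line
            if left[i + di] == right[j + dj]:
                if best is None or di * di + dj * dj < best[0] ** 2 + best[1] ** 2:
                    best = (di, dj)
        if best is not None:
            return best
    return None


def classify(di: int, dj: int) -> str:
    """Describe the characters skipped before the match."""
    if di == 0 and dj == 0:
        return "match"
    if di and dj:
        return f"replacement ({di} vs {dj} chars)"
    return f"insertion ({di + dj} chars)"


for left, right in [("C15C ...", ".... ..."), ("C15C ...", "---- ...")]:
    di, dj = find_next_match(left, right, 0, 0)
    print(left, "|", right, "->", (di, dj), classify(di, dj))
```

With dots, the first match is at offsets (5, 0), so the span is classified as an insertion; with dashes it is (4, 4), a replacement. This sketch counts the space after C15C as part of the skipped span, so it reports 5 characters where the post says 4.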

This comparison is all done 'on the fly' each time a line is rendered. To get more detailed results, TC would have to consider all possible matches and then use the one with the least number of differences, which could take a long time to determine even for a single line.
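One textbook way to "consider all possible matches and use the one with the least number of differences" is the Levenshtein edit-distance dynamic programme. It is shown here only to illustrate that idea. It is not what TC does, and the thread does not say which algorithm Beyond Compare or WinMerge use.

```python
def edit_distance(left: str, right: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions and substitutions that turn `left` into `right`.
    Keeps one row of the dynamic-programming table at a time."""
    prev = list(range(len(right) + 1))  # distance from "" to each prefix of right
    for li, a in enumerate(left, start=1):
        cur = [li]  # deleting the first li chars of left
        for ri, b in enumerate(right, start=1):
            cur.append(min(
                prev[ri] + 1,             # delete a
                cur[ri - 1] + 1,          # insert b
                prev[ri - 1] + (a != b),  # substitute a -> b (free if equal)
            ))
        prev = cur
    return prev[-1]


print(edit_distance("C15C ...", ".... ..."))  # 4
print(edit_distance("C15C ...", "---- ..."))  # 4
```

Both example pairs score 4, meaning four substitutions (a replacement), whether the neighbouring characters are dots or dashes. The work grows with the product of the two line lengths rather than exponentially. However, the best alignment is only known once the whole table has been filled, so nothing can be drawn while the comparison is still running.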
Horst.Epp

Re: Compare by content bug

Post by Horst.Epp

ghisler(Author) wrote: 2025-01-20, 08:40 UTC
This comparison is all done 'on the fly' each time a line is rendered. To get more detailed results, TC would have to consider all possible matches and then use the one with the least number of differences, which could take a long time to determine even for a single line.
For me, the time to compare is not important.
Getting accurate results is the goal for any comparison.
The Sync:0 setting seems to make this much better.

As said above, 2 very popular compare tools have no problems with these files.
Galizza
Member
Posts: 191
Joined: 2018-09-07, 05:21 UTC

Re: Compare by content bug

Post by Galizza

Horst.Epp wrote: 2025-01-20, 11:19 UTC
For me, the time to compare is not important.
Getting accurate results is the goal for any comparison.
The Sync:0 setting seems to make this much better.
So is it always better to use Sync:0 to get accurate results?
_m_

Re: Compare by content bug

Post by _m_

ghisler(Author) wrote: 2025-01-20, 08:40 UTC
To get more detailed results, TC would have to consider all possible matches and then use the one with the least number of differences, which could take a long time to determine even for a single line.
Uh, that's complicated. If you did something simple, like running both method 1 AND method 0 on lines of equal length and then picking the one with fewer differences, would that explode performance? Method 0 seems to be very fast, and if you limit that overhead to lines of equal length with at least 2 differences found by method 1, it shouldn't be bad and would give much better results.
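The proposal can be sketched like this. `method1_count` stands in for the number of characters the default method marks; TC's internals aren't available, so it is passed in as a function, and all names here are hypothetical.

```python
from typing import Callable, Tuple


def fixed_position_count(left: str, right: str) -> int:
    """Method-0-style count: columns that differ, plus any extra length."""
    return sum(a != b for a, b in zip(left, right)) + abs(len(left) - len(right))


def pick_fewer_differences(
    left: str, right: str, method1_count: Callable[[str, str], int]
) -> Tuple[str, int]:
    """Run the default comparison first; only for equal-length lines with at
    least 2 marked characters, also try fixed positions and keep the
    result that marks fewer characters."""
    m1 = method1_count(left, right)
    if len(left) == len(right) and m1 >= 2:
        m0 = fixed_position_count(left, right)
        if m0 < m1:
            return ("method 0", m0)
    return ("method 1", m1)


# Hypothetical stand-in: pretend method 1 marked 8 characters on this line
print(pick_fewer_differences("C15C ...", ".... ...", lambda l, r: 8))  # ('method 0', 4)
```

The fixed-position count is a single pass over the line, so the extra work for each qualifying line is linear in its length.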

Galizza wrote: 2025-01-20, 13:02 UTC
So is it always better to use Sync:0 to get accurate results?
No, Sync:0 will fall apart when it comes across any insertions, so it's better suited for things with a defined structure, such as tables.
ghisler(Author)

Re: Compare by content bug

Post by ghisler(Author)

_m_ wrote:
If you do something simple like use method 1 AND method 0 for comparing lines with equal lengths and then pick the one with less differences, would that explode performance?
That wouldn't help in most cases. Normally you don't have fixed line lengths, so method 0 is only useful for one very specific case (fixed width tables).
Every single character would have to be checked individually whether it could be an insertion or a replacement, which would make the compare time increase exponentially with every single character. This just isn't feasible. For example, method 0 falls apart when the 4 dots aren't replaced with 4 characters, but with 3 or 5 characters.
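A small illustration of that last point, using a column-by-column comparison like method 0 (a sketch under that assumption, with a hypothetical function name): when the four dots are replaced by only three characters, every later column is shifted by one.

```python
from itertools import zip_longest


def fixed_position_diffs(left: str, right: str) -> list[int]:
    """Columns that differ when two lines are compared at the same position
    (zip_longest pads the shorter line with None, which counts as a difference)."""
    return [col for col, (a, b) in enumerate(zip_longest(left, right)) if a != b]


# Four dots replaced by only three characters
left = "C15  ... .."
right = "....  ... .."
print(fixed_position_diffs(left, right))  # [0, 1, 2, 3, 5, 8, 9, 11]
```

Besides the three replaced columns, positions 3, 5, 8, 9 and 11 are flagged as well, even though the text there is unchanged and merely shifted; only columns where repeated dots or spaces happen to line up stay unmarked.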
_m_

Re: Compare by content bug

Post by _m_

Okay, I take it that the current implementation makes it impossible to bring file comparisons on par with the diff programs mentioned above. No biggie, I'll just keep switching methods, or use third-party software if that proves too annoying.

Edit for clarification:
ghisler(Author) wrote: 2025-01-21, 08:50 UTC
Every single character would have to be checked individually whether it could be an insertion or a replacement, which would make the compare time increase exponentially with every single character.
I meant running method 1 on a line, then redoing the comparison of that line with method 0, then picking whichever finds fewer differences (i.e. not combining the 2 methods at the same time). That's not exponential.
ghisler(Author)

Re: Compare by content bug

Post by ghisler(Author)

Currently I'm drawing the text while comparing: e.g. it finds a block that is identical, outputs it, and then looks for the next block. Your suggestion would require somehow saving the compare results, comparing them with the results of other methods, picking one, and only then displaying it. This would require a lot of changes to the compare tool, nothing I could do in a few weeks.
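The design difference described here can be sketched as a comparer that stores its result as a list of segments instead of drawing it, so results from several methods can be weighed before one is rendered. Only a fixed-position method is implemented; the segment format and all names are hypothetical, not TC's internals.

```python
from typing import List, Tuple

# (kind, text on the left line, text on the right line); kind is "same" or "diff"
Segment = Tuple[str, str, str]


def fixed_position_segments(left: str, right: str) -> List[Segment]:
    """Compare column by column and store the result instead of drawing it."""
    segments: List[Segment] = []
    for col in range(max(len(left), len(right))):
        a = left[col] if col < len(left) else ""
        b = right[col] if col < len(right) else ""
        kind = "same" if a == b else "diff"
        if segments and segments[-1][0] == kind:
            _, old_a, old_b = segments[-1]
            segments[-1] = (kind, old_a + a, old_b + b)  # extend the current run
        else:
            segments.append((kind, a, b))
    return segments


def diff_count(segments: List[Segment]) -> int:
    """Number of columns covered by 'diff' segments."""
    return sum(max(len(a), len(b)) for kind, a, b in segments if kind == "diff")


def render_left(segments: List[Segment]) -> str:
    """Draw the left line, bracketing changed text (stand-in for colouring)."""
    return "".join(f"[{a}]" if kind == "diff" else a for kind, a, _ in segments)


def render_best(candidates: List[List[Segment]]) -> str:
    """Only once every method has produced its stored result can one be picked."""
    return render_left(min(candidates, key=diff_count))


print(render_left(fixed_position_segments("x C15C y", "x .... y")))  # x [C15C] y
```

The extra cost is mainly the buffering: every line's result has to be kept until all methods have finished, rather than being written to the screen as soon as a block is found.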

Moderator message from ghisler(Author), 2025-01-22, 08:27 UTC:

Moved to "will not be changed".
chandragor

Re: Compare by content bug

Post by chandragor

Still, shouldn't it be a "perfect" compare when you do a "binary" compare?
In that case every single mismatch should be counted as a difference, even for a single byte.
I have two files of 17MB in size, identical in everything but for a single byte. I would have missed it if I just looked at the "differences found" label.
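What chandragor expects from a binary compare can be sketched as a single pass that counts every differing byte. This is an illustration assuming both files are read as raw byte streams, not a description of how TC's binary mode works; the function name is hypothetical.

```python
import io
from typing import BinaryIO, Optional, Tuple


def count_byte_mismatches(fa: BinaryIO, fb: BinaryIO,
                          chunk_size: int = 1 << 20) -> Tuple[int, Optional[int]]:
    """Compare two binary streams byte by byte, chunk by chunk.

    Returns (number of differing bytes, offset of the first difference or None).
    Trailing bytes present in only the longer stream count as differences."""
    count = 0
    first: Optional[int] = None
    offset = 0
    while True:
        a = fa.read(chunk_size)
        b = fb.read(chunk_size)
        if not a and not b:
            break
        if a != b:  # only walk byte by byte through chunks that differ
            for k in range(max(len(a), len(b))):
                if k >= len(a) or k >= len(b) or a[k] != b[k]:
                    count += 1
                    if first is None:
                        first = offset + k
        offset += max(len(a), len(b))
    return count, first


# Two 17 MB buffers that differ in exactly one byte
size = 17 * 1024 * 1024
data = bytearray(size)
changed = bytearray(data)
changed[12_345_678] = 1
print(count_byte_mismatches(io.BytesIO(data), io.BytesIO(changed)))  # (1, 12345678)
```

For real files, the same function can be called with two handles from `open(path, "rb")`.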