Not a bug. TC looks for inserted and deleted strings by default. If you want to look for differences at fixed positions, please click on the button "Sync: <number>" and switch to method 0: Compare at fixed positions (e.g. for tables).
Reason: Because of the repeated dots, TC sees C15C as an insertion and not as a replacement for the 4 dots in that location. If those 4 dots were something else, e.g. four dashes, it would see it as a replacement.
Thanks, I didn't know about the Sync option. Total Commander remembers this setting for subsequent file comparisons, which is perfect!
I imagine that the default method 1 could potentially make a better assessment about whether something is a string insertion or replacement -- after all, the length of the line is the same, and the replacement is followed by the identical array of characters. I don't know how feasible or worth pursuing that would be, though.
TC uses a search for the smallest square distance to find the next match:
First it compares the next character of each line with each other. If they match, the distance is zero.
Then it compares the next character in the left string with the next+1 character in the right, and the next+1 character in the left string with the next character in the right. If there is a match, it skips the one non-matching character.
Then it compares the next character in the left string with the next+2 character in the right, the next+1 character in the left string with the next+1 character in the right, and the next+2 character in the left string with the next character in the right. If there is a match, use the one with the shortest distance.
This goes on until a match is found.
Since the 4 new characters replace dots, but more dots follow them, TC eventually finds 2 matching dots at offsets +4 and +0. This is detected as a 4-character insertion.
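To make the steps above concrete, here is a minimal Python sketch of that kind of search. It is only an illustration built from the description in this thread, not TC's actual code: the names `next_match`, `classify` and `greedy_diff` are made up, and breaking ties by the smallest a*a + b*b is my reading of "smallest square distance".

```python
def next_match(left, right, i, j):
    """Return offsets (a, b) of the nearest matching pair, or None.

    Pairs are tried in order of a + b = 0, 1, 2, ...; among matches with
    the same sum, the one with the smallest a*a + b*b is chosen.
    """
    max_d = (len(left) - i) + (len(right) - j)
    for d in range(max_d):
        best = None
        for a in range(d + 1):
            b = d - a
            if i + a < len(left) and j + b < len(right) \
                    and left[i + a] == right[j + b]:
                if best is None or a * a + b * b < best[0] ** 2 + best[1] ** 2:
                    best = (a, b)
        if best is not None:
            return best
    return None


def classify(a, b):
    """Name a gap of a skipped left chars and b skipped right chars."""
    if a == 0:
        return "insert"
    if b == 0:
        return "delete"
    return "replace"


def greedy_diff(left, right):
    """Walk both lines once, emitting (kind, left_part, right_part) blocks."""
    ops = []
    i = j = 0
    while i < len(left) and j < len(right):
        step = next_match(left, right, i, j)
        if step is None:
            break
        a, b = step
        if a or b:
            ops.append((classify(a, b), left[i:i + a], right[j:j + b]))
        i, j = i + a, j + b
        start_i, start_j = i, j
        # Consume the run of identical characters that follows the match.
        while i < len(left) and j < len(right) and left[i] == right[j]:
            i += 1
            j += 1
        ops.append(("equal", left[start_i:i], right[start_j:j]))
    if i < len(left) or j < len(right):
        # Whatever is left over on either side is one final difference.
        ops.append((classify(len(left) - i, len(right) - j), left[i:], right[j:]))
    return ops


print(greedy_diff("........", "C15C...."))
print(greedy_diff("----....", "C15C...."))
```

With eight dots on the left, the first match is found at offsets +0 (left) and +4 (right), so `C15C` is reported as an insertion and the last four dots on the left end up as a deletion. With four dashes in place of the first four dots, nothing matches until +4/+4, so the block is reported as a replacement.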
Unfortunately the current algorithm can only find insertions reliably. It can only find a replacement when the characters after the replacement are different from the replaced characters. Example:
C15C ...
.... ...
will match the red dots and will be detected as an insertion.
C15C ...
---- ...
will match the green dots and will be detected as a replacement.
This comparison is all done 'on the fly' each time a line is rendered. To get more detailed results, TC would have to consider all possible matches and then use the one with the least number of differences, which could take a long time to determine even for a single line.
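For reference, one textbook way to get "the least number of differences" between two lines is the Levenshtein edit distance, computed with dynamic programming. The sketch below is not how TC works and is not something proposed in this thread. It is only an assumption about what such a search could look like, and `edit_distance` is a made-up name.

```python
def edit_distance(left, right):
    """Minimum number of single-character inserts, deletes and
    replacements needed to turn left into right (Levenshtein distance)."""
    # prev[j] holds the distance between the processed prefix of left
    # and the first j characters of right.
    prev = list(range(len(right) + 1))
    for i, lc in enumerate(left, start=1):
        curr = [i]
        for j, rc in enumerate(right, start=1):
            cost = 0 if lc == rc else 1
            curr.append(min(prev[j] + 1,          # delete lc
                            curr[j - 1] + 1,      # insert rc
                            prev[j - 1] + cost))  # keep or replace
        prev = curr
    return prev[-1]


print(edit_distance("........", "C15C...."))
```

For the dots example this gives 4 (four replacements). The work grows with the product of the two line lengths, and the full table has to be finished before anything can be displayed, so it does not fit a routine that draws blocks while it is still comparing.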
ghisler(Author) wrote: 2025-01-20, 08:40 UTC
This comparison is all done 'on the fly' each time a line is rendered. To get more detailed results, TC would have to consider all possible matches and then use the one with the least number of differences, which could take a long time to determine even for a single line.
For me, the time to compare is not important.
Getting accurate results is the goal of any comparison.
The Sync:0 setting seems to make this much better.
As said above, 2 very popular compare tools have no problems with these files.
Horst.Epp wrote: 2025-01-20, 11:19 UTC
For me, the time to compare is not important.
Getting accurate results is the goal of any comparison.
The Sync:0 setting seems to make this much better.
So is it always better to use Sync:0 to get accurate results?
ghisler(Author) wrote: 2025-01-20, 08:40 UTC
To get more detailed results, TC would have to consider all possible matches and then use the one with the least number of differences, which could take a long time to determine even for a single line.
Uh, that's complicated. If you did something simple, like using method 1 AND method 0 for lines of equal length and then picking the one with fewer differences, would that explode performance? Method 0 seems to be very fast, and if you limit that overhead to lines of equal length where method 1 found at least 2 differences, it shouldn't be bad and would give much better results.
Galizza wrote: 2025-01-20, 13:02 UTC
So is it always better to use Sync:0 to get accurate results?
No, Sync:0 will fall apart when it comes across any insertions, so it's better suited for things with a defined structure, such as tables.
If you did something simple, like using method 1 AND method 0 for lines of equal length and then picking the one with fewer differences, would that explode performance?
That wouldn't help in most cases. Normally you don't have fixed line lengths, so method 0 is only useful for one very specific case (fixed width tables).
Every single character would have to be checked individually whether it could be an insertion or a replacement, which would make the compare time increase exponentially with every single character. This just isn't feasible. For example, method 0 falls apart when the 4 dots aren't replaced with 4 characters, but with 3 or 5 characters.
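A small Python sketch shows why a fixed-position compare breaks as soon as the lengths differ. This is only an illustration of the idea behind method 0, not TC's code, and `fixed_position_diff` is a made-up name.

```python
def fixed_position_diff(left, right):
    """Return the column indexes where the two lines differ when they
    are compared strictly position by position (no resynchronisation)."""
    length = max(len(left), len(right))
    return [k for k in range(length)
            if k >= len(left) or k >= len(right) or left[k] != right[k]]


# Same length: only the 4 replaced columns are flagged.
print(fixed_position_diff("........", "C15C...."))  # [0, 1, 2, 3]

# 4 dots replaced by 3 characters: everything after the change shifts
# by one column and is flagged as well.
print(fixed_position_diff("ab....cd", "abXYZcd"))   # [2, 3, 4, 5, 6, 7]
```

In the second call the trailing `cd` is identical in both lines, but it no longer sits in the same columns, so it is reported as different.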
Okay, I take it that the current implementation makes it impossible to bring file comparison on par with the diff programs mentioned above. No biggie, I'll just keep switching methods, or use third-party software if that proves too annoying.
Edit for clarification:
ghisler(Author) wrote: 2025-01-21, 08:50 UTC
Every single character would have to be checked individually whether it could be an insertion or a replacement, which would make the compare time increase exponentially with every single character.
I meant running method 1 on a line, then re-running the comparison on the same line with method 0, and then picking the result with fewer differences (i.e. two separate passes, not the two methods combined). That's not exponential.
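Roughly what I have in mind, as a Python sketch. All names here are hypothetical, and `count_sync` is just a placeholder for whatever number of differing characters method 1 reports for a line. I don't know how TC counts internally.

```python
def count_fixed_positions(left, right):
    """Differences when comparing column by column (the method 0 idea)."""
    mismatches = sum(1 for a, b in zip(left, right) if a != b)
    return mismatches + abs(len(left) - len(right))


def pick_method(left, right, count_sync):
    """Return "fixed" or "sync" for one pair of lines.

    count_sync is a caller-supplied function standing in for method 1.
    The extra fixed-position pass only runs for lines of equal length
    where method 1 found at least 2 differences.
    """
    sync = count_sync(left, right)
    if len(left) != len(right) or sync < 2:
        return "sync"
    return "fixed" if count_fixed_positions(left, right) < sync else "sync"


# Stand-in for method 1: assume it reports 8 differing characters
# (a 4-character insertion plus a 4-character deletion).
print(pick_method("........", "C15C....", lambda l, r: 8))
```

That is one extra linear pass per qualifying line, so the cost is additive rather than exponential.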
Currently I'm drawing the text while comparing: it finds a block which is identical, puts that out, and then looks for the next block. Your suggestion would require somehow saving the compare results, comparing them with the results of the other method, picking one, and only then displaying it. This would require a lot of changes to the compare tool, nothing I could do in a few weeks.
Moderator message from: ghisler(Author) » 2025-01-22, 08:27 UTC
Yet, shouldn't it be a "perfect" compare when you do a "binary" compare?
In that case every single mismatch should be counted as a difference, even for a single byte.
I have two files of 17MB in size, identical in everything but for a single byte. I would have missed it if I just looked at the "differences found" label.