Compare Does Not Detect Identical Strings

sqgl
Junior Member
Posts: 9
Joined: 2010-03-30, 21:40 UTC
Location: Australia

Post by *sqgl »

Two files I try to compare are:

File 1:

xxx
aaa xxx

File 2:

aaa yyy

Compare does not detect that aaa is common to both files. In fact it says they have nothing in common.

FYI, if I remove the carriage return in the first file so that it is just one long string on a single line ("xxx aaa xxx"), then compare works OK.
Sir_SiLvA
Power Member
Posts: 3292
Joined: 2003-05-06, 11:46 UTC

Post by *Sir_SiLvA »

Because the files are NOT the same.
You get the same result with every compare tool;
e.g. try WinMerge.
Hoecker, you are out!
Peter
Power Member
Posts: 2064
Joined: 2003-11-13, 13:40 UTC
Location: Schweiz

Post by *Peter »

[Off topic:
Yes, Sir_SiLvA, nice to read you again after your long absence. I have missed you, and your menu customizations, for quite some time.

Hopefully everything is OK, and welcome back!
Peter

End OFF Topic]
TC 10.xx / #266191
Win 10 x64
Balderstrom
Power Member
Posts: 2148
Joined: 2005-10-11, 10:10 UTC

Post by *Balderstrom »

TC's Compare tool, like many other tools (even GNU diff.exe and diff3.exe), uses line-comparing functions.

When the line counts of the two files differ, they will look for lines that are exact matches, and show/output the lines that are different.
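
To make the line-based behaviour concrete, here is a minimal sketch in Python. It uses the standard `difflib` module as a stand-in for a line comparer; it is not TC's actual code, and the helper name `matching_lines` is made up for illustration. It reports only the lines that are exact, character-for-character matches in both texts.

```python
import difflib

def matching_lines(left: str, right: str) -> list[str]:
    """Return the lines that a line-based matcher aligns as identical.

    Illustration only: difflib stands in for the compare tool here.
    """
    left_lines = left.splitlines()
    right_lines = right.splitlines()
    # Compare whole lines as atomic items, never their individual characters.
    matcher = difflib.SequenceMatcher(None, left_lines, right_lines, autojunk=False)
    matched = []
    for block in matcher.get_matching_blocks():
        matched.extend(left_lines[block.a:block.a + block.size])
    return matched

# The two files from the first post: no line is identical, so nothing matches.
print(matching_lines("xxx\naaa xxx\n", "aaa yyy\n"))

# With identical lines around the difference, the matcher has anchors to align on.
print(matching_lines("nothing here\nxxx\naaa xxx\nfoobar\n",
                     "nothing here\naaa yyy\nfoobar\n"))
```

Because "aaa xxx" and "aaa yyy" are different as whole lines, a comparer of this kind has nothing to align them by, even though they share "aaa".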

Now, with a larger file, TC does let you do one thing when you can see that the line-comparing algorithm has gotten confused: Resync comparison from here. Select a line on each side and compare the files starting from those lines instead.

In your two-line case that won't help, as there is a line mismatch and there are only 2 lines to go from in the larger file.

Instead, try something like this:
File1.txt:
nothing here
xxx
aaa xxx
foobar

File2.txt:
nothing here
aaa yyy
foobar

Then select line 3 of File1, and line 2 of File2. Right Click and Resync Comparison... You will force the Compare to start from those lines, and will see the difference is the xxx/yyy piece.
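
As a rough illustration of what a manual resync achieves, the following Python sketch pairs the lines of both files positionally, starting from a chosen line on each side. This is only a guess at the idea, not TC's implementation; the function name `compare_from` and its 1-based line arguments are hypothetical.

```python
from itertools import zip_longest

def compare_from(left_lines, right_lines, left_start, right_start):
    """Pair lines one-to-one from the chosen (1-based) start lines.

    Returns (left, right, identical) tuples; a missing line is None.
    """
    left_tail = left_lines[left_start - 1:]
    right_tail = right_lines[right_start - 1:]
    return [(l, r, l == r) for l, r in zip_longest(left_tail, right_tail)]

file1 = ["nothing here", "xxx", "aaa xxx", "foobar"]
file2 = ["nothing here", "aaa yyy", "foobar"]

# Start at line 3 of File1 and line 2 of File2, as in the example above.
for left, right, identical in compare_from(file1, file2, 3, 2):
    print(left, "|", right, "|", "same" if identical else "different")
```

Once "aaa xxx" and "aaa yyy" are paired like this, a within-line comparison can highlight just the xxx/yyy part.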
sqgl
Junior Member
Posts: 9
Joined: 2010-03-30, 21:40 UTC
Location: Australia

Post by *sqgl »

Sir_SiLvA, I merely whittled away at my two large problem files to make the problem very clear.

Thanks, Balderstrom, you have shown me a workaround I was not aware of.

Consider the following two files.

File 1:

Spanner in the works.
This line is almost identical in both files.

File 2:

This line is almost identical in both files

The extra line and the full stop confuse TC so that it thinks the two files have absolutely nothing in common.
Last edited by sqgl on 2011-03-20, 05:59 UTC, edited 1 time in total.
Sir_SiLvA
Power Member
Posts: 3292
Joined: 2003-05-06, 11:46 UTC

Post by *Sir_SiLvA »

sqgl wrote:The extra line and the full stop confuse TC so that it thinks the two files have absolutely nothing in common.
As said before, this is not only TC's behavior; other compare tools like WinMerge do it the same way.
Hoecker, you are out!
sqgl
Junior Member
Posts: 9
Joined: 2010-03-30, 21:40 UTC
Location: Australia

Post by *sqgl »

Sir_SiLvA wrote:this is not only TC behavior but other compare tools like winmerge do it the same way.
You are right, I just installed WinMerge and it has the same problem.

What is curious is that if carriage-return is removed from the first file then the compare works great.

Even more curious is that if only the full stop is removed from the first file then the carriage-return is not an issue and the compare works great.

New algorithm anyone?
Balderstrom
Power Member
Posts: 2148
Joined: 2005-10-11, 10:10 UTC

Post by *Balderstrom »

As I mentioned, it is a LINE compare.
If you remove the carriage-return, then the two files each have one line, and the difference between the two will be "Spanner in the works."
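
A small Python sketch of that single-line case, again with `difflib` standing in for the tool's within-line comparison (an assumption, not TC's actual algorithm). It also assumes the two lines of the first file are joined with a space; the helper name `left_only_parts` is made up.

```python
import difflib

def left_only_parts(left_line: str, right_line: str) -> list[str]:
    """Return the pieces of left_line that have no counterpart in right_line."""
    # Here the characters of a single line are compared, not whole lines.
    matcher = difflib.SequenceMatcher(None, left_line, right_line, autojunk=False)
    parts = []
    for tag, a0, a1, _b0, _b1 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            parts.append(left_line[a0:a1])
    return parts

# First file with the line break removed (joined with a space, by assumption).
joined = "Spanner in the works. This line is almost identical in both files."
other = "This line is almost identical in both files"

print(left_only_parts(joined, other))
```

With one line on each side there is nothing to align, so the comparison goes straight to the characters and reports the "Spanner in the works. " prefix plus the trailing full stop as the only differences.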
*BLINK* TC9 Added WM_COPYDATA and WM_USER queries for scripting.
sqgl
Junior Member
Posts: 9
Joined: 2010-03-30, 21:40 UTC
Location: Australia

Post by *sqgl »

Balderstrom wrote:As I mentioned, it is a LINE compare.
If you remove the carriage-return, then the two files each have one line, and the difference between the two will be "Spanner in the works."
More specifically you said (and I'm sorry I missed it)
Balderstrom wrote: When the line counts of the two files differ, they will look for lines that are exact matches, and show/output the lines that are different.
So to improve matters it seems like the whole algorithm would need to be changed (perhaps making it too slow). Fair enough. Thanks for explaining. Maybe this thread needs to be moved to the sub-forum called "TC7.56a Behaviour which will not be changed"?
ghisler(Author)
Site Admin
Posts: 48079
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

Indeed, TC currently cannot find similar lines, only identical lines. When there are enough identical lines, TC can align them correctly, so the comparison of the different lines works too. It would be nice to use some kind of similarity algorithm to find similar lines. Unfortunately, what should be considered similar depends a lot on the file type and the contents...
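
For illustration only, one possible shape of such a similarity step, sketched in Python. It is not part of TC: it scores line pairs with `difflib.SequenceMatcher.ratio()` and pairs each left line with its best-scoring right line. The function name `pair_similar_lines` and the 0.6 threshold are arbitrary assumptions, and a sensible threshold would depend on the file type and contents.

```python
import difflib

def pair_similar_lines(left_lines, right_lines, threshold=0.6):
    """Pair each left line with its most similar right line, if similar enough.

    The threshold is an arbitrary assumption; real data would need tuning.
    """
    pairs = []
    for left in left_lines:
        best_ratio, best_right = 0.0, None
        for right in right_lines:
            ratio = difflib.SequenceMatcher(None, left, right, autojunk=False).ratio()
            if ratio > best_ratio:
                best_ratio, best_right = ratio, right
        if best_right is not None and best_ratio >= threshold:
            pairs.append((left, best_right, best_ratio))
    return pairs

file1 = ["Spanner in the works.", "This line is almost identical in both files."]
file2 = ["This line is almost identical in both files"]

for left, right, ratio in pair_similar_lines(file1, file2):
    print(f"{ratio:.2f}  {left!r} ~ {right!r}")
```

Scoring every left line against every right line is quadratic in the number of lines, which hints at why this could be too slow for large files.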
Author of Total Commander
https://www.ghisler.com