Serious Data corruption bug in "Compare by content"

milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

Hello.
I have been using TC for years without any problems, but recently I found a serious bug.
When I edit large registry text files (for an installation tool) while comparing them with the "Compare by content" tool at the same time, the opened files apparently get corrupted.
The key to the problem seems to be that the text files must be large (above 1MB).

Approach:
Use two more or less similar text files with sizes greater than 1MB (>1048576 Bytes, but I'm not sure about the exact limit).
It's important that they are real text files and are recognized as such in the CBC-Tool.
So HTML/source code files and similar files with proper line endings work too; binary/data files don't seem to work (I'm not sure, though).
It doesn't seem to matter which encoding (UTF/ANSI) or line endings (CR+LF/CR only) the files have.
Alternatively, take one large text file > 1MB, make a copy of it and change the copy slightly (delete/duplicate a few lines or add some lines somewhere in the file) so that you can compare them easily in the "Compare by content" tool.
Now make a backup of the file you want to edit later.
You should now have

file1.txt
file2.txt
file2.bak


or similar.

Compare file1.txt and file2.txt in the CBC-Tool and leave the CBC-Window open.

Now edit file2.txt in an external editor (Notepad or whatever).
Delete (not add or replace!) a few lines somewhere in the file (not at the end!), and save it.
If the editor doesn't lock the file when it is opened (e.g. Notepad), you could leave it open, but it's better to close it nonetheless.
Go back to the CBC-Window and confirm the reload dialog.
You can already see that something is wrong if you go to the end of the file.
But it is easier to see if you now compare file2.txt and file2.bak.
TC added exactly the amount of data you just deleted from file2.txt,
so the overall file size doesn't change.
The added data is taken from the end of the file, counting backwards.
So, for example, if you deleted 50 Bytes somewhere in file2.txt, TC copies the last 50 Bytes of the file (probably from the copy in RAM before reloading) and appends them to the end of the file.
So now you basically have duplicate lines/words at the end of file2.txt.
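
A quick way to check a suspect file for this pattern is to look for a tail that simply repeats the bytes right before it. The Python sketch below is only a heuristic under that assumption; the function name `duplicated_tail_length` and the `max_check` limit are made up for illustration and have nothing to do with TC itself.

```python
def duplicated_tail_length(data: bytes, max_check: int = 4096) -> int:
    """Return the largest n (up to max_check) such that the last n bytes
    of data are identical to the n bytes directly before them, or 0."""
    limit = min(max_check, len(data) // 2)
    # Try the longest candidate first so the biggest repeated tail wins.
    for n in range(limit, 0, -1):
        if data[-n:] == data[-2 * n:-n]:
            return n
    return 0


if __name__ == "__main__":
    print(duplicated_tail_length(b"2345678989"))  # tail "89" is repeated
    print(duplicated_tail_length(b"0123456789"))  # no repeated tail
```

A non-zero result is only a hint: a file that legitimately ends with repeated content (for example several empty lines) triggers it too, so comparing against file2.bak remains the reliable check.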

As mentioned, I haven't noticed this behavior with smaller files (even 999k seems to be okay).

I confirmed the bug with 7.56a and latest 8.0 Beta 17a on WinXP SP3 and Windows 2000.
Also present in old 7.50a.
Older versions and 8.0 64bit not tested, also not Vista and Win7.
umbra
Power Member
Posts: 871
Joined: 2012-01-14, 20:41 UTC

Post by *umbra »

Confirmed for 8.0x32 and 8.0x64. If the edited file is 1MiB or larger, it gets corrupted. You just need to have it open in the Compare window; you don't even need to recompare it. It is damaged the moment you save it in an external editor.

However, I tested it only with Notepad, so it might be Notepad's fault, because it is rewriting a locked file.
BTW, why does TC lock files that are being compared?

EDIT:
Looks like TC locks a compared file only if it's larger than 1MiB. Any reason for that?
Windows 7 Pro x64, Windows 10 Pro x64
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

I don't understand the problem, sorry. Doesn't TC request to re-compare the files when you modify them outside of the compare tool?

Btw, the difference between files <1MB and >1MB is that smaller files are loaded into memory, while larger files are just mapped into memory (view contents in place). So if you modify the mapped file outside of TC, you will see changes immediately.
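
The distinction between a private copy and a mapped view can be sketched in Python as follows. This is only an illustration of the idea, not TC's actual code: the 1 MiB threshold is taken from this thread, and the names `MAP_THRESHOLD` and `load_for_compare` are hypothetical.

```python
import mmap
import os

MAP_THRESHOLD = 1024 * 1024  # 1 MiB, assumed from the thread


def load_for_compare(path: str):
    """Return ("copied", bytes) for small files and ("mapped", mmap) for
    files of MAP_THRESHOLD bytes or more."""
    size = os.path.getsize(path)
    if size < MAP_THRESHOLD:
        # Small file: read a private copy; later changes on disk are not seen.
        with open(path, "rb") as f:
            return "copied", f.read()
    # Large file: map it read-only; the view is backed by the file on disk.
    with open(path, "rb") as f:
        view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return "mapped", view
```

The caller has to call `close()` on a mapped view when done. As long as the view exists, the operating system has to keep the mapped range of the file available, which is the situation the rest of this thread is about.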
Author of Total Commander
https://www.ghisler.com
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

Christian, I think he means that TC doesn't update the file size and saves the file with the old size (which is larger than the new one).
umbra
Power Member
Posts: 871
Joined: 2012-01-14, 20:41 UTC

Post by *umbra »

For example, you have a large text file with the following content (just an illustration):

0123456789

If you compare it with a different file, TC will map the large file into memory and Windows will (probably) lock that file so that other apps cannot change it. However, Notepad ignores that, and when you change the file's content to

23456789

and save it, the size of the file remains constant and its real content becomes

2345678989

It doesn't matter whether the compare tool offers to recompare the file; it is damaged the moment you modify it with Notepad.
As I said, this might be Notepad's fault, because it should not ignore the fact that the file is locked. However, if TC did not lock the compared file, that would solve the problem too.
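
The effect in this illustration can be reproduced with a few lines of Python. The sketch only simulates the outcome on byte strings, assuming the editor overwrites the file from the start and the file cannot shrink; `save_without_truncate` is a hypothetical name.

```python
def save_without_truncate(old: bytes, new: bytes) -> bytes:
    """Simulate writing `new` over `old` from offset 0 when the file
    is not allowed to become shorter than it was."""
    if len(new) >= len(old):
        # Growing (or same size): the new content fully replaces the old one.
        return new
    # Shrinking: the old bytes beyond the new length stay in place.
    return new + old[len(new):]


if __name__ == "__main__":
    print(save_without_truncate(b"0123456789", b"23456789"))
```

With the contents from the illustration, the leftover bytes are the last two bytes of the old file, which gives 2345678989.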
Windows 7 Pro x64, Windows 10 Pro x64
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

I see - I will check whether this size change can be detected somehow.
Author of Total Commander
https://www.ghisler.com
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

umbra wrote:size of the file will remain constant and its real content will become 2345678989
Exactly, that's the point. Maybe my explanation was a bit too complicated.
umbra wrote:If you compare it with a different file, TC will map the large file into memory and Windows (probably) will lock that file, so other apps cannot change it. However Notepad ignores that
I doubt that locking has much to do with it. It doesn't matter which program you use to edit the file; I used a dozen different ones with the same results. On modern NT systems there are a few ways to lock a file, but in most cases it remains writable. I think there's just a file handle open, and as long as you don't try to write at the same time, it shouldn't be a problem. Most modern programs work this way: even Visual Studio opens source files and lets you overwrite them with an external program; it only gets complicated when you try to delete the file. As I understand it, TC's CBC is fine with external editing, otherwise it wouldn't watch for changes and even ask for a reload.
TC just needs to check for a change in file size, as already mentioned. I think it's just a boundary problem here, since it doesn't happen when the size increases.
But anyway, it's critical. When this happens to a large script or source file, the file gets messed up and I get compiler errors.
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

It is unpleasant, but not critical since no data loss happens. :wink:
umbra
Power Member
Posts: 871
Joined: 2012-01-14, 20:41 UTC

Post by *umbra »

milo1012 wrote:It doesn't matter which program you use to edit the file, I used a dozen different with the same results.
That is interesting, because I tried several apps too and Notepad was the only one that allowed me to modify a file while it was mapped into memory by Compare by Content. All other editors refused to save it because it was opened by TC.
milo1012 wrote:even Visual Studio opens the source files and you can overwrite it with an external program
Yes, and usually TC works the same way, but in this case TC locks that file for itself.
Windows 7 Pro x64, Windows 10 Pro x64
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

umbra wrote:All other editors refused to save it, because it was opened by TC.
Notepad2, BreakPoint Hex Workshop and HxD work. Notepad++ and WordPad complain. As I said, every program handles file and write operations and watches for file changes differently.
MVV wrote:It is unpleasant, but not critical since no data loss happens. :wink:
That depends on your point of view. If data integrity is your top priority, then it's bad. Of course, for standard text files you can easily identify the section and delete it, but try this on complicated scripts or nested HTML/XML files.
ghisler(Author)
Site Admin
Posts: 48021
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Post by *ghisler(Author) »

OK, I have made some tests now. To my surprise, the error is not in Total Commander. Try this:

1. Compare the two files, at least one > 1 MB
2. Edit the file >1 MB in Notepad: delete some lines, save
3. Re-open the file in Notepad or any other editor WITHOUT returning to Total Commander
-> The extra lines at the end are already there!

I guess that the following happens: Notepad writes the data and then tries to truncate the file. However, Windows doesn't allow it because the file is mapped into TC, and therefore it cannot become smaller than the mapped size. Otherwise there would be an access violation when TC tried to access the file map beyond the mapped size.
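
From the editor's side, a failed truncation like this could be detected by checking the file size after saving. The following Python sketch assumes a save routine that rewrites the file in place and then truncates it; it is not how Notepad is actually implemented, and `save_and_verify` is a hypothetical name.

```python
import os


def save_and_verify(path: str, data: bytes) -> bool:
    """Rewrite an existing file in place and return True only if it ends
    up exactly len(data) bytes long."""
    with open(path, "r+b") as f:
        f.write(data)
        try:
            # Cut the file at the current position, right after the new data.
            f.truncate()
        except OSError:
            # Truncation was refused, e.g. because the file is mapped elsewhere.
            pass
    # A larger size on disk means stale bytes from the old content remain.
    return os.path.getsize(path) == len(data)
```

An editor that gets `False` here could warn the user instead of silently leaving the old tail in the file.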
Author of Total Commander
https://www.ghisler.com
umbra
Power Member
Posts: 871
Joined: 2012-01-14, 20:41 UTC

Post by *umbra »

2ghisler
I agree, see my second post. From my point of view, Notepad should not allow you to modify a file that is actively used by another application without any warning.

edit:
Maybe in this case it would be better to lock the mapped file explicitly. It couldn't be modified externally, but at least it would be safe.
Windows 7 Pro x64, Windows 10 Pro x64
MVV
Power Member
Posts: 8702
Joined: 2008-08-03, 12:51 UTC
Location: Russian Federation

Post by *MVV »

I just checked it with EmEditor and see the same thing: it is able to save the file, and the file size can be increased but not decreased. So it seems that the file is not locked.

ghisler(Author), which sharing flags do you use in CreateFile? AFAIK Windows shouldn't allow writing to a file if you don't use the FILE_SHARE_WRITE flag (you should specify only FILE_SHARE_READ for comparing).
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

Interesting. It seems I was wrong. I see the same behavior in JuffEd, Notepad2 and the hex editors already mentioned.
While I can understand that hex editors don't care much about file safety, as most of them also allow direct disk editing, it's amazing how closely Notepad2 in particular emulates the behavior of standard Notepad.
So is it more or less a safety problem of the editing tool, which doesn't check for proper write access (or at least not as thoroughly as other programs do)?

The question remains whether TC should allow it or not. If not, it shouldn't even bother checking for file changes, since the file is locked anyway.
For example, Notepad2 and most other editors also have a file watcher and ask for a reload the same way TC does, but there it works because the file is not locked by e.g. Notepad2 (it seems that even the biggest files are loaded completely into RAM).

From my understanding, TC should lock the file only if the edit function is active in CBC and keep the files unlocked otherwise.
umbra
Power Member
Posts: 871
Joined: 2012-01-14, 20:41 UTC

Post by *umbra »

2MVV
Try it with another editor: Notepad++ and WordPad will refuse to save, because the file is open in another editor (you get a similar error when you try to rename the file with Explorer). And Visual Studio will explicitly say that it can't modify a memory-mapped file.

2milo1012
Notepad2 is a "Notepad replacement": when you install it, the first thing it asks is whether it should install itself as a replacement for Notepad. So it's no wonder that it emulates Notepad as closely as possible. :)
Windows 7 Pro x64, Windows 10 Pro x64