[8.01] Can't compare by content


siealex
Senior Member
Posts: 278
Joined: 2009-03-22, 16:36 UTC

Post by *siealex »

I have two files in a folder:
backup.ab (239995829 b),
backup1.ab (239995827 b).
When I try to compare them, TC throws:
ERROR: Cannot read d:\android\backup1.ab. The files are DIFFERENT!
and does not show the contents. Both files can be read by Lister. What's this?
Image: http://s11.postimage.org/lb49l4jab/tc_bug.jpg
8.01 final (also 8.01RC5), WinXP SP2 x32.
We are not so S.M.A.R.T. as we imagine...
karlchen
Power Member
Posts: 4601
Joined: 2003-02-06, 22:23 UTC
Location: Germany

Post by *karlchen »

Hello, siealex.

Abstract
Problem reproduced.
All 32-bit versions starting with 7.57a are affected.


Details:

Used the following T.C. versions:
+ T.C. 8.01 32-bit
+ T.C. 8.00 32-bit
+ T.C. 7.57a 32-bit
+ T.C. 7.56a 32-bit
and compared large binary files by content with them.
Large means: from 250 MB (roughly the size of your samples) up to 1600 MB (the largest ISO at hand here).

Result:

The (incorrect) error message which we see depends on two factors:
+ the sizes of the files to be compared
+ whether they are identical or not

Provided the files are identical, T.C. 32-bit seems able to compare files of about 1600 MB without any problems and correctly states that they are identical.
Provided the files are not identical, T.C. 32-bit will not be able to display both files side by side because it cannot allocate enough RAM, even if the sum of both file sizes is much less than 1024 MB.
T.C. 7.56a simply says so:
Not enough RAM available. Operation aborted. Files are different.
T.C. 7.57a, T.C. 8.00 and T.C. 8.01 incorrectly state that one of the files cannot be read. We may safely assume, however, that the true reason for aborting the operation is what T.C. 7.56a tells us.
Furthermore, we may safely assume that this misbehaviour was introduced in T.C. 8.0 and backported to T.C. 7.57a.


Summary
Problem reproduced.
All 32-bit versions starting with 7.57a are affected. (In fact, I assume 7.57 is affected, too, but I cannot be bothered to re-install it and try.)

Cheers,
Karl
Last edited by karlchen on 2012-08-04, 13:57 UTC, edited 4 times in total.
MX Linux 21.3 64-bit xfce, Total Commander 10.52 64-bit
The people of Alderaan keep on bravely fighting back the clone warriors sent out by the unscrupulous Sith Lord Palpatine.
The Prophet's Song
siealex
Senior Member
Posts: 278
Joined: 2009-03-22, 16:36 UTC

Post by *siealex »

When I compare backup1.ab with its copy in another folder, TC says they are identical and sometimes immediately throws the same error message.
karlchen
Power Member
Posts: 4601
Joined: 2003-02-06, 22:23 UTC
Location: Germany

Post by *karlchen »

Hm. Your two files cannot be identical. Different filesizes.
Have not experienced this here.

If two files are identical, all four 32-bit versions of T.C. will say so.
If they are not identical, T.C. will abort the operation for my sample files because it cannot allocate enough RAM to display the pair of files side by side.
Only T.C. 7.56a gives the correct reason for aborting; the other three versions display the incorrect error message that one of the files could not be read.

Karl
siealex
Senior Member
Posts: 278
Joined: 2009-03-22, 16:36 UTC

Post by *siealex »

"Hm. Your two files cannot be identical. Different filesizes."
Yes, these files are different, but if I copy e.g. backup1.ab to backup2.ab and compare them, I sometimes get the same message.
ghisler(Author)
Site Admin
Posts: 48077
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Post by *ghisler(Author) »

TC cannot open one of the files because they are too big. TC maps the entire files into memory, and this doesn't work if the memory is too fragmented. Try directly after a reboot, it may work then.
Author of Total Commander
https://www.ghisler.com
Valentino
Power Member
Posts: 706
Joined: 2003-02-07, 00:21 UTC
Location: Ukraine

Post by *Valentino »

Maybe fix the error messages to avoid confusion? The Win API function FormatMessage converts the last error code into a readable message.
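
A minimal sketch of the idea, in Python rather than Delphi so that it runs on any platform. It is an illustration only, not TC's code: `describe_open_failure` is a hypothetical helper, and Python's `OSError.strerror` stands in for calling FormatMessage on the last error code.

```python
import os
import tempfile


def describe_open_failure(path):
    """Try to open `path` for reading.

    Returns None on success, otherwise a message that carries the
    reason reported by the operating system instead of a fixed text.
    """
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        # exc.errno is the numeric code, exc.strerror the readable text.
        return f"Cannot open {path}: {exc.strerror} (error {exc.errno})"


if __name__ == "__main__":
    missing = os.path.join(tempfile.gettempdir(), "no-such-file.ab")
    print(describe_open_failure(missing))
```

The point is only that the message shown to the user is built from the actual error code, so "not enough address space" and "file cannot be read" would no longer look the same.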
Zom-B
Junior Member
Posts: 36
Joined: 2006-11-20, 17:46 UTC

Post by *Zom-B »

Reproduced with TC 8.01rc4

I'd like to see this fixed/corrected.

Background:
I downloaded a 4GB file via torrent. After completion I noticed corruption (CRC mismatch). I added the torrent again and did a force recheck, and discovered that only two chunks were incomplete. I made a backup and completed the download of the original file. Now I want to check whether the entire chunks were missing (all zeros, i.e. an incomplete download after all) or whether it was a bit error (hard disk problem). Surely both files won't fit in my 6GB of RAM, even after a clean boot.
MarcinW
Power Member
Posts: 852
Joined: 2012-01-23, 15:58 UTC
Location: Poland

Post by *MarcinW »

@ghisler(Author):
"TC maps the entire files into memory, and this doesn't work if the memory is too fragmented."
I checked with a debugger that TC uses CreateFile. You can try using CreateFileMapping + MapViewOfFile instead - this could help.

A memory-mapped file works like the swap file: we can map a 1GB file (MEM_RESERVE) even if we have only 100MB of free physical memory (MEM_COMMIT). Windows "slides" this 100MB of physical memory along 1GB of virtual memory during accessing this memory. It is invisible to the user/programmer - all we must do is use the normal pointer to the memory returned by MapViewOfFile.

Of course there is a limitation: on 32-bit systems a process can use at most 2GB of address space (this can be adjusted, e.g. with the /3GB switch in the boot.ini file).

Regards
ghisler(Author)
Site Admin
Posts: 48077
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Post by *ghisler(Author) »

TC already uses CreateFileMapping if the file is > 1 MByte. CreateFileMapping takes a file handle from CreateFile as the first parameter.
"Windows "slides" this 100MB of physical memory along 1GB of virtual memory during accessing this memory."
This isn't true, it depends on how you call MapViewOfFile. TC requests a pointer to the entire file, so it needs a block of addresses for the entire file.
MarcinW
Power Member
Posts: 852
Joined: 2012-01-23, 15:58 UTC
Location: Poland

Post by *MarcinW »

Yes, you are right. Maybe I wasn't clear enough. If we want to create a 1GB file mapping, we must have at least 1GB of contiguous address space (virtual memory). So if any DLL (or memory block) is mapped in the middle of this 1GB, we can map only a file of at most 0.5GB. Free physical memory, however, can be much lower, e.g. 100MB.
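
The arithmetic behind this can be sketched in a few lines of Python. This is a toy model, not a query of a real process: `largest_free_gap` is a hypothetical helper, and the 64 KiB DLL placed exactly in the middle is an assumed layout.

```python
def largest_free_gap(space_size, used_regions):
    """Return the size of the largest contiguous free range in an address
    space of `space_size` bytes, given (start, length) occupied regions."""
    largest = 0
    cursor = 0  # first address not yet known to be occupied
    for start, length in sorted(used_regions):
        if start > cursor:
            largest = max(largest, start - cursor)
        cursor = max(cursor, start + length)
    if space_size > cursor:
        largest = max(largest, space_size - cursor)
    return largest


GIB = 1024 ** 3
KIB = 1024

# Assumed layout: one 64 KiB DLL sitting exactly in the middle of 1 GiB.
dll = (GIB // 2, 64 * KIB)
print(largest_free_gap(GIB, [dll]) / GIB)  # 0.5
```

Almost the whole 1 GiB is still free in total, yet the largest single block that can be mapped is only half of it.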

Because TC uses file mappings, the only way to improve file comparing would be to do partial instead of entire mappings.
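
A minimal sketch of partial mappings, written in Python so that it runs on any platform. Python's `mmap` module stands in for CreateFileMapping + MapViewOfFile here, and `first_difference` is a hypothetical helper, not TC's compare routine.

```python
import mmap
import os


def first_difference(path_a, path_b, window=64 * mmap.ALLOCATIONGRANULARITY):
    """Return the offset of the first differing byte, or None if the files
    are identical. At most `window` bytes of each file are mapped at a time."""
    if window <= 0 or window % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("window must be a positive multiple of the allocation granularity")
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    common = min(size_a, size_b)
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        # Every offset is a multiple of `window`, hence of the granularity,
        # which is what a view offset has to be.
        for offset in range(0, common, window):
            length = min(window, common - offset)
            with mmap.mmap(fa.fileno(), length, access=mmap.ACCESS_READ, offset=offset) as view_a:
                with mmap.mmap(fb.fileno(), length, access=mmap.ACCESS_READ, offset=offset) as view_b:
                    block_a, block_b = view_a[:], view_b[:]
            if block_a != block_b:
                for i in range(length):
                    if block_a[i] != block_b[i]:
                        return offset + i
    # Equal up to the shorter length: the files differ only if the sizes do.
    return common if size_a != size_b else None
```

With this scheme the address space in use at any moment is two windows, whatever the file sizes. It answers "identical or not, and where first", but it does not by itself give the random access that a side-by-side display of both files needs.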

Regards
ghisler(Author)
Site Admin
Posts: 48077
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Post by *ghisler(Author) »

"Because TC uses file mappings, the only way to improve file comparing would be to do partial instead of entire mappings."
Yes, this is true - but it would make the compare process much more complex, and therefore slower.
MarcinW
Power Member
Posts: 852
Joined: 2012-01-23, 15:58 UTC
Location: Poland

Post by *MarcinW »

Files become larger and larger, and it is nothing unusual to compare two 1GB log files, so maybe the right solution (someday in the future) would be like this:

a) both files have been successfully mapped - the current, existing code does its job,
b) one or both files cannot be mapped entirely - map them partially (slower, but working).

With a trick, the second variant can be as fast as the first. It is enough to map a file partially and remove access rights from the first and last pages of the mapped buffer. Provided that we access the mapped memory sequentially, we can detect running off either end of the buffer by handling EAccessViolation exceptions. Exceptions are slow, but reading files is much slower, so the exceptions have no impact on overall performance. The rest of the code can remain as fast as for entirely mapped files.

Note: At the beginning of the file we should protect only the end of the mapped buffer. At the end of the file we should protect only the beginning of the mapped buffer. This isn't shown in the piece of code below (an example for one file). Tested with Win98, Win2000 and WinXP.

Code:

procedure TForm1.Button1Click(Sender: TObject);
var
  SystemInfo : TSystemInfo;
  BufferSize : DWord;
  HF : THandle;
  HM : THandle;
  P : Pointer;
  FileOffsetHi : DWord;
  FileOffsetLo : DWord;
  OldProtect : DWord;
begin
  GetSystemInfo(SystemInfo);
  BufferSize:=10*SystemInfo.dwAllocationGranularity; {Must be <= our file size}

  HF:=CreateFile('test.bin',GENERIC_READ,0,nil,OPEN_EXISTING,0,0);
  HM:=CreateFileMapping(HF,nil,PAGE_READONLY,0,0,nil); {0,0 = mapping object spans the whole file, so views at later offsets remain valid}
  P:=nil;

  FileOffsetHi:=0;
  FileOffsetLo:=0;

  while True do {We remap the file and try again in case of EAccessViolation}
  try
    P:=MapViewOfFile(HM,FILE_MAP_READ,FileOffsetHi,FileOffsetLo,BufferSize);

    VirtualProtect(P,SystemInfo.dwPageSize,PAGE_NOACCESS,@OldProtect);
    VirtualProtect(Pointer(DWord(P)+BufferSize-SystemInfo.dwPageSize),SystemInfo.dwPageSize,PAGE_NOACCESS,@OldProtect);


    {Usage of the memory mapped file here}
    {Accessing bytes before or after the mapped buffer (bytes with
     PAGE_NOACCESS protection) will raise EAccessViolation}


    Exit; {We finished without EAccessViolation}
  except
    on EAccessViolation do
    begin
      if P <> nil then
      begin
        VirtualProtect(P,SystemInfo.dwPageSize,PAGE_READONLY,@OldProtect); {For Win9x}
        VirtualProtect(Pointer(DWord(P)+BufferSize-SystemInfo.dwPageSize),SystemInfo.dwPageSize,PAGE_READONLY,@OldProtect); {For Win9x}
        UnmapViewOfFile(P);
      end;

      {We should map another piece of our large file and try again}
      FileOffsetHi:=0; {Some proper file offset here}
      FileOffsetLo:=0; {Some proper file offset here}

      Continue; {Try again}
    end else
      raise;
  end;
end;
ghisler(Author)
Site Admin
Posts: 48077
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Post by *ghisler(Author) »

Why would that help? TC would still need 4 GB of address space to compare two 2 GB files.
BeckYang
Junior Member
Posts: 29
Joined: 2006-04-02, 10:33 UTC

Post by *BeckYang »

Currently, this function displays all content in the compare result.
If the file size is larger than 50 MB (for example; I hope it could be configurable),
displaying only the differing lines/binary parts is a possible solution.

It would be great if another parameter could control the maximum number of differing lines/binary parts.
Sometimes we just want to know whether the two files are different, and do not care what is different...
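
A rough sketch of what such an option could do, in Python and with plain chunked reads instead of file mappings. `differing_ranges`, `chunk_size` and `max_ranges` are hypothetical names chosen for this illustration, not existing TC settings.

```python
def differing_ranges(path_a, path_b, chunk_size=1 << 20, max_ranges=None):
    """Return a list of half-open (start, end) byte ranges where the two
    files differ, reading only `chunk_size` bytes of each file at a time.
    Stops early once `max_ranges` ranges have been collected."""
    ranges = []
    run_start = None  # start offset of the differing run currently open
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while max_ranges is None or len(ranges) < max_ranges:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            n = max(len(a), len(b))
            if a == b:
                if run_start is not None:  # run ended exactly at the chunk boundary
                    ranges.append((run_start, offset))
                    run_start = None
            else:
                for i in range(n):
                    # Bytes past the end of the shorter file count as different.
                    differs = i >= len(a) or i >= len(b) or a[i] != b[i]
                    if differs and run_start is None:
                        run_start = offset + i
                    elif not differs and run_start is not None:
                        ranges.append((run_start, offset + i))
                        run_start = None
                        if max_ranges is not None and len(ranges) >= max_ranges:
                            return ranges
            offset += n
    if run_start is not None:
        ranges.append((run_start, offset))
    return ranges
```

An empty result means the files are identical. With `max_ranges=1` the function stops at the first difference, which covers the case where we only want to know whether the files differ at all.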

Btw, I also have another suggestion.
Add a new feature for saving the differing parts; it would be really useful. :idea: