Decoding mhtml files

Post by *damjang »

I have some MHTML files created with Chrome on Android (with the offline option, Chrome creates a single file that contains all the files and images of one web page). But I'm unable to decode these files to save some of the PNG files inside them. I tried the MHT plugin (MhtUnPack), but it decodes only some files and not the PNG images. I also tried TC's MIME decode function, which decodes all files but loses the file names, and the PNG files are not usable/viewable (decode error).

Any help with this task?

Here is an example MHTML file:
https://mega.nz/#!0tIiTLST!pALswG0cd1hv07Dd8Z7cVCFAhjsUNWCFeFTy8Q8IEdk


Thank you

Post by *Gral »

The PNG files are not encoded here; they are stored in raw binary form.
You can "recover" them with a suitable program, e.g. a hex editor.
Just search for the PNG header:

Code:

‰PNG

or better, in hex:

Code:

89504E47
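
A minimal sketch of that search in PowerShell, for anyone without a hex editor at hand. It looks for the full 8-byte PNG signature (89 50 4E 47 0D 0A 1A 0A) rather than only the first four bytes, to reduce false hits. The function name `Find-PngOffsets` and the example path are assumptions for illustration, not part of any plugin or tool mentioned in this thread.

```powershell
# Hypothetical helper: emits the byte offset of every PNG signature in a byte array.
function Find-PngOffsets([byte[]]$Bytes) {
    # Code page 28591 (ISO-8859-1) maps each byte to exactly one char,
    # so string positions are the same as byte offsets.
    $latin1 = [Text.Encoding]::GetEncoding(28591)
    $text   = $latin1.GetString($Bytes)
    $sig    = $latin1.GetString([byte[]](0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A))

    $pos = $text.IndexOf($sig, [StringComparison]::Ordinal)
    while ($pos -ge 0) {
        $pos   # emit this offset
        $pos = $text.IndexOf($sig, $pos + 1, [StringComparison]::Ordinal)
    }
}

# Usage (example path, adjust to your file):
# $bytes = [System.IO.File]::ReadAllBytes('c:\Temp\MHT\File1.mht')
# Find-PngOffsets $bytes | ForEach-Object { '{0:X8}' -f $_ }
```

The offsets it prints are where each embedded image starts; the end of each image still has to be found separately (the IEND chunk).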

Post by *damjang »

Ah, OK, you're right. But is there an "automatic" tool/plugin/program that can do this?

Post by *Gral »

You can try GSplit.
Split at each occurrence of the PNG header, without the (additional) GSplit header.
You can also use WinHex, but in the recent trial version the file size is limited to 200 KB.

Post by *ZoSTeR »

I tried a couple of MHT decoders, including some TC plugins, but none of them were able to extract the PNG files in your MHT file.

So I wrote a little PowerShell script:

Code:

# Requires PowerShell 5.0 or later (uses the [Type]::new() constructor syntax)
$mhtFile = "c:\Temp\MHT\File1.mht"
$outputFolder = "c:\Temp\MHT"

# Note: code page 28591 (ISO-8859-1) returns a 1-to-1 char to byte mapping,
# so binary data survives being read into a string and written back out
$Encoding = [Text.Encoding]::GetEncoding(28591)

$streamIn = [System.IO.StreamReader]::new($mhtFile, $Encoding)
$BinaryString = $streamIn.ReadToEnd()
$streamIn.Close()

# \x89\x50\x4E\x47 = byte 0x89 followed by "PNG", the start of the file header
$PNGRegex = [Regex] '\x89\x50\x4E\x47'

# \x49\x45\x4E\x44 = "IEND", the type of the last chunk; the file ends 4 bytes (CRC) after it
$PNGENDRegex = [Regex] '\x49\x45\x4E\x44'

$PNGMatches = $PNGRegex.Matches($BinaryString)
$PNGENDMatches = $PNGENDRegex.Matches($BinaryString)

$MatchCount = $PNGMatches.Count
Write-Output "Total number of matches: $MatchCount"

# The n-th header is paired with the n-th IEND, so both match lists must line up
for ($counter = 0; $counter -lt $MatchCount; $counter++)
{
    Write-Output $counter
    Write-Output $PNGMatches[$counter].Index
    Write-Output $PNGENDMatches[$counter].Index
    $start = $PNGMatches[$counter].Index
    # +8 covers the 4 bytes of "IEND" plus the 4-byte CRC that follows it
    $len   = ($PNGENDMatches[$counter].Index - $start) + 8
    $tmpPNG = $BinaryString.Substring($start, $len)
    $streamOut = [System.IO.StreamWriter]::new("$outputFolder\ExtractedPNGs_$counter.png", $false, $Encoding)
    $streamOut.Write($tmpPNG)
    $streamOut.Close()
}
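
The script above pairs the n-th header match with the n-th IEND match, which only works while both lists line up. If the text "IEND" appears anywhere outside a PNG (or a header match has no end), every later image is cut at the wrong place. A possible variation, sketched under the assumption that the first IEND after a header belongs to that image, is to search for the end starting from each header. `Get-PngSpans` is a made-up name for this example, and the sketch still does not parse the PNG chunk structure, so a chance "IEND" inside compressed image data would still fool it.

```powershell
# Hypothetical helper: for each PNG signature, find the first IEND after it
# and emit an object with the Start offset and Length of that image.
function Get-PngSpans([string]$BinaryString) {
    $latin1 = [Text.Encoding]::GetEncoding(28591)
    $sig = $latin1.GetString([byte[]](0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A))

    $start = $BinaryString.IndexOf($sig, [StringComparison]::Ordinal)
    while ($start -ge 0) {
        # Look for the end marker only after this header
        $end = $BinaryString.IndexOf('IEND', $start + $sig.Length, [StringComparison]::Ordinal)
        # Stop if there is no IEND or no room for the 4-byte CRC after it
        if (($end -lt 0) -or (($end + 8) -gt $BinaryString.Length)) { break }

        New-Object PSObject -Property @{ Start = $start; Length = ($end + 8 - $start) }

        # Continue with the next header after this image
        $start = $BinaryString.IndexOf($sig, $end + 8, [StringComparison]::Ordinal)
    }
}

# Usage with the variables of the script above:
# $n = 0
# foreach ($span in (Get-PngSpans $BinaryString)) {
#     $png = $BinaryString.Substring($span.Start, $span.Length)
#     # write $png with the same StreamWriter code as above, then:
#     $n++
# }
```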

Post by *j7n »

You can use some ripping software designed for games.

Jaeder Naub V2.0.1 worked for me, as did WinHex (Disk Tools -> File Recovery By Type). Both extracted 44 images. Naub is rather confusing to use: select the PNG format in the ripping options, load the packed file, then press Scan. The extracted files will appear in a subdirectory relative to where Naub is located. The sorting of the file names may need to be fixed by padding the offsets with leading zeros.

Post by *damjang »

ZoSTeR wrote: I tried a couple of MHT decoders, including some TC plugins, but none of them were able to extract the PNG files in your MHT file.

So I wrote a little PowerShell script:

Code:

$mhtFile = ...
Thank you. I tried to run the script on my Windows 7 machine but received an error:
Method invocation failed because [System.IO.StreamReader] doesn't contain a method named 'new'.
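
This error is what older PowerShell versions report for the `[Type]::new(...)` constructor syntax, which was added in PowerShell 5.0; Windows 7 ships with PowerShell 2.0. A minimal sketch of the same file reading and writing done with `New-Object`, which older versions understand, follows. The helper names `Read-BinaryString` and `Write-BinaryString` are made up for this example, and the sketch assumes nothing else in the script needs a newer version.

```powershell
# Hypothetical helpers that avoid the ::new() syntax (PowerShell 5.0+)
# by creating the reader and writer with New-Object instead.

function Read-BinaryString([string]$Path) {
    # Code page 28591 (ISO-8859-1): 1-to-1 char to byte mapping
    $encoding = [Text.Encoding]::GetEncoding(28591)
    # Replaces: [System.IO.StreamReader]::new($mhtFile, $Encoding)
    $reader = New-Object System.IO.StreamReader($Path, $encoding)
    try     { $reader.ReadToEnd() }
    finally { $reader.Close() }
}

function Write-BinaryString([string]$Path, [string]$Text) {
    $encoding = [Text.Encoding]::GetEncoding(28591)
    # Replaces: [System.IO.StreamWriter]::new("...", $false, $Encoding)
    $writer = New-Object System.IO.StreamWriter($Path, $false, $encoding)
    try     { $writer.Write($Text) }
    finally { $writer.Close() }
}

# Usage (example paths; use full paths, since .NET does not follow
# the current PowerShell location):
# $BinaryString = Read-BinaryString 'c:\Temp\MHT\File1.mht'
# Write-BinaryString 'c:\Temp\MHT\ExtractedPNGs_0.png' $tmpPNG
```

In the original script, the equivalent change is to swap the two `::new(...)` calls for the `New-Object` forms shown in the comments.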

Post by *damjang »

j7n wrote:You can use some ripping software designed for games.

Jaeder Naub V2.0.1 worked for me, as did WinHex (Disk Tools -> File Recovery By Type). Both extracted 44 images. The program is rather confusing. Select PNG format in ripping options, Load the packed file, then press Scan. Extracted files will appear in a subdirectory relative to where Naub is. The sorting of the filenames may need to be fixed by prepending the offsets with zeros.
Yes, it works. Now I have to check whether I can also restore the file names of the PNG files.
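
One possible way to get the names back, sketched under an assumption that has not been checked against the example file: in MHTML each embedded part normally starts with MIME headers, and a `Content-Location:` header there holds the original URL of the image. The hypothetical helper below takes the file content as a 1-to-1 decoded string (code page 28591, as in ZoSTeR's script) plus the offset where a PNG starts, looks backwards for the nearest `Content-Location:` line, and returns the last path segment of that URL. It does not handle folded (multi-line) headers or percent-encoded names, and it does not check that the header belongs to the same MIME part.

```powershell
# Hypothetical helper: guess a file name for the PNG that starts at $PngStart
# from the nearest preceding Content-Location header. Returns $null if none is found.
function Get-PngNameGuess([string]$BinaryString, [int]$PngStart) {
    $header = 'Content-Location:'
    # Search backwards from the start of the image data
    $pos = $BinaryString.LastIndexOf($header, $PngStart, [StringComparison]::OrdinalIgnoreCase)
    if ($pos -lt 0) { return $null }

    # The header value runs to the end of the line
    $lineEnd = $BinaryString.IndexOfAny([char[]]"`r`n", $pos)
    if ($lineEnd -lt 0) { $lineEnd = $BinaryString.Length }
    $valueStart = $pos + $header.Length
    $url = $BinaryString.Substring($valueStart, $lineEnd - $valueStart).Trim()

    # Drop any query string or fragment, then keep the part after the last slash
    $url  = ($url -split '[?#]')[0]
    $name = $url.Substring($url.LastIndexOf([char]'/') + 1)
    if ($name -eq '') { return $null }
    return $name
}

# Usage with an offset found by one of the extraction methods above:
# $name = Get-PngNameGuess $BinaryString $start
```

Names recovered this way may repeat or contain characters that are not valid in Windows file names, so they would need checking before being used for saving.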