Decode files with data URI headers

DrShark
Power Member
Posts: 1872
Joined: 2006-11-03, 22:26 UTC
Location: Kyiv, 68/262

Post by *DrShark »

Currently Total Commander can decode base64-encoded (*.b64) files with the following structure:

Code: Select all

MIME-Version: 1.0
Content-Type: application/octet-stream; name="filename.ext"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="filename.ext"

<encoded_data>
I suggest also decoding files with the following structure:

Code: Select all

data:<mime/type>;base64,<encoded_data>
Details on this type of header can be found on Wikipedia.

This would allow decoding resources that are usually embedded in HTML code like this:

Code: Select all

<html><body><img src="_data" style="vertical-align: -3px;"></body></html>
When decoding, TC can use the name of the .b64 file as the base name of the decoded file and the MIME type for its extension (so for filename.b64 with a data:image/png;base64, header, the decoded file could be named filename.image.png or filename.png).
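To illustrate the suggestion, here is a minimal Python sketch of the header handling. It is not TC's code: the function name decode_data_uri, the choice of the MIME subtype as the extension, and the "bin" fallback are all my assumptions.

```python
import base64
from pathlib import Path


def decode_data_uri(text: str, b64_name: str) -> tuple[str, bytes]:
    """Decode 'data:<mime/type>;base64,<encoded_data>' and suggest a file name.

    Hypothetical helper: b64_name is the name of the .b64 file the text came from.
    """
    header, sep, payload = text.strip().partition(",")
    if not sep or not header.startswith("data:"):
        raise ValueError("not a data URI")
    fields = header[len("data:"):].split(";")
    if "base64" not in fields[1:]:
        raise ValueError("only base64 data URIs are handled in this sketch")
    mime = fields[0]
    # Assumption: use the MIME subtype as the extension,
    # e.g. image/png -> png, image/svg+xml -> svg.
    subtype = mime.split("/")[-1].split("+")[0] or "bin"
    out_name = f"{Path(b64_name).stem}.{subtype}"
    # Drop any whitespace or line breaks inside the payload before decoding.
    data = base64.b64decode("".join(payload.split()))
    return out_name, data


if __name__ == "__main__":
    name, data = decode_data_uri("data:image/png;base64,aGVsbG8=", "filename.b64")
    print(name, data)
```

The filename.image.png variant would only need the full MIME type, with "/" replaced by ".", in place of the subtype.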
Donate for Ukraine to help stop Russian invasion!
Ukraine's National Bank special bank account:
UA843000010000000047330992708
Usher
Power Member
Posts: 1675
Joined: 2011-03-11, 10:11 UTC

Re: Decode files with data URI headers

Post by *Usher »

2DrShark
Some remarks:
1. The format decoded by TC is the MIME multipart syntax used in emails. Do you want to decode embedded data in html only or also in emails? Text in emails generated by stupid scripts is usually sent as 7-bit, which means you can't decode embedded data in a single step; you must first decode the text/html part.
2. Embedded data may also be urlencoded and embedded into other file types, e.g. SVG pictures in CSS files. What about them?
3. There is an eDecoder plugin for 7-Zip that lets you browse eml/nws/mht files as archives. You can use it with the Total7zip plugin for TC. It doesn't decode embedded data, but it seems to decode emails (especially pictures) better than TC does. Have you tested it?
4. There may also be a similar plugin for HTML, but I haven't searched for one. Or maybe you should ask the author of eDecoder to add another level of unpacking… See the discussion on the 7-Zip forum https://sourceforge.net/p/sevenzip/discussion/45797/thread/8df7e14e/?limit=25#a348 and search for other topics about eDecoder on that forum.
Andrzej P. Wozniak
Polish subforum moderator

Post by *DrShark »

2Usher
Usher wrote: 2019-05-11, 20:17 UTC
Do you want to decode embedded data in html only or also in emails?
Well, I'd just like to decode *.b64 files with data URI headers, in addition to the currently supported MIME multipart header. My use case: open an .html file in Lister as text, copy the content of the src attribute with the encoded data, and save it as .b64. The result is a file in this format:

Code: Select all

data:<mime/type>;base64,<encoded_data>
TC can already decode b64 data, so it just needs to handle a header that differs from the already supported MIME multipart header. That simple case would be enough for me.

As for decoding all data in data URI format from a whole html file: TC would then have to parse the whole file to find the encoded pieces, decode them, and extract them to disk as separate files. Only Christian Ghisler can tell whether that fits TC's current Decode feature. The same applies to "Embedded data [that] may also be urlencoded and embedded into other file types, e.g. SVG pictures in CSS files": only Christian Ghisler can tell whether any of this can be implemented in TC's Decode feature.
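Just to show what "parse the whole file" would roughly involve, here is a small Python sketch. It is only an illustration, not a proposal for how TC should do it: the regular expression and the name extract_data_uris are my assumptions, and it only handles payloads without raw spaces or quotes inside them.

```python
import base64
import re
from urllib.parse import unquote_to_bytes

# data:[<mime/type>][;param=value...][;base64],<data>
# The payload is assumed to end at whitespace, a quote, ")" or an angle bracket.
DATA_URI = re.compile(
    r"data:([\w.+-]+/[\w.+-]+)?((?:;[\w.+-]+=[\w.+-]+)*)(;base64)?,([^\s\"')<>]*)"
)


def extract_data_uris(text: str) -> list[tuple[str, bytes]]:
    """Return (mime type, decoded bytes) for every data URI found in text."""
    results = []
    for match in DATA_URI.finditer(text):
        mime = match.group(1) or "text/plain"  # default type when none is given
        payload = match.group(4)
        if match.group(3):
            data = base64.b64decode(payload)   # base64 payload
        else:
            data = unquote_to_bytes(payload)   # urlencoded (percent-encoded) payload
        results.append((mime, data))
    return results


if __name__ == "__main__":
    sample = (
        '<img src="data:image/png;base64,aGVsbG8=">'
        "<style>a{background:url(data:image/svg+xml,%3Csvg%2F%3E)}</style>"
    )
    for mime, data in extract_data_uris(sample):
        print(mime, data)
```

Each result could then be written to disk under a generated name, which is the part that goes beyond the simple single-file case.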
Usher wrote: 2019-05-11, 20:17 UTC
4. There may also be a similar plugin for HTML, ...
In theory, WCX urlData is supposed to decode resources from *.htm*, *.css, *.mht*, maybe others. But for my test html files it doesn't work (doesn't open them). It shows and allows extracting some resources from some of my mht files, but files extracted using the plugin are broken, maybe due to some bug in the plugin.

Post by *Usher »

DrShark wrote: 2019-05-12, 14:04 UTC
In theory, WCX urlData is supposed to decode resources from *.htm*, *.css, *.mht*, maybe others. But for my test html files it doesn't work (doesn't open them).
This plugin works, but it's very limited:

1. It works only with registered extensions. By default there are only html and css extensions registered (no mht). In other cases it does nothing when you press Ctrl+PgDn, there isn't any autodetection. You must either register other file types or rename files when needed. See next points for more info.

2. It decodes only the url syntax for both base64 and urlencoded data streams, e.g.:

Code: Select all

url()

url("" data-src-2x="/files/190/images/login/login-1-2x.jpg" height="468" width="1440" alt="">

<Image height="16" width="16"></Image>
4. It uses 404 as the value for the following PackerCaps (4 + 16 + 128 + 256 = 404):
PK_CAPS_MULTIPLE 4 Archive can contain multiple files
PK_CAPS_OPTIONS 16 Has options dialog
PK_CAPS_SEARCHTEXT 128 Allow searching for text in archives created with this plugin
PK_CAPS_HIDE 256 Don't show packer icon, don't open with Enter but with Ctrl+PgDn
As you can see, autodetection is missing:
PK_CAPS_BY_CONTENT 64 Detect archive type by content
Changing 404 to 468 (404 + 64) in the [PackerPlugins] section doesn't help; this capability must be supported by the plugin itself.
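
The arithmetic can be checked with a small Python sketch. The flag values are the ones quoted above; the PackerCaps class and the flag_names helper are hypothetical names used only for illustration, and only the five flags mentioned here are listed.

```python
from enum import IntFlag


class PackerCaps(IntFlag):
    # Only the capabilities quoted in this post, with the values listed there.
    PK_CAPS_MULTIPLE = 4
    PK_CAPS_OPTIONS = 16
    PK_CAPS_BY_CONTENT = 64
    PK_CAPS_SEARCHTEXT = 128
    PK_CAPS_HIDE = 256


def flag_names(value: int) -> list[str]:
    """List the names of the flags set in a PackerCaps value."""
    return [flag.name for flag in PackerCaps if value & flag]


if __name__ == "__main__":
    print(flag_names(404))  # value registered for urlData
    print(flag_names(468))  # same value with PK_CAPS_BY_CONTENT added
```

The only difference between 404 and 468 is the PK_CAPS_BY_CONTENT bit.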

DrShark wrote: 2019-05-12, 14:04 UTC
It shows and allows extracting some resources from some of my mht files, but files extracted using the plugin are broken, maybe due to some bug in the plugin.
I haven't found MHT with urldata, so I can't tell you how it works. But, as I have already written, text/html part in MHT may be encoded with Quoted-Printable. In this case you should first decode MHT like eml using TC and only then you can open HTML part with urlData.

Post by *DrShark »

Usher wrote: 2019-05-12, 16:17 UTC
1. It works only with registered extensions. By default there are only html and css extensions registered (no mht). In other cases it does nothing when you press Ctrl+PgDn, there isn't any autodetection. You must either register other file types or rename files when needed.
Well, I use it via an internal context menu item:

Code: Select all

[PackerPlugins]
html=404,C:\totalcmd\plugins\wcx\urlData\urlData.wcx
[Associations]
Filter16=*.htm *.html *.mht *.mhtml
Filter16_urlData=**html
Usher wrote: 2019-05-12, 16:17 UTC
I haven't found MHT with urldata, so I can't tell you how it works. But, as I have already written, text/html part in MHT may be encoded with Quoted-Printable. In this case you should first decode MHT like eml using TC and only then you can open HTML part with urlData.
Here's my sample mht file.
In it, the urlData plugin shows the following files:
no_name00.png
no_name01.png
no_name02.png
no_name03.gif
no_name04.svg
All of the above are extracted by the plugin as invalid files of their respective formats.

Let's compare SVG decoding by TC's Decode feature and by the plugin. If you open the mht sample in Lister, it shows the following:

Code: Select all

data:image/svg+xml;base64=
,PHN2ZyB3aWR0aD0iMjIiIGhlaWdodD0iMjIiIHhtbG5zPSJodHRwOi8vd3d3LnczLm9yZy8yMD=
AwL3N2ZyI+PHBhdGggZD0iTTExIDBDNC45MjMgMCAwIDQuOTIyIDAgMTFjMCA0Ljg2OCAzLjE0O=
SA4Ljk3OSA3LjUyMSAxMC40MzYuNTUuMDk3Ljc1Ni0uMjMzLjc1Ni0uNTIyIDAtLjI2Mi0uMDEz=
LTEuMTI4LS4wMTMtMi4wNDktMi43NjQuNTA5LTMuNDc5LS42NzQtMy42OTktMS4yOTMtLjEyNC0=
uMzE2LS42Ni0xLjI5Mi0xLjEyOC0xLjU1My0uMzg0LS4yMDYtLjkzNS0uNzE1LS4wMTMtLjcyOS=
44NjYtLjAxNCAxLjQ4NS43OTggMS42OTEgMS4xMjguOTkgMS42NjMgMi41NzEgMS4xOTYgMy4yM=
DQuOTA3LjA5Ni0uNzE1LjM4NS0xLjE5Ni43MDEtMS40NzEtMi40NDgtLjI3NS01LjAwNS0xLjIy=
NC01LjAwNS01LjQzMiAwLTEuMTk2LjQyNi0yLjE4NiAxLjEyNy0yLjk1Ni0uMTEtLjI3NS0uNDk=
1LTEuNDAyLjExLTIuOTE1IDAgMCAuOTIyLS4yODggMy4wMjUgMS4xMjguODgtLjI0OCAxLjgxNS=
0uMzcyIDIuNzUtLjM3MnMxLjg3LjEyNCAyLjc1LjM3MmMyLjEwNC0xLjQzIDMuMDI2LTEuMTI4I=
DMuMDI2LTEuMTI4LjYwNSAxLjUxMy4yMiAyLjY0LjExIDIuOTE1LjcuNzcgMS4xMjcgMS43NDcg=
MS4xMjcgMi45NTYgMCA0LjIyMi0yLjU3MSA1LjE1Ny01LjAxOSA1LjQzMi4zOTkuMzQzLjc0MyA=
xLjAwNC43NDMgMi4wMzUgMCAxLjQ3MS0uMDE0IDIuNjU0LS4wMTQgMy4wMjUgMCAuMjg5LjIwNi=
42MzIuNzU2LjUyMkMxOC44NTEgMTkuOTggMjIgMTUuODU0IDIyIDExYzAtNi4wNzgtNC45MjMtM=
TEtMTEtMTF6IiBmaWxsPSIjMUIxRjIzIiBmaWxsLXJ1bGU9ImV2ZW5vZGQiLz48L3N2Zz4=3D
TC (after changing the header from data: to MIME) decodes that as:

Code: Select all

<svg width="22" height="22" xmlns="http://www.w3.org/2000/svg"><path d="M11 0C4.923 0 0 4.922 0 11c0 4.868 3.149 8.979 7.521 10.436.55.097.756-.233.756-.522 0-.262-.013-1.128-.013-2.049-2.764.509-3.479-.674-3.699-1.293-.124-.316-.66-1.292-1.128-1.553-.384-.206-.935-.715-.013-.729.866-.014 1.485.798 1.691 1.128.99 1.663 2.571 1.196 3.204.907.096-.715.385-1.196.701-1.471-2.448-.275-5.005-1.224-5.005-5.432 0-1.196.426-2.186 1.127-2.956-.11-.275-.495-1.402.11-2.915 0 0 .922-.288 3.025 1.128.88-.248 1.815-.372 2.75-.372s1.87.124 2.75.372c2.104-1.43 3.026-1.128 3.026-1.128.605 1.513.22 2.64.11 2.915.7.77 1.127 1.747 1.127 2.956 0 4.222-2.571 5.157-5.019 5.432.399.343.743 1.004.743 2.035 0 1.471-.014 2.654-.014 3.025 0 .289.206.632.756.522C18.851 19.98 22 15.854 22 11c0-6.078-4.923-11-11-11z" fill="#1B1F23" fill-rule="evenodd"/></svg>7
- the result is an almost valid image (there's only an extra "7" character).

The plugin decodes it as:

Code: Select all

<svg width="22" height="22" xmlns="http://www.w3.org/200АЅНЩњ€шсБ…С Ѓђф‰4ДДЂБРёдИМЂАЂАЂРёдИИЂАЂДЕЊАЂРёаШаЂМёДРд‚г“s’rгS#гC3bгSRг“rгsSbТг#32гsSbТгS#"Тг#c"Тг2LKЊLЋKЊLЛL‹Њ
KL‹ЌНЌЌLKLЛЌ
ОKKЌЌНLЛЌЋNKLKЊЋLЛKЊLЌK316-.66-1.292-1.128-1.553-.384-.206-.935-.715-.013-.729.аШШґёАДРЂДёРаФёЬдаЂДёШдДЂДёДИаёддЂДёШШМЂИёФЬДЂДёДдШЂМёИАBг“rг“bТгsRг3ѓRУг“bгsУгCsУ"гCC‚Тг#sRУRгRУг##MKЊ
KMKЌМ€LKЊNM‹ЌЌ‹L‹ЊN
€KЊLЌЛL‹ЋMM‹KЊLKKЊЌНKKЌM-1.402.11-2.915 0 0 .922-.288 3.025 1.128.88-.248 1.815-ёМЬИЂИёЬФґёМЬЙМДёаЬёДИРЂИёЬФёМЬЙЊИёДАРґДёРМЂМёАИШґДёДИаЂ2г#bУг#‚гcRгS2г#""гcBг"г“Rгrгsrг#rгsCrKЊLЌИ‹ЋMM€
ЊЊЊ‹L‹ЌMМH
KЊMMЛMKЊNH
KЌМ‹ЊОNKЊНЛЌНИ.004.743 2.035 0 1.471-.014 2.654-.014 3.025 0 .289.206.ШМИёЬФШёФИЙДаёаФДЂДдёдаЂИИЂДФёаФРЂИИЂДЕЊАґШёАЬаґРёдИМґДУУў"f–ЖГТ"3#c#2"f–ЖВЧ'VЖSТ&WfVжцFB"угВч7fsгp
Obviously this is not a valid svg file. I hope there is a reasonable explanation for this, because otherwise it's a plugin bug.

Post by *Usher »

DrShark wrote: 2019-05-12, 20:41 UTC
Let's compare SVG decoding by TC's Decode feature and by the plugin. If you open the mht sample in Lister, it shows the following:
It's unsupported by the urlData plugin. It's exactly what I wrote earlier. The SVG stream itself is encoded in Base64 (and saved as a single line), but in MHT it is encoded once again in Quoted-Printable (and saved as multiple lines ending with "=").
Q.E.D.
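
For reference, a minimal Python sketch of the two-step decoding described above. It assumes the Quoted-Printable layer only adds soft line breaks ("=" at end of line) and escapes a literal "=" as "=3D", as in the sample. The function name is hypothetical, and this is not how TC or urlData work internally.

```python
import base64
import quopri


def decode_qp_wrapped_data_uri(qp_text: str) -> bytes:
    """Undo Quoted-Printable first, then decode the base64 data URI inside."""
    # Step 1: Quoted-Printable. Soft line breaks are removed, "=3D" becomes "=".
    uri = quopri.decodestring(qp_text.encode("ascii")).decode("ascii")
    # Step 2: split off the data URI header and decode the base64 payload.
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode("".join(payload.split()))


if __name__ == "__main__":
    # Shortened stand-in for the MHT fragment: "hello" in base64 is "aGVsbG8=".
    sample = "data:text/plain;base64=\n,aGVs=\nbG8=3D"
    print(decode_qp_wrapped_data_uri(sample))
```

Skipping step 1 leaves the trailing "=" characters and "=3D" inside the base64 stream, which is why a decoder that expects plain base64 produces wrong output.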