[WCX] ZPAQ

Discuss and announce Total Commander plugins, addons and other useful tools here, both their usage and their development.

Moderators: white, Hacker, petermad, Stefan2

krasusczak
Senior Member
Posts: 282
Joined: 2011-09-23, 10:35 UTC

Post by *krasusczak »

Hi, one question: is there any chance to add a function like "Don't show empty folders"?
When I archive the same file many times but only a few versions differ, I end up with a lot of empty folders and only a few with changes.
A function to skip empty directories when listing would be great :)
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

krasusczak wrote:A function to skip empty directories when listing would be great :)
Sure, I could add such a function.
I'm not sure if it's easy to implement though, because I'd have to parse and cross-check all archive entries first,
which could delay archive opening significantly.

FYI:
A new directory entry is created whenever the directory timestamp changes.
On NTFS this happens when you create new files in that dir or delete old ones,
but not when you just modify existing files.
TC plugins: PCREsearch and RegXtract
krasusczak
Senior Member
Posts: 282
Joined: 2011-09-23, 10:35 UTC

Post by *krasusczak »

Can't you do this in the plug-in (something like filtering) when it reads the archive? Something like: if the directory is empty, then hide / don't show that entry?

I don't know anything about coding, so I didn't think this would be so problematic. I thought it would just be a different variant of "show all archive versions": you get the whole archive or only the newest version, so I assumed it would be easy to just select what you want to show when listing such an archive :/
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

It will be possible, like I said.

But no, simple filtering won't work.
I have to check every directory entry for whether it contains any files before reporting it to TC,
because TC just asks for archive entries one by one, so I can't manipulate the list once it's loaded.

I'll see what I can do for the next version.
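
Roughly, the pre-scan could look something like this (a simplified sketch only, not the actual plugin code; the entry structure and names are made up):

Code:
#include <cstdio>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified archive entry (not the real plugin structures).
struct ArcEntry {
    std::string path;   // e.g. "0001/docs/a.txt"
    bool isDir;
};

// One pass over the already-parsed entry list: collect every directory
// that directly or indirectly contains at least one file.
static std::set<std::string> nonEmptyDirs(const std::vector<ArcEntry>& entries)
{
    std::set<std::string> result;
    for (const ArcEntry& e : entries) {
        if (e.isDir)
            continue;
        std::string p = e.path;
        // Walk up the path and mark every parent directory as non-empty.
        for (size_t pos = p.rfind('/'); pos != std::string::npos; pos = p.rfind('/')) {
            p.resize(pos);
            result.insert(p);
        }
    }
    return result;
}

int main()
{
    const std::vector<ArcEntry> entries = {
        {"0001", true}, {"0001/empty", true},
        {"0001/docs", true}, {"0001/docs/a.txt", false}};
    const std::set<std::string> keep = nonEmptyDirs(entries);

    // The ReadHeader loop would then hand an entry to TC only if it is a
    // file, or a directory that appears in the "non-empty" set.
    for (const ArcEntry& e : entries)
        if (!e.isDir || keep.count(e.path))
            std::printf("report to TC: %s\n", e.path.c_str());
    return 0;
}

That extra pass over the entry list is the cross-checking I mentioned, and it's why opening big archives could get slightly slower with the option enabled.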
TC plugins: PCREsearch and RegXtract
krasusczak
Senior Member
Posts: 282
Joined: 2011-09-23, 10:35 UTC

Post by *krasusczak »

OK, thanks, I didn't know :)
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

New Version 1.1a!
  • added 'Mask empty archive versions' option
  • fixed: resource leak (open directory handles) when using multi-part detection
Check the first post for the new file.
TC plugins: PCREsearch and RegXtract
krasusczak
Senior Member
Posts: 282
Joined: 2011-09-23, 10:35 UTC

Post by *krasusczak »

Wow, that was really fast. Thanks, it works like a charm :)
MaxX
Power Member
Posts: 1029
Joined: 2012-03-23, 18:15 UTC
Location: UA

Post by *MaxX »

What is the ZPAQ format based on (WinRAR, 7-Zip, or ARC)?

How can I use some other standalone archiver (outside of TC) to read/write the ZPAQ files used by the plugin?
Horst.Epp
Power Member
Posts: 6481
Joined: 2003-02-06, 17:36 UTC
Location: Germany

Post by *Horst.Epp »

MaxX wrote:What is the ZPAQ format based on (WinRAR, 7-Zip, or ARC)?

How can I use some other standalone archiver (outside of TC) to read/write the ZPAQ files used by the plugin?
Why not read the first posting in this thread?
Then you would not have asked either of these two questions :D
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker »

Not related to the plugin itself (I'm not using the plugin ATM).
Comparing ZPAQ to WinRAR by packing the 15 latest versions of *.sqlite from Firefox's profile folder: WinRAR with maximum compression creates a ~28 MB archive for the 15 versions, while ZPAQ with -method 511 and -fragment 1 is already larger after five versions. Am I doing something wrong?
EDIT: It seems ZPAQ does not use the previously packed versions to deduplicate the data?
EDIT 2: So, after all 15 versions finished packing, it's ZPAQ 70 MB vs. WinRAR 28 MB.

TIA
Roman
Suppose you press Ctrl+F, select the FTP connection (with the saved password), but instead of clicking Connect, you drop dead.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

Hacker wrote:Comparing ZPAQ to WinRAR by packing the 15 latest versions of *.sqlite from Firefox's profile folder: WinRAR with maximum compression creates a ~28 MB archive for the 15 versions, while ZPAQ with -method 511 and -fragment 1 is already larger after five versions. Am I doing something wrong?
...
Well, the main features of the format are journaling, deduplication and the (very) efficient compression modes (3/4/5) for single files,
but in contrast to Rar and 7-Zip, Zpaq lacks solid compression.

This means you can easily get worse compression ratios than with those formats if your files aren't much bigger than their solid block size.
Also, I don't think you should tune the fragment size at all, and especially not down to the lowest value of 1, which equals 2 kB, compared to the default of 64 kB.
I didn't implement that option in the plug-in for the very reason that there's usually no gain in changing it.
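
Just to illustrate what solid compression buys (a toy example with zlib, nothing to do with how Rar/7-Zip/Zpaq are actually implemented): compressing 15 slightly different versions of the same file one by one stores each of them almost in full, while compressing them as one concatenated "solid" stream lets every later version reference the earlier ones.

Code:
#include <zlib.h>   // assumes zlib is available
#include <cstdio>
#include <random>
#include <string>
#include <vector>

// Deflate a buffer and return the compressed size.
static size_t deflatedSize(const std::string& data)
{
    uLongf outLen = compressBound(data.size());
    std::vector<Bytef> out(outLen);
    compress(out.data(), &outLen,
             reinterpret_cast<const Bytef*>(data.data()), data.size());
    return outLen;
}

int main()
{
    // A 16 kB pseudo-random "file" (small enough to fit deflate's 32 kB window).
    std::mt19937 rng(42);
    std::string base(16 * 1024, '\0');
    for (char& c : base) c = static_cast<char>(rng() & 0xFF);

    size_t individual = 0;   // non-solid: every version compressed on its own
    std::string solid;       // solid: one stream over all versions
    for (int i = 0; i < 15; ++i) {
        std::string version = base;
        version[i * 100] = static_cast<char>(i);   // small per-version change
        individual += deflatedSize(version);
        solid += version;
    }
    std::printf("separate: %zu bytes, solid: %zu bytes\n",
                individual, deflatedSize(solid));
    return 0;
}

The individually compressed versions stay close to 15 x 16 kB because each one looks like random data on its own, while the solid stream shrinks drastically since the compressor can reference the previous version inside its window. Zpaq compensates with deduplication instead, but only when fragments actually repeat.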

I'm not sure in what portions such sqlite files differ from version to version, but if it's a database it is quite likely that the dedup feature can't kick in,
since the index offsets change all over the place even though only a few records are added/changed.
The lack of solid compression explains the rest: every version is compressed on its own.
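
If it helps, the deduplication works roughly like this (a toy sketch; the real Zpaq uses a different rolling hash and SHA-1 fragment IDs): the data is cut into fragments wherever a rolling hash over the last few bytes hits a boundary condition, and only fragments whose hash hasn't been stored before are written to the archive. Because the boundaries depend only on the content, a small insertion near the start of a file doesn't shift the fragments after it; and because they also depend on the fragment-size parameter, mixing different -fragment values breaks the matching.

Code:
#include <cstdint>
#include <cstdio>
#include <functional>
#include <random>
#include <string>
#include <unordered_set>
#include <vector>

// Content-defined chunking: a fragment ends where a rolling hash over the
// last W bytes hits a boundary condition. The average fragment size is about
// 2^N kB (N = 6 -> ~64 kB, N = 1 -> ~2 kB). Different N values cut the same
// data at different places, so the resulting fragments will not match.
static std::vector<std::string> chunk(const std::string& data, int N)
{
    const uint32_t mask = (1u << (N + 10)) - 1;
    const size_t W = 32;
    uint32_t pow31w = 1;
    for (size_t k = 0; k < W; ++k) pow31w *= 31;   // 31^W mod 2^32

    std::vector<std::string> fragments;
    uint32_t h = 0;
    size_t start = 0;
    for (size_t i = 0; i < data.size(); ++i) {
        h = h * 31 + static_cast<unsigned char>(data[i]);
        if (i >= W)
            h -= pow31w * static_cast<unsigned char>(data[i - W]);
        if ((h & mask) == 0 || i + 1 == data.size()) {
            fragments.push_back(data.substr(start, i + 1 - start));
            start = i + 1;
        }
    }
    return fragments;
}

int main()
{
    // Two "versions" of a 1 MB file: the second has 16 bytes inserted early on.
    std::mt19937 rng(1);
    std::string v1(1 << 20, '\0');
    for (char& c : v1) c = static_cast<char>(rng() & 0xFF);
    std::string v2 = v1;
    v2.insert(1000, "sixteen new byte");

    std::unordered_set<size_t> stored;   // fragment hashes already in the archive
    size_t storedBytes = 0;
    for (const std::string& version : {v1, v2}) {
        for (const std::string& f : chunk(version, 6)) {
            const size_t id = std::hash<std::string>{}(f);  // stand-in for SHA-1
            if (stored.insert(id).second)
                storedBytes += f.size();  // only previously unseen fragments are kept
        }
    }
    std::printf("input: %zu bytes, stored after dedup: %zu bytes\n",
                v1.size() + v2.size(), storedBytes);
    return 0;
}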

All I can say is that some large image files compress about twice as well as with 7-Zip, even with a 256 MB dictionary (x64 mode),
because the image files differ only in a few places and are much larger than the dictionary.
But of course, every file scenario compresses differently.
For me the main feature is journaling, not maximum compression,
not to mention that modes 4 and 5 are symmetric (decompression takes roughly the same time as compression).


Edit:
Did you use the same fragment size from the beginning?
I ask because:
Zpaq manpage wrote:Values other than 6 conform to the ZPAQ specification and will decompress correctly by all versions, but do not conform to the recommendation for best deduplication. Adding identical files with different values of N will not deduplicate because the fragment boundaries will differ.
Another reason not to implement the setting in the plug-in...
TC plugins: PCREsearch and RegXtract
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker »

milo1012,
Thank you very much for the technical explanation. I performed two tests, one without specifying the fragment parameter at all for any step, the other specifying -fragment 1 for all steps, so it's probably due to the lack of solid compression. Thanks again.

EDIT: I have now tried adding 15 versions of one file, an IM database of ~37 MB: WinRAR - 12 MB, ZPAQ method 511 - 36 MB.
I currently do not see any advantage over WinRAR's "Keep previous file versions" option. ZPAQ is slower and compresses worse when used for more than 5 file versions.

Roman
Suppose you press Ctrl+F, select the FTP connection (with the saved password), but instead of clicking Connect, you drop dead.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

Hacker wrote:I have now tried adding 15 versions of one file, an IM database of ~37 MB: WinRAR - 12 MB, ZPAQ method 511 - 36 MB.
A DB file again? I'd say it's the same scenario as before.
And compression mode 5 is NOT recommended for general purpose (see http://mattmahoney.net/dc/10gb.png).
Try a different scenario, e.g. try to back up the whole Windows directory with method 2,
and compare speed and compression ratio with Rar/Zip/7-Zip.
In my tests ZPAQ won every time.
Hacker wrote:I currently do not see any advantage over WinRAR's "Keep previous file versions" option. ZPAQ is slower and compresses worse when used for more than 5 file versions.
Of course it's a matter of personal preference.
Like I said: don't use method 4/5 for everyday compression.

The Rar method shows all file versions at once with a numbering scheme, which makes restoring old file versions awkward IMO (manual renaming required).
But sure, I also prefer Rar most of the time, though mainly because of its great error recovery capability (recovery records).
TC plugins: PCREsearch and RegXtract
Hacker
Moderator
Posts: 13064
Joined: 2003-02-06, 14:56 UTC
Location: Bratislava, Slovakia

Post by *Hacker »

milo1012,
Try a different scenario, e.g. try to back up the whole Windows directory with method 2, and compare speed and compression ratio with Rar/Zip/7-Zip.
Are we still taking journalling / multiple file versions into account? Because I am only testing ZPAQ on the assumption that it excels at this feature. There are faster archivers and there are archivers with a higher compression ratio, so I hoped ZPAQ would beat WinRAR at journalling; so far it has done so neither in speed nor in compression ratio.
I am packing my C:\Windows\*.* dir right now as suggested, using default ZPAQ options; let's see how it goes.

Roman
Suppose you press Ctrl+F, select the FTP connection (with the saved password), but instead of clicking Connect, you drop dead.
milo1012
Power Member
Posts: 1158
Joined: 2012-02-02, 19:23 UTC

Post by *milo1012 »

2Hacker
Look, I'm neither the Zpaq author, nor do I intend to defend the format at all costs.
But I understand how it works technically, and I know and have studied the source code.

Zpaq was intended for incremental backups of multi-file collections, and it works really well for such scenarios.
So when I said that DB files are a bad example, I meant it exactly like that: most DB formats reorder all entries,
or at least the index positions, as soon as you manipulate one entry, and therefore Zpaq can't deduplicate the data any more,
even with a tiny fragment size.
So if your personal scenarios consist mostly of DB files, you'd better stick with Rar et al.,
but for most other ones Zpaq works really well.
Hacker wrote:Are we still taking journalling / multiple file versions into account?
Partly, because a decent number of files in the Windows dir are different versions of the same base file (e.g. SxS installations).
So no, not multiple versions as in "multiple archive updates", but multiple identical file fragments.
In the end it's the same thing, because any new version of a file added to the archive is deduplicated anyway,
so it's a benchmark for how well the dedupe feature works in general.
Hacker wrote:so I hoped ZPAQ would exceed WinRAR at journalling
Since when is Rar a journaling archiver?
It can keep old file versions by renaming them, nothing more, but it's not optimized for journaling at all:
as soon as you turn off solid compression or have multi-part archives that you don't want to recompress,
the different versions are each stored on their own again; also, no timestamps are kept for when you added the versions, and so on...
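
To make the difference concrete, "journaling" here means something like the following (a conceptual sketch, not the actual Zpaq on-disk format): every update is appended to the archive with its own date and maps file names to deduplicated fragments, so any earlier state can be listed or restored directly, without renaming schemes.

Code:
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// One appended update: when it was made and which files (by name) point to
// which deduplicated fragment IDs. Deletions could be recorded as an entry
// with an empty fragment list; omitted here to keep the sketch short.
struct Update {
    std::string date;
    std::map<std::string, std::vector<uint64_t>> files;
};

struct JournalArchive {
    std::vector<Update> updates;   // append-only journal

    // Reconstruct the file list as it looked after the first k updates
    // (k == updates.size() gives the latest view, the plugin's default).
    std::map<std::string, std::vector<uint64_t>> stateAfter(size_t k) const {
        std::map<std::string, std::vector<uint64_t>> state;
        for (size_t i = 0; i < k && i < updates.size(); ++i)
            for (const auto& f : updates[i].files)
                state[f.first] = f.second;   // later updates override earlier ones
        return state;
    }
};

int main()
{
    JournalArchive arc;
    arc.updates.push_back({"2016-01-01", {{"notes.txt", {1, 2}}}});
    arc.updates.push_back({"2016-01-02", {{"notes.txt", {1, 3}}, {"new.txt", {4}}}});

    for (size_t k = 1; k <= arc.updates.size(); ++k)
        std::printf("after update %zu (%s): %zu file(s)\n",
                    k, arc.updates[k - 1].date.c_str(), arc.stateAfter(k).size());
    return 0;
}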
TC plugins: PCREsearch and RegXtract