[WCX] Increase speed of enum items from internal archive cache
Moderators: Hacker, petermad, Stefan2, white
[WCX] Increase speed of enum items from internal archive cache
To reproduce the problem, just take the ZIP archive (zip64), which contains a huge number of files (for example, 542 000 files and 61 000 directories).
Navigation into archives is very slow via TotalCmd 9.22a!!!
In the original 7zip utility (7zFM.exe), there is no such problem at all. Navigation is carried out without any delay.
The supposed reasons for the slow work:
1) Using a linear list (rather a tree list);
2) Using character-by-character string comparisons (rather a hash comparison).
Navigation into archives is very slow via TotalCmd 9.22a!!!
In the original 7zip utility (7zFM.exe), there is no such problem at all. Navigation is carried out without any delay.
The supposed reasons for the slow work:
1) Using a linear list (rather a tree list);
2) Using character-by-character string comparisons (rather a hash comparison).
Re: [WCX] Increase speed of enum items from internal archive cache
ReDisign internal archive cache
Visual explanation: Image: https://i.imgur.com/GtlEMsv.jpg
Code: Select all
#pragma pack(push, 1)
struct TArcFileInfo {
int Flags;
unsigned int PackSize;
unsigned int PackSizeHigh;
unsigned int UnpSize;
unsigned int UnpSizeHigh;
int HostOS;
int FileCRC;
int FileTime;
int UnpVer;
int Method;
int FileAttr;
char * CmtBuf;
int CmtBufSize;
int CmtSize;
int CmtState;
};
const int AdditionalDataSize = 64;
struct TArcElemInfo {
UINT64 PackSize; /* Packed file size */
UINT64 UnpSize; /* Unpacked file size */
int FileAttr;
FILETIME CreationTime; /* birthtime, Win32 format, GMT+0 */
FILETIME LastWriteTime; /* mtime, Win32 format, GMT+0 */
FILETIME LastAccessTime; /* atime, Win32 format, GMT+0 */
BYTE AdditionalData[AdditionalDataSize];
};
struct TArcCacheRec {
UINT32 RecSize; /* = sizeof(TArcCacheRec) + FullNameLen * 2 */
BYTE Flags;
TArcCacheRec * Link; /* addr of parent/child record (NULL for empty dir and normal file) */
union {
TArcFileInfo FileInfo; /* for compatibility */
TArcElemInfo ElemInfo;
};
UINT16 NameOffset; /* offset to last name into FullName */
UINT32 NameHash; /* hash for last name from FullName field */
UINT16 FullNameLen; /* length of FullName field (number of character without zero-termination) */
WCHAR FullName[1]; /* full file name with zero-termination */
};
#pragma pack(pop)
/* TArcCacheRec.Flags */
const BYTE FLAG_FILE_INFO = 0x01; /* using old file info field, see TArcFileInfo */
const BYTE FLAG_DOUBLE_DOT = 0x02; /* for parent link */
const BYTE FLAG_DIRECTORY = 0x04; /* for child link */
const BYTE FLAG_VIRTUAL = 0x08; /* for elements, which TotalCmd self created */
Code: Select all
/* get next record */
TArcCacheRec * get_next_record(TArcCacheRec * current)
{
if (current->RecSize == 0)
return NULL;
TArcCacheRec * rec = (TArcCacheRec *)((size_t)current + sizeof(TArcCacheRec) + current->RecSize);
return (rec->RecSize) ? rec : NULL;
}
/* recursive free all memory blocks */
void clear_cache(TArcCacheRec * root)
{
if (root) {
TArcCacheRec * rec = root;
while (rec = get_next_record(rec)) {
if (rec->Flags & FLAG_DIRECTORY) {
clear_cache(rec->Link);
}
}
free(root);
}
}
Last edited by remittor on 2019-11-12, 11:35 UTC, edited 1 time in total.
Re: [WCX] Increase speed of enum items from internal archive cache
If needed support renaming files:
Code: Select all
struct TArcCacheRec {
BYTE Flags;
TArcCacheRec * Link; /* addr of parent/child record (NULL for empty dir and normal file) */
union {
TArcFileInfo FileInfo; /* for compatibility */
TArcElemInfo ElemInfo;
};
UINT16 PathLen;
WCHAR * Path; /* path to elem with backslash on end (same addr for current memory block) */
UINT32 NameHash; /* hash for Name field */
UINT16 NameLen; /* length of Name field (number of character without zero-termination) */
WCHAR Name[256]; /* elem name with zero-termination */
};
const BYTE FLAG_EOL = 0x80; /* EOL */
Code: Select all
/* get next record */
TArcCacheRec * get_next_record(TArcCacheRec * current)
{
if (current->Flags & FLAG_EOL)
return NULL;
if (current[1].Flags & FLAG_EOL)
return NULL;
return ¤t[1];
}
Code: Select all
/* recursive free all memory blocks */
void clear_cache(TArcCacheRec * root)
{
if (root) {
TArcCacheRec * rec = root;
while (rec = get_next_record(rec)) {
if (rec->Flags & FLAG_DIRECTORY) {
clear_cache(rec->Link);
}
}
if (root->Path) {
free(root->Path);
}
free(root);
}
}
Re: [WCX] Increase speed of enum items from internal archive cache
Well, support++ for speeding up the virtual path navigation.
You don't need 100k+ files in an archive, even with 20k files you already have a palpable delay as soon as you navigate further, i.e. change into a sub-dir of that virtual file tree.
This is no new problem, but it gets quite annoying nowadays, especially since TC x64 isn't much faster than 32-bit TC in this matter.
My impression is, that TC uses a quite inefficient data structure to store the file/path entries for the virtual tree (and/or maybe it's the sorting algorithm behind it).
You don't need 100k+ files in an archive, even with 20k files you already have a palpable delay as soon as you navigate further, i.e. change into a sub-dir of that virtual file tree.
This is no new problem, but it gets quite annoying nowadays, especially since TC x64 isn't much faster than 32-bit TC in this matter.
My impression is, that TC uses a quite inefficient data structure to store the file/path entries for the virtual tree (and/or maybe it's the sorting algorithm behind it).
TC plugins: PCREsearch and RegXtract
Re: [WCX] Increase speed of enum items from internal archive cache
As I know, TC reads archive contents once on first entering and caches, so slow navigation within archive is a result of inefficient data structure, as milo1012 said, not an API fault. Perhaps TC really keeps archive file records in a list instead of a tree. It seems that for compatibility with linear structure a compact tree may be used that only keeps names and file record indices in a linear array.
Re: [WCX] Increase speed of enum items from internal archive cache
Nobody wrote about the error in the API.
Error in choosing the algorithm for working with data.
This forum has user requests to add the ability to rename files in archives (internal ZIP support this).MVV wrote: 2019-11-13, 06:35 UTCIt seems that for compatibility with linear structure a compact tree may be used that only keeps names and file record indices in a linear array.
Therefore, it is worth radically revising the algorithm for storing data from archives.
Re: [WCX] Increase speed of enum items from internal archive cache
The above-described file tree structure incorrect!!!
Here you can look at the correct implementation of the file tree structure: https://github.com/remittor/wcpatcher/blob/dev/src/wcp_filetree.h
A visual diagram that describes the construction of a tree: Image: https://i.imgur.com/WGKc4w9_d.jpg?maxwidth=123456789&shape=thumb&fidelity=high
Here you can look at the correct implementation of the file tree structure: https://github.com/remittor/wcpatcher/blob/dev/src/wcp_filetree.h
A visual diagram that describes the construction of a tree: Image: https://i.imgur.com/WGKc4w9_d.jpg?maxwidth=123456789&shape=thumb&fidelity=high
Last edited by remittor on 2020-01-15, 17:04 UTC, edited 4 times in total.
Re: [WCX] Increase speed of enum items from internal archive cache
You can always keep most up to date info in first post.