[Implemented] Faster CRC32 method for SFV computation.

Here you can propose new features, make suggestions etc.

Moderators: Hacker, petermad, Stefan2, white

Post Reply
lelik007
Member
Member
Posts: 173
Joined: 2021-04-20, 06:37 UTC

[Implemented] Faster CRC32 method for SFV computation.

Post by *lelik007 »

Christian I'm sorry for maybe being too late for this 11.50 beta but I should address this. TC uses as I think standard CRC32 implementation by Dilip V. Sarwate (maybe in Assembly) for SFV computation and it gives to 375 Mbyte/s according to this https://create.stephan-brumme.com/crc32 and my personal measurements.

So why not to speed CRC32 up in TC? You can use Slicing-by-N for example.

If I understand the things right the method with the sliced tables was brought by Intel and initially was for CRC32c(Castagnoli) but then was adopted for CRC32:
https://sourceforge.net/projects/slicing-by-8/

Slicing-by-N algos can contain literally the lookup tables in their code - table-based: Zlib, libdeflate (crc32_tables.h for example), Fast CRC32 by Stephan Brumme or tableless where the lookup tables are sliced yet calculated by the algos: 7-zip - Slicing-by-12, UnRAR- Slicing-by-16.
But every method based on Intel Slicing-by-8 gives ~ 1000 Mbyte/s.

Any Slicing-by-N method with Stephan Brumme's optimizations is available here. He's a relatively known researcher and a collector of CRC32 methods:
https://github.com/stbrumme/crc32

And as browny said below libdeflate and Zlib use their own methods based on Intel Slicing-by-8 similar to Fast CRC32 by Stephan Brumme with just the same lookup tables.

You can choose either one.

As for x32 I don't see Slicing-by-8 does any harm for RapidCRC Unicode x32 though it performs a bit slower ~ 900 Mbyte/s. In this case I think maybe even Slicing-by-4 could be a better option or just an old code. For x64 and the modern machines Slicing-by-16 can give a better performance but Slicing-by-8 is more conservative.
Last edited by lelik007 on 2024-11-21, 07:02 UTC, edited 28 times in total.
browny
Senior Member
Senior Member
Posts: 359
Joined: 2007-09-10, 13:19 UTC

Re: Faster CRC32 Method.

Post by *browny »

lelik007 wrote: 2024-10-11, 11:09 UTC TC uses as I think standard CRC32 implementation by Dilip V. Sarwate (maybe in Assembly)
Why you think so?
Probably in every decent library calculation is table-based (zlib, libdeflate).
LIBDEFLATE64.DLL exports libdeflate_crc32, and TC imports this function.
JOUBE
Power Member
Power Member
Posts: 1664
Joined: 2004-07-08, 08:58 UTC

Re: Faster CRC32 Method.

Post by *JOUBE »

Really, CRC32 improvements: who cares...
lelik007
Member
Member
Posts: 173
Joined: 2021-04-20, 06:37 UTC

Re: Faster CRC32 Method.

Post by *lelik007 »

2browny
IDK what exact code is used in TC's checksum module, that was my guess. But I measured right now, Mbyte/s means exactly it: Megabytes / Time and it's about 330 Mbyte/s, if it were something table-based it'd be more like ~ 1000 Mbyte/s minimum. And, yes, you are right, libdeflate has Slicing-by-8 code but It doesn't mean it affects TC's checksum module in any way.

And this is exactly why I started this topic, especially if libdeflate has Slicing-by-8 (I didn't know it) and 7za.dll has Igor's implementation (how it's calculated for 7z archives, otherwise) why do we still have so slow method for the actual checksum module.

2JOUBE
At least this is me.
Last edited by lelik007 on 2024-10-19, 16:11 UTC, edited 5 times in total.
browny
Senior Member
Senior Member
Posts: 359
Joined: 2007-09-10, 13:19 UTC

Re: Faster CRC32 Method.

Post by *browny »

Without knowledge you could test only program performance which is in any case severely hindered by file system (most probably, synchronous disk I/O). This cannot be compared directly to method perfomance.
For the reference, average (or maybe higher than average) HDD might give 200 MB/s peak and substantially lower sustained rate for file operations.
300 MB/s suggests a very efficient implementation.
Last edited by browny on 2024-10-14, 16:36 UTC, edited 1 time in total.
lelik007
Member
Member
Posts: 173
Joined: 2021-04-20, 06:37 UTC

Re: Faster CRC32 Method.

Post by *lelik007 »

2browny
I'll show you then. 2 builds of RapidCRC Unicode with the only change log:
---
v0.3.40
posted on 03/29/2024 - 16:55
Changed crc implementation to slice-by-8 from https://github.com/stbrumme/crc32
Enabled SSE4.1 for Blake2sp on x64 builds
---
I asked the developer to make these changes because up to v0.3.39 there was compact tableless Assembly code.
[img]https://www.upload.ee/thumb/17231641/CRC32comparison.jpg[/img]
And 7-zip:
[img]https://www.upload.ee/thumb/17254822/7-zipCRC32.jpg[/img]

It's just an old code in TC and maybe Christian will decide to update it at some point I'm just showing what we'll get.
lelik007
Member
Member
Posts: 173
Joined: 2021-04-20, 06:37 UTC

Re: Faster CRC32 method for SFV computation.

Post by *lelik007 »

2ghisler(Author)
Thank you Christian, this is what I meant, now CRC32 has the same speed as BLAKE3 without AVX2 (if there's no this SIMD set in CPU) to ~ 1200 Mbyte/s, of course with AVX2 set BLAKE3 is ~ 1800 Mbyte/s.
User avatar
ghisler(Author)
Site Admin
Site Admin
Posts: 50386
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland
Contact:

Re: Faster CRC32 method for SFV computation.

Post by *ghisler(Author) »

Yes, it's astonishing, I had to test on my primary M.2 SSD (4 lane PCIe 4.0) because it was faster than my data SSD (SATA, 500MB/s)...
Author of Total Commander
https://www.ghisler.com
lelik007
Member
Member
Posts: 173
Joined: 2021-04-20, 06:37 UTC

Re: Faster CRC32 method for SFV computation.

Post by *lelik007 »

2ghisler(Author)
Igor Pavlov actually has a big brother: Slicing-by-12, 12 Kb Tables (Assembly Code with a table generator) but I don't know how to call this function out from 7z.dll.
https://www.upload.ee/image/17254822/7-zipCRC32.jpg
But as for me, I'll be more than happy with libdeflate's Slicing-by-8 because we'd already had it before I even asked and 1100-1200 Mbytes/s is a nice speed to have.
Last edited by lelik007 on 2024-11-21, 07:06 UTC, edited 3 times in total.
lelik007
Member
Member
Posts: 173
Joined: 2021-04-20, 06:37 UTC

Re: Faster CRC32 method for SFV computation.

Post by *lelik007 »

2ghisler(Author)
I think this topic can be closed as "Implemented" now.
User avatar
white
Power Member
Power Member
Posts: 5743
Joined: 2003-11-19, 08:16 UTC
Location: Netherlands

Re: Faster CRC32 method for SFV computation.

Post by *white »

lelik007 wrote: 2024-11-20, 18:21 UTC I think this topic can be closed as "Implemented" now.
By editing the subject of the first post, you can change the subject of the thread yourself.
lelik007
Member
Member
Posts: 173
Joined: 2021-04-20, 06:37 UTC

Re: [Implemented] Faster CRC32 method for SFV computation.

Post by *lelik007 »

2white
Thank you, I understood and did it.
Post Reply