[Implemented] SHA-3 Speed Improvement.

ghisler(Author)
Site Admin
Posts: 50703
Joined: 2003-02-04, 09:46 UTC
Location: Switzerland

Re: SHA-3 Speed Improvement.

Post by *ghisler(Author) »

Could you send me the results of the following PowerShell commands?
1. Run powershell.exe
2. Run the following commands:
Find-Module -Name PSReadLine -Repository PSGallery | Get-Member
-> this will ask to update NuGet; confirm
Install-Module -Name iPowerShellCpuid -scope CurrentUser -Force
-> this installs the commands we need.

Then please run these commands:
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name HighestBasicFunction)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name XSAVE)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name AVX)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name AVX2)


On my PC, they return 27, TRUE, TRUE, TRUE.
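
The four values requested above presumably correspond to the usual ingredients of an AVX2 usability check. As a rough sketch only (written in Python for brevity, and not the code used in TCSHA64.DLL), the decision could look like this. The function name and its parameters are made up for illustration; the raw register values are passed in as plain integers, and the OSXSAVE/XCR0 step is the standard check from the CPU vendors' manuals, not something this thread confirms the module reports.

```python
# Bit positions as documented for the x86 CPUID instruction.
ECX1_XSAVE = 1 << 26    # CPUID leaf 1, ECX bit 26: XSAVE supported by the CPU
ECX1_OSXSAVE = 1 << 27  # CPUID leaf 1, ECX bit 27: OS has enabled XSAVE/XGETBV
ECX1_AVX = 1 << 28      # CPUID leaf 1, ECX bit 28: AVX supported by the CPU
EBX7_AVX2 = 1 << 5      # CPUID leaf 7 (subleaf 0), EBX bit 5: AVX2 supported
XCR0_SSE_AVX = 0b110    # XCR0 bits 1 and 2: OS saves XMM and YMM registers


def avx2_usable(max_basic_leaf, leaf1_ecx, leaf7_ebx, xcr0):
    """Hypothetical helper: decide whether AVX2 code may be executed.

    All arguments are raw integers that a caller would obtain from CPUID
    and XGETBV; this function only interprets them.
    """
    # Leaf 7 must exist before its AVX2 bit can be trusted.
    if max_basic_leaf < 7:
        return False
    needed = ECX1_XSAVE | ECX1_OSXSAVE | ECX1_AVX
    if (leaf1_ecx & needed) != needed:
        return False
    # The operating system must preserve the 256-bit registers.
    if (xcr0 & XCR0_SSE_AVX) != XCR0_SSE_AVX:
        return False
    return bool(leaf7_ebx & EBX7_AVX2)
```

With a highest basic function of 27 and all three flags set, as reported above, such a check would return True, provided the operating system has also enabled the AVX register state.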
Author of Total Commander
https://www.ghisler.com
lelik007
Member
Posts: 190
Joined: 2021-04-20, 06:37 UTC

Post by *lelik007 »

2ghisler(Author)
Although iPowerShellCpuid is installed, none of these commands returns anything on my PC.
PowerShell hangs, and even Ctrl+C doesn't work.
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name HighestBasicFunction)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name XSAVE)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name AVX)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name AVX2)
PSVersion 5.1.19041.5737, Windows 10 22H2 x64 Enterprise, build 19045.5737.
Deleting TCSHA64.DLL works.

Post by *ghisler(Author) »

I think I found the reason for the crash. Could you try this DLL, please?
https://www.totalcommander.ch/beta/tcsha64.zip

Post by *lelik007 »

2ghisler(Author)
I tried the new DLL and it worked. On the i7-2600K it brings no advantage because that CPU lacks AVX2 support; on the i3-10300 the DLL clearly shows a speed improvement.

Post by *lelik007 »

2ghisler(Author)
Christian, it's definitely implemented, so thank you. And when I measured more precisely, I found some improvement even on the i7-2600K over the internal Delphi code we had.

Would it be possible one day to do something similar to what BLAKE3 does, I mean the way it uses whichever SIMD instruction set the CPU supports?
I can see that the Keccak Team code is unfortunately organized differently, but in that case you could have AVX-512 for your i7-11700 and I could have AVX for my i7-2600K.

The developers list SSSE3, AVX, XOP, AVX2 and AVX512, but how to ship them all at the same time and pick the right set for the CPU at hand, that is the question.
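
One common answer to that question is runtime dispatch: build every variant into the same binary, detect the CPU features once at startup, and route calls to the best variant that is supported. The sketch below only illustrates the idea in Python. The back-end table, the feature names and `pick_backend` are hypothetical, and every entry falls back to `hashlib` so that the example runs; in a real DLL each entry would point to a separately compiled routine.

```python
import hashlib


def _sha3_256_generic(data):
    # Stand-in for a compiled routine; every back end here shares it.
    return hashlib.sha3_256(data).digest()


# Hypothetical table, ordered from most to least preferred.
# Each entry: (label, CPU features it requires, function to call).
BACKENDS = [
    ("avx512", {"avx512f"}, _sha3_256_generic),
    ("avx2", {"avx2"}, _sha3_256_generic),
    ("generic", set(), _sha3_256_generic),
]


def pick_backend(cpu_features):
    """Return (label, function) for the first back end the CPU can run."""
    for label, required, func in BACKENDS:
        if required <= cpu_features:  # all required features are present
            return label, func
    raise RuntimeError("no usable back end")  # unreachable: generic needs nothing


# Example: a CPU that reports AVX but not AVX2 ends up on the generic path.
label, sha3_256 = pick_backend({"sse2", "ssse3", "avx"})
```

The selection runs once; afterwards the chosen function is called directly, so the per-call cost of the dispatch is negligible.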

Post by *ghisler(Author) »

2lelik007
Thanks for your tests! I'm still using the same internal Delphi code, but I'm now using memory-mapped access to the file, so maybe that makes it faster.

The DLL does use the Keccak Team code, but I can't find any AVX code in there. It has:
KeccakP-1600-AVX2.s: that's the one I use (I converted it to MASM myself)
KeccakP-1600-AVX512.c or KeccakP-1600-AVX512.s: unfortunately, modern Intel consumer processors no longer have AVX-512
KeccakP-1600-opt64.c: plain C implementation, no SIMD acceleration, not faster than the Delphi code
KeccakP-1600-reference.c: plain C reference implementation, considerably slower than the Delphi code
various implementations for other processors such as ARM or AVR8.

So where did you see an AVX (not AVX2) implementation? As I understand it, the KeccakP-1600-times2, KeccakP-1600-times4 and KeccakP-1600-times8 implementations can't be used; they just calculate multiple hashes at the same time.

Post by *lelik007 »

2ghisler(Author)
"So where did you see an AVX (not AVX2) implementation?"
In the description of how they build it:
https://github.com/XKCP/XKCP?tab=readme-ov-file#how-can-i-build-the-xkcp
which refers to this line:
https://github.com/XKCP/XKCP/blob/6fa655a43dcc1ed1945230f3c684ac988281a995/Makefile.build#L148
"As I understand it, the KeccakP-1600-times2, KeccakP-1600-times4 and KeccakP-1600-times8 implementations can't be used, they just calculate multiple hashes at the same time."
I think not; as I understand it from the following, they are for parallel execution:
"In addition, one can find the implementation of parallelized permutations using SIMD instructions."
https://github.com/XKCP/XKCP?tab=readme-ov-file#low-level-services
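
The two readings are not necessarily in conflict. A plain SHA-3 hash absorbs its input one block after another, and each permutation depends on the state left by the previous one, so N permutations running side by side can only help when there are N independent inputs (several files, or the leaves of a tree-hash mode). The Python sketch below illustrates this with `hashlib` only; `hash_many` and `chunked_tree_digest` are made-up helpers, not XKCP functions, and the tree layout is an arbitrary example rather than any standardized mode.

```python
import hashlib

SHA3_256_RATE = 136  # bytes absorbed per permutation call in SHA3-256


def sha3_256_sequential(data):
    """Feed the input block by block; each step needs the previous state."""
    h = hashlib.sha3_256()
    for i in range(0, len(data), SHA3_256_RATE):
        h.update(data[i:i + SHA3_256_RATE])
    return h.digest()


def hash_many(messages):
    """Independent inputs: the kind of work an N-way permutation can batch."""
    return [hashlib.sha3_256(m).digest() for m in messages]


def chunked_tree_digest(data, lanes=4):
    """Illustrative tree layout: hash the chunks independently, then hash
    the concatenated chunk digests. The result is NOT the SHA3-256 of data."""
    size = -(-len(data) // lanes) or 1  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return hashlib.sha3_256(b"".join(hash_many(chunks))).digest()
```

Splitting a file into chunks and hashing them in parallel therefore yields a different digest than SHA3-256 of the whole file, which is why the times-N code cannot simply be dropped in to speed up a single standard checksum.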

Post by *lelik007 »

2ghisler(Author)
Christian, this change does no good:
28.04.25 Added: Create/Verify Checksums: Use file mapping instead of Readfile (64)
I checked CRC32, MD5, SHA-1 and SHA3-256 on my Kingston KC600 512 GB SATA III SSD (system drive). It reaches only 130-140 MiB/s with any algorithm,
while TC 11.51 shows 410-420 MiB/s for CRC32 and MD5, 375 MiB/s for SHA-1, and 130 MiB/s for SHA3-256. I tested with 8 GiB and 10 GiB files.

Although this particular SSD is rated at 480 MiB/s, it has never shown more than 460 MiB/s, and usually reaches 410-430 MiB/s.

It is not right that CRC32, MD5, SHA-1 and SHA3-256 all show the same low speed in TC 11.55 RC1.
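
For anyone who wants to reproduce this kind of comparison outside TC, a measurement along the following lines may be useful. It is only a sketch in Python: `measure`, the small CRC32 wrapper and the buffer size are my own choices, not anything TC does internally. It reads the file linearly and reports the digest together with the throughput in MiB/s. Note that a second run over the same file may be served from the operating system's file cache and then says little about the drive.

```python
import hashlib
import time
import zlib


class _Crc32:
    """Minimal wrapper so CRC32 offers the same interface as hashlib objects."""

    def __init__(self):
        self.value = 0

    def update(self, data):
        self.value = zlib.crc32(data, self.value)

    def hexdigest(self):
        return f"{self.value:08x}"


ALGORITHMS = {
    "CRC32": _Crc32,
    "MD5": hashlib.md5,
    "SHA-1": hashlib.sha1,
    "SHA3-256": hashlib.sha3_256,
}


def measure(path, name, buf_size=1 << 20):
    """Hash the file with linear reads; return (hex digest, MiB per second)."""
    h = ALGORITHMS[name]()
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while chunk := f.read(buf_size):
            h.update(chunk)
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mib_per_s = total / (1024 * 1024) / elapsed if elapsed > 0 else float("inf")
    return h.hexdigest(), mib_per_s
```

If all four algorithms end up at the same figure, as reported above, the bottleneck is almost certainly the way the file is read and not the hash calculation.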

Post by *ghisler(Author) »

That makes no sense at all - did you somehow turn off the read cache of your SSD?

Post by *lelik007 »

2ghisler(Author)
I did nothing except compare the different versions of TC.
I initially thought that SHA3-256 was limited by its calculation speed, but no: CRC32, MD5 and SHA-1 give the same result.

Post by *lelik007 »

2ghisler(Author)
This helped, thank you; the reading itself is OK now:
09.05.25 Fixed: Create/Verify checksums: Only use memory mapping for multi-threaded blake3 checksums (64)
CRC32 and MD5 now give the result they should, 410 MiB/s (maximum), and SHA-1 reaches 375 MiB/s (maximum).

I tested SHA3-256 separately on a different PC whose CPU has AVX2, again with a SATA III SSD: 370-380 MiB/s is about the maximum, whereas previously it could only reach 170-190 MiB/s.

Post by *ghisler(Author) »

Thanks for your tests! I switched back to linear reading for all algorithms other than blake3, so you should get the same results as with TC 11.51 (except for SHA3 when AVX2 is supported).
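
To make the two access patterns concrete, here is a minimal Python sketch of both, assuming nothing about TC's actual implementation. `digest_linear` and `digest_mapped` are illustrative names, and SHA3-256 is used because Python's `hashlib` has no blake3. Both functions must produce the same digest; only the way the bytes reach the hash function differs.

```python
import hashlib
import mmap
import os


def digest_linear(path, algo="sha3_256", buf_size=1 << 20):
    """Read the file sequentially in fixed-size buffers."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(buf_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_mapped(path, algo="sha3_256"):
    """Map the whole file into memory and hash the mapped view."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        # An empty file cannot be mapped, so return the empty-input digest.
        if os.fstat(f.fileno()).st_size == 0:
            return h.hexdigest()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            h.update(view)
    return h.hexdigest()
```

With a mapping, data is pulled in through page faults as the hash code touches it, which can behave quite differently from large sequential reads; that is consistent with the slowdown reported earlier in the thread, although the sketch itself does not prove the cause.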