[Implemented] SHA-3 Speed Improvent.
- ghisler(Author)
- Site Admin
- Posts: 50703
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
Re: SHA-3 Speed Improvent.
Could you send me the results of the following PowerShell commands?
1. Run powershell.exe
2. Run the following commands:
Find-Module -Name PSReadLine -Repository PSGallery | Get-Member
-> this will ask to update nuget, confirm
Install-Module -Name iPowerShellCpuid -scope CurrentUser -Force
-> this installs the commands we need.
Then please run these commands:
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name HighestBasicFunction)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name XSAVE)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name AVX)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name AVX2)
On my PC, they return 27, TRUE, TRUE, TRUE.
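For readers who want to see what these four values correspond to, here is a minimal sketch of how the flags can be decoded from raw CPUID register values. It assumes the standard bit positions (XSAVE = bit 26 and AVX = bit 28 of ECX in leaf 1, AVX2 = bit 5 of EBX in leaf 7, subleaf 0). The function and class names are made up for illustration and are not part of iPowerShellCpuid; the register values must come from elsewhere, since plain Python cannot execute CPUID itself. A complete check would also confirm that the OS has enabled the AVX register state, which this sketch does not do.

```python
from dataclasses import dataclass

# Standard CPUID bit positions (assumed inputs are raw register values).
XSAVE_BIT = 26  # leaf 1, ECX
AVX_BIT = 28    # leaf 1, ECX
AVX2_BIT = 5    # leaf 7 (subleaf 0), EBX


@dataclass(frozen=True)
class CpuFeatures:
    highest_basic_function: int
    xsave: bool
    avx: bool
    avx2: bool


def decode_features(leaf0_eax: int, leaf1_ecx: int, leaf7_ebx: int) -> CpuFeatures:
    """Decode the four values asked for above from raw register contents."""

    def bit(value: int, position: int) -> bool:
        return bool((value >> position) & 1)

    # Leaf 7 is only meaningful if the CPU reports at least 7 basic leaves.
    has_leaf7 = leaf0_eax >= 7
    return CpuFeatures(
        highest_basic_function=leaf0_eax,
        xsave=bit(leaf1_ecx, XSAVE_BIT),
        avx=bit(leaf1_ecx, AVX_BIT),
        avx2=has_leaf7 and bit(leaf7_ebx, AVX2_BIT),
    )


if __name__ == "__main__":
    # Hypothetical register values chosen to reproduce "27, TRUE, TRUE, TRUE".
    print(decode_features(27, (1 << 26) | (1 << 28), 1 << 5))
```

The check against the highest basic function is the reason that value is worth querying first: on a CPU that does not implement leaf 7, the AVX2 bit cannot be trusted.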
Author of Total Commander
https://www.ghisler.com
Re: SHA-3 Speed Improvent.
2ghisler(Author)
"Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name HighestBasicFunction)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name XSAVE)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name AVX)
Get-CpuidProperty -Property (Get-CpuidLocateProperty -Name AVX2)"
Although iPowerShellCpuid is installed, none of these commands returns anything on my PC.
PowerShell hangs, and even Ctrl+C doesn't work.
PSVersion 5.1.19041.5737, Windows 10 22H2 x64 Enterprise, build 19045.5737.
Deleting TCSHA64.DLL works.
- ghisler(Author)
Re: SHA-3 Speed Improvent.
I think I found the reason for the crash, can you try this dll, please?
https://www.totalcommander.ch/beta/tcsha64.zip
Re: SHA-3 Speed Improvent.
2ghisler(Author)
I tried the new DLL and it works. On the i7-2600k it brings no advantage because the CPU lacks AVX2 support; on the i3-10300 the DLL clearly improves speed.
Re: [Implemented] SHA-3 Speed Improvent.
2ghisler(Author)
Christian, it's definitely implemented, so thank you. After measuring more precisely, I found some improvement even on the i7-2600k over the internal Delphi code we had.
Would it be possible one day to do something similar to what BLAKE3 does, that is, to use whichever SIMD instruction set the CPU supports?
I can see that the Keccak Team code is unfortunately organized differently, but that way you could have AVX-512 on your i7-11700 and I could have AVX on the i7-2600k.
The developers list SSSE3, AVX, XOP, AVX2 and AVX512, but the question is how to include all of them at once and pick the set the CPU actually supports.
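To make the question concrete, the usual approach is runtime dispatch: build every variant, detect the CPU features once at startup, and pick the best variant whose requirements are met. The following is only a sketch of that selection logic in Python. The backend names and feature labels are placeholders I made up; they do not describe how XKCP, BLAKE3 or TC actually organize their code.

```python
from typing import FrozenSet, Iterable, List, Tuple

# (backend name, CPU features it requires), ordered from most to least
# preferred. All names here are hypothetical placeholders.
Backend = Tuple[str, FrozenSet[str]]

BACKENDS: List[Backend] = [
    ("keccak_avx512", frozenset({"avx512"})),
    ("keccak_avx2", frozenset({"avx2"})),
    ("keccak_generic64", frozenset()),  # plain fallback, always usable
]


def select_backend(cpu_features: Iterable[str],
                   backends: List[Backend] = BACKENDS) -> str:
    """Return the first backend whose required features are all present."""
    available = frozenset(cpu_features)
    for name, required in backends:
        if required <= available:  # subset test
            return name
    raise RuntimeError("no usable backend; the list needs a fallback entry")


if __name__ == "__main__":
    print(select_backend({"sse2", "avx"}))           # CPU without AVX2
    print(select_backend({"sse2", "avx", "avx2"}))   # CPU with AVX2
```

In a native DLL the same idea would be a function pointer set once after the CPUID check, so the per-call cost is one indirect call.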
- ghisler(Author)
Re: [Implemented] SHA-3 Speed Improvent.
2lelik007
Thanks for your tests! I'm still using the same internal Delphi code, but I now use memory-mapped access to the file, so maybe that makes it faster.
The DLL does use the Keccak team code, but I can't find any AVX code in there. It has:
KeccakP-1600-AVX2.s: that's the one I use (I converted it to MASM myself)
KeccakP-1600-AVX512.c or KeccakP-1600-AVX512.s: unfortunately, current Intel consumer processors no longer support AVX-512
KeccakP-1600-opt64.c: plain C implementation, no SIMD acceleration, not faster than the Delphi code
KeccakP-1600-reference.c: plain C reference implementation, considerably slower than the Delphi code
various implementations for other processors like ARM or AVR8.
So where did you see an AVX (not AVX2) implementation? As I understand it, the KeccakP-1600-times2, KeccakP-1600-times4 and KeccakP-1600-times8 implementations can't be used, because they just calculate multiple hashes at the same time.
Re: [Implemented] SHA-3 Speed Improvent.
2ghisler(Author)
"So where did you see an AVX (not AVX2) implementation?"
In the description of how they build it:
https://github.com/XKCP/XKCP?tab=readme-ov-file#how-can-i-build-the-xkcp
They refer to this:
https://github.com/XKCP/XKCP/blob/6fa655a43dcc1ed1945230f3c684ac988281a995/Makefile.build#L148
"As I understand it, the KeccakP-1600-times2, KeccakP-1600-times4 and KeccakP-1600-times8 implementations can't be used, they just calculate multiple hashes at the same time."
I think not; as I understand it from here, it's for parallel execution:
https://github.com/XKCP/XKCP?tab=readme-ov-file#low-level-services
"In addition, one can find the implementation of parallelized permutations using SIMD instructions."
Re: [Implemented] SHA-3 Speed Improvent.
2ghisler(Author)
"28.04.25 Added: Create/Verify Checksums: Use file mapping instead of Readfile (64)"
Christian, this change does no good:
I checked CRC32, MD5, SHA-1 and SHA3-256 on my SATA-III SSD, a Kingston KC600 512 GB (system drive). It's only 130-140 MiB/s with any algorithm,
while TC 11.51 shows 410-420 MiB/s for CRC32 and MD5, 375 MiB/s for SHA-1, and 130 MiB/s for SHA3-256. I checked 8 and 10 GiB files.
This particular SSD, although rated at 480 MiB/s, has never shown more than 460 MiB/s; it usually reaches 410-430 MiB/s.
It is not right that CRC32, MD5, SHA-1 and SHA3-256 all show the same low speed in TC 11.55 RC1.
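One way to tell a reading bottleneck from a calculation bottleneck is to time the algorithms on a buffer that is already in memory, so the disk plays no role. Below is a rough sketch using Python's hashlib and zlib; these are not the implementations TC uses, so the absolute numbers will differ from the ones above. The function names are my own. If all algorithms differ widely here but show the same speed on a file, the limit is in the file reading, not in the hash.

```python
import hashlib
import time
import zlib
from typing import Dict

MIB = 1024 * 1024
ALGORITHMS = ("crc32", "md5", "sha1", "sha3_256")


def hash_throughput(name: str, data: bytes) -> float:
    """Return the speed in MiB/s for one pass of `name` over `data`."""
    start = time.perf_counter()
    if name == "crc32":
        zlib.crc32(data)                  # CRC32 lives in zlib, not hashlib
    else:
        hashlib.new(name, data).digest()
    elapsed = time.perf_counter() - start
    return len(data) / MIB / max(elapsed, 1e-9)  # guard against zero time


def compare(data: bytes) -> Dict[str, float]:
    """Measure every algorithm on the same in-memory buffer."""
    return {name: hash_throughput(name, data) for name in ALGORITHMS}


if __name__ == "__main__":
    buffer = bytes(64 * MIB)  # 64 MiB of zeros; content does not affect speed
    for name, speed in compare(buffer).items():
        print(f"{name:>9}: {speed:8.1f} MiB/s")
```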
- ghisler(Author)
Re: [Implemented] SHA-3 Speed Improvent.
That makes no sense at all - did you somehow turn off the read cache of your SSD?
Re: [Implemented] SHA-3 Speed Improvent.
2ghisler(Author)
I did nothing but compare the different versions of TC.
I initially thought that SHA3-256 was limited by its calculation speed, but no: CRC32, MD5 and SHA-1 give the same result.
Re: [Implemented] SHA-3 Speed Improvent.
2ghisler(Author)
"09.05.25 Fixed: Create/Verify checksums: Only use memory mapping for multi-threaded blake3 checksums (64)"
This helped, thank you; the reading itself is OK now.
CRC32 and MD5 now give the result they should, 410 MiB/s (maximum), and SHA-1 gives 375 MiB/s (maximum).
SHA3-256 was tested separately on a different PC whose CPU has AVX2, again with a SATA-3 SSD: 370-380 MiB/s is about the maximum, whereas previously it could only show 170-190 MiB/s.
- ghisler(Author)
Re: [Implemented] SHA-3 Speed Improvent.
Thanks for your tests! I switched back to linear reading for all algorithms other than blake3, so you should get the same results as with TC 11.51 (except for SHA3 when AVX2 is supported).
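For anyone who wants to compare the two access patterns outside of TC, here is a minimal sketch of both in Python. It is not TC's code (which is Delphi plus the DLL); the function names and the 1 MiB chunk size are my own assumptions. Both functions must return the same digest, only the way the data reaches the hash differs.

```python
import hashlib
import mmap
import os

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def hash_linear(path: str, algorithm: str = "sha3_256") -> str:
    """Hash a file by reading it sequentially in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(CHUNK_SIZE)
            if not chunk:
                break
            digest.update(chunk)
    return digest.hexdigest()


def hash_mapped(path: str, algorithm: str = "sha3_256") -> str:
    """Hash a file through a read-only memory mapping."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        if os.fstat(handle.fileno()).st_size == 0:
            return digest.hexdigest()  # an empty file cannot be mapped
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            digest.update(view)  # pages are faulted in as the hash reads them
    return digest.hexdigest()
```

With a mapping, the data is pulled in by page faults rather than by explicit large reads, so the read-ahead behaviour is up to the OS. That is one plausible reason why the two approaches can show different speeds on the same drive.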