Add BLAKE2 to checksum methods
- ghisler(Author)
- Site Admin
- Posts: 50383
- Joined: 2003-02-04, 09:46 UTC
- Location: Switzerland
- Contact:
Re: Add BLAKE2 to checksum methods
I'm using the C code from their Github. It's well optimized with support for SSE, AVX2 etc. Although there seems to be a way to create DLLs with Rust, their code doesn't support it, so I would have to learn Rust first and then modify their code to get it working. There is no guarantee that it would be faster, the disk reading speed is probably the limiting factor.
Author of Total Commander
https://www.ghisler.com
Re: Add BLAKE2 to checksum methods
Thank you for the answer. I know it's difficult to compile, because the multithreaded code exists only in Rust and there is no official multithreaded C code.
But it's the fastest hash function in TC for now, with all SIMD optimizations applied, and I can't ask for more.
Quote: "There is no guarantee that it would be faster, the disk reading speed is probably the limiting factor."
There will be a difference when reading from the disk cache or from a RAM disk. But the thing is, I've never come across compiled Rust code in any software I've seen.
So there are no better BLAKE3 implementations yet.
Re: Add BLAKE2 to checksum methods
2ghisler(Author)
Christian, BTW, the BLAKE3 team seems to have multi-threaded BLAKE3 C code in v1.7.0:
https://github.com/BLAKE3-team/BLAKE3/releases/tag/1.7.0
Quote: "The C implementation has gained multithreading support, based on Intel's oneTBB library. This works similarly to the Rayon-based multithreading used in the Rust implementation. See c/README.md for details."
During the next beta testing we could check it out.
- ghisler(Author)
Re: Add BLAKE2 to checksum methods
(removed due to measurement error)
Author of Total Commander
https://www.ghisler.com
Re: Add BLAKE2 to checksum methods
2ghisler(Author)
Quote: "6.5GB file on a 4 lane PCIe4.0 SSD (12.5s to 3.5s)."
There's something weird here. 6.5 GB in 3.5 s is what I would expect even with a single thread, that's normal; 12.5 s for 6.5 GB is very slow.
And the new DLL is the same size. For multi-threading, the BLAKE3 C code uses Intel's oneTBB library.
The BLAKE3 C code didn't become parallel by itself; it's more like an optional dependency they included:
https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md#multithreading
oneTBB has to be compiled separately or included somehow.
https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md#cmake
Quote: "BLAKE3_USE_TBB: Enable oneTBB parallelism (Requires a C++20 capable compiler)"
But if it requires C++20, it's for modern systems only, I think.
Re: Add BLAKE2 to checksum methods
I didn't see any difference with the new blakex64 DLL.
System:
AMD Ryzen 7 9800X3D 8-Core Processor
32 GB DDR5-6400 (3200 MHz)
Samsung SSD 990 Pro 2TB PCIe Gen4 x4
Windows 11 Pro 24H2
TC 11.51 x64
Both took ca. 42 sec for a 125 GB compressed backup file.
CPU Load: ~12%
SSD Load: ~64%
125 GB/42 s = 2.98 GB/s
6.5 GB/12.5 s = 0.52 GB/s
6.5 GB/3.5 s = 1.86 GB/s
@ghisler: as lelik007 suggested, there seems to be something strange going on with your system or measurements...
Last edited by ZoSTeR on 2025-03-31, 20:10 UTC, edited 1 time in total.
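The throughput figures in the post above are plain size-over-time divisions. A minimal Python sketch that reproduces them follows; the function name and the labels are made up for illustration, and the sizes and durations are the ones quoted in the thread.

```python
# Reproduce the throughput figures quoted in the post above.
# Sizes and durations come from the thread; the labels are illustrative only.

def throughput_gb_per_s(size_gb, seconds):
    """Return the average throughput in GB/s for one hashing run."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return size_gb / seconds


measurements = [
    ("125 GB backup file, 42 s", 125.0, 42.0),
    ("6.5 GB file, 12.5 s", 6.5, 12.5),
    ("6.5 GB file, 3.5 s", 6.5, 3.5),
]

for label, size_gb, seconds in measurements:
    print(f"{label}: {throughput_gb_per_s(size_gb, seconds):.2f} GB/s")
```

Rounded to two decimals this gives 2.98, 0.52 and 1.86 GB/s, matching the numbers in the post.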
Re: Add BLAKE2 to checksum methods
2ZoSTeR
There's nothing strange: what Christian provided is just a more modern single-threaded version. And yes,
6.5 GB/12.5 s = 0.52 GB/s is really weird; it's as if SIMD were disabled.
Actually, if you'd like, you can download
https://github.com/BLAKE3-team/BLAKE3/releases/download/1.8.0/b3sum_windows_x64_bin.exe
which is the compiled multi-threaded Rust code, and measure this 125 GB file again:
b3sum_windows_x64_bin.exe file
2ghisler(Author)
Christian, you have to read this carefully to understand how the devs intend the C code to be multi-threaded:
https://github.com/BLAKE3-team/BLAKE3/blob/master/c/README.md
They say DIY with the help of
https://uxlfoundation.github.io/oneTBB/
https://github.com/uxlfoundation/oneTBB
and this DLL should also be shipped as a runtime, I suppose.
Last edited by lelik007 on 2025-03-31, 20:49 UTC, edited 1 time in total.
Re: Add BLAKE2 to checksum methods
With the linked b3sum_windows_x64_bin.exe:
Duration: 34s
CPU Load: ~60% (all cores used with 100% spikes)
SSD Load: ~100%
125 GB/34 s = 3.68 GB/s
Besides the speed increase, the exe uses up all available RAM; not sure if this is a good or bad thing... the TC DLL uses barely any.
Last edited by ZoSTeR on 2025-03-31, 20:57 UTC, edited 1 time in total.
Re: Add BLAKE2 to checksum methods
2ZoSTeR
Thank you for the measurements. What can I say: BLAKE3 performs very well on your PC, but considering the loads I like the results of the single-threaded variant better, especially since the multi-threaded variant doesn't give, for example, a +50% boost.
Re: Add BLAKE2 to checksum methods
Agreed, since this is clearly bandwidth limited and not many users will have PCIe Gen5 SSDs, it might not be worth the effort (yet).
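The trade-off discussed in the last few posts can be put into numbers. The sketch below uses only the durations and approximate CPU loads reported in the thread (42 s at ~12% for the TC DLL, 34 s at ~60% for b3sum); the function names are illustrative, and the CPU loads were rough readings, so the ratio is only indicative.

```python
# Compare the two runs of the 125 GB file reported in the thread.
# CPU loads are the approximate values from the posts, not precise measurements.

def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate run is than the baseline run."""
    return baseline_seconds / candidate_seconds


def load_ratio(baseline_cpu_percent, candidate_cpu_percent):
    """How many times more CPU load the candidate run needed."""
    return candidate_cpu_percent / baseline_cpu_percent


tc_dll = {"seconds": 42.0, "cpu_percent": 12.0}  # single-threaded TC DLL
b3sum = {"seconds": 34.0, "cpu_percent": 60.0}   # multi-threaded b3sum 1.8.0

gain = speedup(tc_dll["seconds"], b3sum["seconds"])
cost = load_ratio(tc_dll["cpu_percent"], b3sum["cpu_percent"])

print(f"speedup: {gain:.2f}x ({(gain - 1) * 100:.0f}% faster)")
print(f"CPU load: {cost:.1f}x higher")
```

With these inputs the multi-threaded run is about 1.24x as fast at roughly 5x the CPU load, which is consistent with the conclusion that the run is limited by SSD bandwidth rather than by hashing speed.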
- ghisler(Author)
Re: Add BLAKE2 to checksum methods
Sorry, it looks like I tested the old dll with a file on my SATA SSD, which would explain why I got 500MB/sec.
It looks like I have to map the entire file into memory (or in large blocks of, say, 1 GB) and then pass that to the hasher to use multi-threading. I don't know whether it's worth the hassle to do this.
Author of Total Commander
https://www.ghisler.com
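The idea in the post above (map the file into memory and hand large blocks to the hasher) could look roughly like the following Python sketch. It is not TC's code and not the BLAKE3 API: Python's standard library has no BLAKE3, so hashlib.blake2b stands in as the hasher, and hash_file_mmap and BLOCK_SIZE are names invented here. The sketch only shows the mapping and block feeding; by itself it does not make the hashing multi-threaded.

```python
import hashlib
import mmap
import os
import tempfile

# 1 GiB windows, following the "say, 1 GB" suggestion; the value is an assumption.
BLOCK_SIZE = 1 << 30


def hash_file_mmap(path, block_size=BLOCK_SIZE):
    """Hash a file through a read-only memory mapping, one large window at a time.

    blake2b is only a stand-in for whatever hasher the application really uses.
    """
    hasher = hashlib.blake2b()
    size = os.path.getsize(path)
    if size == 0:
        return hasher.hexdigest()  # an empty file cannot be memory-mapped
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                for offset in range(0, size, block_size):
                    # Slicing a memoryview does not copy the mapped pages.
                    with view[offset:offset + block_size] as window:
                        hasher.update(window)
    return hasher.hexdigest()


if __name__ == "__main__":
    # Small self-check against hashing the same bytes in one go.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.bin")
        payload = os.urandom(300_000)
        with open(sample, "wb") as out:
            out.write(payload)
        print(hash_file_mmap(sample, block_size=65536)
              == hashlib.blake2b(payload).hexdigest())
```

Feeding the mapping in windows produces the same digest as hashing the whole file at once, so the block size affects only how much data the hasher sees per call.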
Re: Add BLAKE2 to checksum methods
2ghisler(Author)
Quote: "It looks like I have to map the entire file into memory (or in large blocks of, say, 1 GB) and then pass that to the hasher to use multi-threading."
It looks like it. This is mmap: https://en.wikipedia.org/wiki/Mmap
b3sum_windows_x64_bin.exe has the switch --no-mmap with the description:
--no-mmap
Disable memory mapping.
Currently this also disables multithreading.
The Rust code isn't multithreaded by itself either; it relies on Rayon, the Rust library for multithreading:
Quote: "Rayon is a data-parallelism library that makes it easy to convert sequential computations into parallel."
https://docs.rs/rayon/latest/rayon/
That is similar to what OpenMP and oneTBB do, as I understand these things.
But I don't know how these 2 points relate to each other.
Quote: "I don't know whether it's worth the hassle to do this."
Neither do I. You can check TC's x64 speed with the provided .dll v1.7.0 against the reference utility, with a PC reboot after the first measurement to be precise:
https://github.com/BLAKE3-team/BLAKE3/releases/download/1.8.0/b3sum_windows_x64_bin.exe
b3sum_windows_x64_bin.exe file
on your NVMe drive of course, and with the biggest file you have,
to understand what we could gain from multithreading.
Actually, oneTBB lists Windows 10/11 as a requirement. IDK if that's right for TC.
https://github.com/uxlfoundation/oneTBB/blob/master/SYSTEM_REQUIREMENTS.md#supported-operating-systems
Last edited by lelik007 on 2025-04-01, 17:18 UTC, edited 3 times in total.
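One plausible way to connect the two points raised in the post above (memory mapping and data parallelism): a parallel hasher can only keep several threads busy if many chunks of input are available at the same time, which a large mapped region provides and a small read buffer does not. The Python toy below shows that structure only. It is explicitly not BLAKE3's tree mode, not Rayon and not oneTBB: blake2b is a stand-in, two_level_hash and CHUNK_SIZE are invented names, and the resulting digest differs from a plain hash of the same data.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# 1 MiB per chunk; an arbitrary choice for this sketch.
CHUNK_SIZE = 1 << 20


def chunk_digest(chunk):
    """Hash one chunk independently of all the others."""
    return hashlib.blake2b(chunk).digest()


def two_level_hash(data, workers=1, chunk_size=CHUNK_SIZE):
    """Toy data-parallel hash: hash the chunks, then hash the joined chunk digests.

    The result does not depend on the number of workers, but it is NOT equal to
    blake2b(data) and has nothing to do with a real BLAKE3 digest.
    """
    view = memoryview(data)
    chunks = [view[i:i + chunk_size] for i in range(0, len(view), chunk_size)]
    if workers > 1:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() returns results in input order, so the outcome is deterministic.
            leaf_digests = list(pool.map(chunk_digest, chunks))
    else:
        leaf_digests = [chunk_digest(chunk) for chunk in chunks]
    return hashlib.blake2b(b"".join(leaf_digests)).hexdigest()


if __name__ == "__main__":
    sample = bytes(range(256)) * 20_000  # about 5 MB of test data
    print(two_level_hash(sample, workers=1) == two_level_hash(sample, workers=4))
```

Whether the threaded path is actually faster depends on the interpreter and the hardware; the sketch is meant to show why the whole input (or a large block of it) has to be in reach before the work can be split.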
Re: Add BLAKE2 to checksum methods
2ZoSTeR
If you have some time and it's not too much trouble, please measure your 125 GB backup file again with this version:
https://github.com/BLAKE3-team/BLAKE3/releases/download/1.5.5/b3sum_windows_x64_bin.exe
but right after you turn the PC on or reboot. There might be a different result.
Re: Add BLAKE2 to checksum methods
Average runtime for 10 measurements per version for a 125 GB compressed file:
b3sum v1.5.5
39.49 s
b3sum v1.8.0
38.51 s
There were fluctuations of ~2 s, and this was not a clean lab setup.
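A repeated measurement like the one above can be scripted. The sketch below is one possible harness, not what was actually used in the thread: time_runs, summarize and run_b3sum are hypothetical helpers, the executable name and the --no-mmap switch are the ones mentioned earlier in the thread, and the file path is a placeholder.

```python
import statistics
import subprocess
import time


def time_runs(run_once, repetitions=10):
    """Call run_once() several times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, spread), where spread is the slowest run minus the fastest run."""
    return statistics.mean(durations), max(durations) - min(durations)


def run_b3sum(path, exe="b3sum_windows_x64_bin.exe", no_mmap=False):
    """Hash one file with the b3sum binary; exe and path are placeholders."""
    args = [exe] + (["--no-mmap"] if no_mmap else []) + [path]
    subprocess.run(args, check=True, stdout=subprocess.DEVNULL)


if __name__ == "__main__":
    # Placeholder path; point it at a large file on the drive under test.
    runs = time_runs(lambda: run_b3sum("backup.bin"), repetitions=10)
    mean, spread = summarize(runs)
    print(f"mean {mean:.2f} s, spread {spread:.2f} s")
```

Repeated runs over the same file may be served partly from the operating system's file cache, which is why a reboot between measurements was suggested earlier in the thread.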
Re: Add BLAKE2 to checksum methods
2ZoSTeR
Thank you again for the testing and patience.
The load on your system that you mentioned, for a task as simple as hashing, does not make me very happy:
Quote: "Besides the speed increase the exe uses up all available RAM, not sure if this is a good or bad thing... the TC DLL uses barely any."
It seems multi-threaded BLAKE3 places pieces of the 125 GB file in your RAM via the mmap function, so that 8 or 16 threads can work on a piece (IDK whether it detects physical or logical cores). And a 60% load on a CPU as powerful as yours for such an insignificant speed boost is definitely not a good thing.
But this is the Rust code in b3sum_windows_x64_bin.exe, and I'm not sure whether the C code with oneTBB can do better.