Description
I ran all the crypto benchmarks with standard Go crypto and with BoringCrypto. Results below.
In general there is about a 200ns overhead for each call into BoringCrypto via cgo. For example, aes.BenchmarkEncrypt (testing encryption of a single 16-byte block) went from 13ns to 209ns, or about +1500%. There is not much we can do about that except hope that bulk operations call into cgo once instead of once per 16 bytes.
But there are also some mysteries or things to consider fixing. I've put this in milestone Go 1.10 because some of them may be bugs in the Go distribution that we should at least understand. Once we know that the problems are all on the dev.boringcrypto side, we can switch the milestone to Unreleased.
crypto/aes
- AESCFBEncrypt1K, AESCFBDecrypt1K, AESOFB1K are much slower because there is no bulk CFB operation (no cipher.cfbAble interface). Should there be? Probably.
- AESCTR1K looks like it is not using the ctrAble implementation that boring.aesCipher is offering.
- AESCBCEncrypt1K looks like it is not using the cbcEncAble implementation that boring.aesCipher is offering.
- AESCBCDecrypt1K looks like it is not using the cbcDecAble implementation that boring.aesCipher is offering.
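The ctrAble, cbcEncAble, and cbcDecAble bullets above refer to optional interfaces: the mode constructor checks whether the block offers a bulk implementation and otherwise falls back to a generic per-block path. The following is a minimal sketch of that dispatch pattern. All types here are local stand-ins, not the crypto/cipher definitions, and the constructor returns a string so the chosen path is visible. It shows one way a bulk path can be missed (a wrapper type that hides the optional method); whether that is what happens in these benchmarks has not been established.

```go
package main

import "fmt"

// block is a stand-in for cipher.Block, reduced to what the sketch needs.
type block interface {
	BlockSize() int
}

// cbcEncAble is a local, illustrative optional interface: a block that can
// build its own bulk CBC encrypter. The string result names the path taken.
type cbcEncAble interface {
	NewCBCEncrypter(iv []byte) string
}

// fastBlock offers the bulk implementation.
type fastBlock struct{}

func (fastBlock) BlockSize() int                   { return 16 }
func (fastBlock) NewCBCEncrypter(iv []byte) string { return "bulk" }

// wrapped forwards only BlockSize, so the optional method of the inner
// block is no longer visible to a type assertion on the wrapper.
type wrapped struct{ b block }

func (w wrapped) BlockSize() int { return w.b.BlockSize() }

// newCBCEncrypter picks the bulk path when the block offers it and the
// generic per-block path otherwise.
func newCBCEncrypter(b block, iv []byte) string {
	if fast, ok := b.(cbcEncAble); ok {
		return fast.NewCBCEncrypter(iv)
	}
	return "generic"
}

func main() {
	iv := make([]byte, 16)
	fmt.Println("direct: ", newCBCEncrypter(fastBlock{}, iv))
	fmt.Println("wrapped:", newCBCEncrypter(wrapped{fastBlock{}}, iv))
}
```

A check like the type assertion in newCBCEncrypter, placed in the benchmark setup, is one way to confirm which path a given block value actually takes.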
crypto/ecdsa
- Why does SignP256 take an extra 9µs in BoringCrypto? That's too big to be cgo. Should the signature conversion be improved?
- SignP384 drops from 5.54ms to 0.85ms, indicating that the Go implementation has 6X room for improvement.
- Even after the 6X, I don't understand why P384 is so much slower than P256.
- KeyGeneration is 10X slower in BoringCrypto than in Go. We should make sure the Go version is not missing something important.
crypto/hmac
- How is it that HMACSHA256_1K takes the same amount of time as HMACSHA256_32 in BoringCrypto?
- For that matter, how is it that, in BoringCrypto, HMACSHA256_1K takes 2µs but crypto/sha256's Hash1K takes 4µs?
crypto/rsa
- Why is RSA2048Sign 3X faster in BoringCrypto?
Benchmark results (also at https://perf.golang.org/search?q=upload:20170818.4):
name old time/op new time/op delta
pkg:crypto/aes goos:linux goarch:amd64
Encrypt-4 13.3ns ± 2% 208.6ns ± 5% +1473.15% (p=0.008 n=5+5)
Decrypt-4 13.2ns ± 1% 255.0ns ± 0% +1828.90% (p=0.016 n=5+4)
Expand-4 75.8ns ± 0% 76.4ns ± 1% ~ (p=0.056 n=5+5)
pkg:crypto/cipher goos:linux goarch:amd64
AESGCMSeal1K-4 341ns ± 1% 503ns ± 0% +47.71% (p=0.008 n=5+5)
AESGCMOpen1K-4 321ns ± 0% 496ns ± 1% +54.68% (p=0.008 n=5+5)
AESGCMSeal8K-4 2.04µs ± 0% 2.21µs ± 1% +8.27% (p=0.008 n=5+5)
AESGCMOpen8K-4 1.97µs ± 1% 2.18µs ± 0% +10.84% (p=0.008 n=5+5)
AESCFBEncrypt1K-4 2.37µs ± 0% 14.48µs ± 0% +512.17% (p=0.008 n=5+5)
AESCFBDecrypt1K-4 2.27µs ± 1% 14.48µs ± 1% +538.94% (p=0.008 n=5+5)
AESOFB1K-4 1.46µs ± 1% 13.76µs ± 1% +844.07% (p=0.008 n=5+5)
AESCTR1K-4 1.66µs ± 1% 8.99µs ± 0% +442.57% (p=0.008 n=5+5)
AESCBCEncrypt1K-4 2.27µs ± 1% 8.13µs ± 0% +257.59% (p=0.008 n=5+5)
AESCBCDecrypt1K-4 1.67µs ± 2% 11.65µs ± 2% +598.98% (p=0.008 n=5+5)
pkg:crypto/des goos:linux goarch:amd64
Encrypt-4 162ns ± 3% 159ns ± 1% ~ (p=0.222 n=5+5)
Decrypt-4 157ns ± 1% 158ns ± 2% ~ (p=0.722 n=5+5)
TDESEncrypt-4 380ns ± 0% 381ns ± 1% ~ (p=0.857 n=5+5)
TDESDecrypt-4 386ns ± 0% 386ns ± 0% ~ (p=1.000 n=5+5)
pkg:crypto/ecdsa goos:linux goarch:amd64
SignP256-4 36.2µs ± 2% 45.3µs ± 1% +24.91% (p=0.008 n=5+5)
SignP384-4 5.54ms ± 0% 0.85ms ± 1% -84.63% (p=0.008 n=5+5)
VerifyP256-4 104µs ± 1% 102µs ± 0% -1.29% (p=0.016 n=5+4)
KeyGeneration-4 21.9µs ± 1% 200.3µs ± 0% +815.39% (p=0.008 n=5+5)
pkg:crypto/elliptic goos:linux goarch:amd64
BaseMult-4 979µs ± 3% 954µs ± 1% -2.62% (p=0.008 n=5+5)
BaseMultP256-4 19.8µs ± 0% 19.7µs ± 1% ~ (p=0.151 n=5+5)
ScalarMultP256-4 77.3µs ± 0% 76.7µs ± 0% -0.70% (p=0.008 n=5+5)
pkg:crypto/hmac goos:linux goarch:amd64
HMACSHA256_1K-4 4.46µs ± 0% 1.92µs ± 0% -56.91% (p=0.008 n=5+5)
HMACSHA256_32-4 1.16µs ± 0% 1.94µs ± 2% +66.17% (p=0.008 n=5+5)
pkg:crypto/md5 goos:linux goarch:amd64
Hash8Bytes-4 184ns ± 0% 183ns ± 0% -0.54% (p=0.029 n=4+4)
Hash1K-4 2.01µs ± 0% 2.01µs ± 1% ~ (p=0.087 n=5+5)
Hash8K-4 14.8µs ± 1% 14.8µs ± 0% ~ (p=0.651 n=5+5)
Hash8BytesUnaligned-4 184ns ± 0% 183ns ± 0% -0.89% (p=0.000 n=5+4)
Hash1KUnaligned-4 2.01µs ± 0% 2.00µs ± 0% -0.41% (p=0.040 n=5+5)
Hash8KUnaligned-4 14.9µs ± 1% 15.2µs ± 3% ~ (p=0.690 n=5+5)
pkg:crypto/rand goos:linux goarch:amd64
Prime-4 146ms ±57% 143ms ± 8% ~ (p=0.548 n=5+5)
pkg:crypto/rc4 goos:linux goarch:amd64
RC4_128-4 313ns ± 3% 287ns ± 2% -8.19% (p=0.008 n=5+5)
RC4_1K-4 2.72µs ± 2% 2.70µs ± 1% ~ (p=0.151 n=5+5)
RC4_8K-4 21.8µs ± 2% 22.0µs ± 2% ~ (p=0.056 n=5+5)
pkg:crypto/rsa goos:linux goarch:amd64
RSA2048Sign-4 2.90ms ± 1% 1.08ms ± 1% -62.64% (p=0.008 n=5+5)
pkg:crypto/sha1 goos:linux goarch:amd64
Hash8Bytes-4 215ns ± 0% 562ns ± 1% +161.58% (p=0.016 n=4+5)
Hash320Bytes-4 821ns ± 1% 1096ns ± 0% +33.45% (p=0.008 n=5+5)
Hash1K-4 1.64µs ± 0% 2.26µs ± 0% +37.21% (p=0.008 n=5+5)
Hash8K-4 10.6µs ± 0% 12.5µs ± 0% +18.04% (p=0.008 n=5+5)
pkg:crypto/sha256 goos:linux goarch:amd64
Hash8Bytes-4 310ns ± 1% 679ns ± 1% +118.96% (p=0.008 n=5+5)
Hash1K-4 3.61µs ± 0% 3.99µs ± 0% +10.50% (p=0.008 n=5+5)
Hash8K-4 26.8µs ± 3% 26.6µs ± 1% ~ (p=0.548 n=5+5)
pkg:crypto/sha512 goos:linux goarch:amd64
Hash8Bytes-4 419ns ± 1% 805ns ± 1% +91.94% (p=0.008 n=5+5)
Hash1K-4 2.67µs ± 1% 3.13µs ± 1% +17.25% (p=0.008 n=5+5)
Hash8K-4 18.0µs ± 0% 18.8µs ± 1% +4.07% (p=0.008 n=5+5)
pkg:crypto/tls goos:linux goarch:amd64
Throughput/MaxPacket/1MB-4 4.02ms ± 1% 3.48ms ± 1% -13.48% (p=0.008 n=5+5)
Throughput/MaxPacket/2MB-4 6.11ms ± 2% 5.66ms ± 1% -7.46% (p=0.008 n=5+5)
Throughput/MaxPacket/4MB-4 10.3ms ± 1% 10.0ms ± 1% -3.65% (p=0.008 n=5+5)
Throughput/MaxPacket/8MB-4 18.6ms ± 1% 18.5ms ± 0% ~ (p=0.151 n=5+5)
Throughput/MaxPacket/16MB-4 35.1ms ± 1% 35.5ms ± 1% ~ (p=0.222 n=5+5)
Throughput/MaxPacket/32MB-4 68.0ms ± 1% 69.9ms ± 2% +2.67% (p=0.008 n=5+5)
Throughput/MaxPacket/64MB-4 133ms ± 1% 137ms ± 0% +2.90% (p=0.008 n=5+5)
Throughput/DynamicPacket/1MB-4 4.11ms ± 1% 3.55ms ± 2% -13.55% (p=0.008 n=5+5)
Throughput/DynamicPacket/2MB-4 6.32ms ± 4% 5.70ms ± 2% -9.80% (p=0.008 n=5+5)
Throughput/DynamicPacket/4MB-4 10.5ms ± 1% 10.1ms ± 1% -3.51% (p=0.008 n=5+5)
Throughput/DynamicPacket/8MB-4 18.7ms ± 1% 18.6ms ± 0% ~ (p=0.222 n=5+5)
Throughput/DynamicPacket/16MB-4 35.3ms ± 1% 35.7ms ± 1% +1.18% (p=0.032 n=5+5)
Throughput/DynamicPacket/32MB-4 67.9ms ± 0% 69.6ms ± 1% +2.44% (p=0.008 n=5+5)
Throughput/DynamicPacket/64MB-4 134ms ± 0% 137ms ± 1% +2.21% (p=0.016 n=4+5)
Latency/MaxPacket/200kbps-4 699ms ± 1% 697ms ± 0% ~ (p=0.151 n=5+5)
Latency/MaxPacket/500kbps-4 286ms ± 0% 283ms ± 0% -0.84% (p=0.008 n=5+5)
Latency/MaxPacket/1000kbps-4 147ms ± 0% 145ms ± 0% -1.62% (p=0.008 n=5+5)
Latency/MaxPacket/2000kbps-4 77.9ms ± 1% 74.6ms ± 2% -4.22% (p=0.008 n=5+5)
Latency/MaxPacket/5000kbps-4 35.5ms ± 0% 33.2ms ± 4% -6.46% (p=0.008 n=5+5)
Latency/DynamicPacket/200kbps-4 139ms ± 3% 138ms ± 0% ~ (p=0.151 n=5+5)
Latency/DynamicPacket/500kbps-4 60.7ms ± 1% 58.9ms ± 1% -2.97% (p=0.008 n=5+5)
Latency/DynamicPacket/1000kbps-4 34.0ms ± 2% 32.1ms ± 1% -5.51% (p=0.008 n=5+5)
Latency/DynamicPacket/2000kbps-4 20.0ms ± 1% 18.1ms ± 3% -9.52% (p=0.008 n=5+5)
Latency/DynamicPacket/5000kbps-4 10.2ms ± 4% 10.1ms ± 8% ~ (p=1.000 n=5+5)