faster

SIMD for Humans

Easy, powerful, absurdly fast numerical calculations. Chaining, Type punning, static dispatch (w/ inlining) based on your platform and vector types, zero-allocation iteration, vectorized loading/storing, and support for uneven collections.

It looks something like this:

let lots_of_3s = (&[-123.456f32; 128][..]).simd_iter()
    .map(|v| { f32s::splat(9.0) * v.abs().sqrt().rsqrt().ceil().sqrt() -
               f32s::splat(4.0) - f32s::splat(2.0) })
    .scalar_collect::<Vec<f32>>();

Which is analogous to this scalar code:

let lots_of_3s = (&[-123.456f32; 128][..]).iter()
    .map(|v| { 9.0 * v.abs().sqrt().sqrt().recip().ceil().sqrt() -
               4.0 - 2.0 })
    .collect::<Vec<f32>>();

The vector size is entirely determined by the machine you’re compiling for - it attempts to use the largest vector size supported by your machine, and works on any platform or architecture (see below for details).

Compare this to traditional explicit SIMD:

use std::mem::transmute;
use stdsimd::{f32x4, f32x8};

let lots_of_3s = &mut [-123.456f32; 128][..];

if cfg!(all(not(target_feature = "avx"), target_feature = "sse")) {
    for ch in init.chunks_mut(4) {
        let v = f32x4::load(ch, 0);
        let scalar_abs_mask = unsafe { transmute::<u32, f32>(0x7fffffff) };
        let abs_mask = f32x4::splat(scalar_abs_mask);
        // There isn't actually an absolute value intrinsic for floats - you
        // have to look at the IEEE 754 spec and do some bit flipping
        v = unsafe { _mm_and_ps(v, abs_mask) };
        v = unsafe { _mm_sqrt_ps(v) };
        v = unsafe { _mm_rsqrt_ps(v) };
        v = unsafe { _mm_ceil_ps(v) };
        v = unsafe { _mm_sqrt_ps(v) };
        v = unsafe { _mm_mul_ps(v, 9.0) };
        v = unsafe { _mm_sub_ps(v, 4.0) };
        v = unsafe { _mm_sub_ps(v, 2.0) };
        f32x4::store(ch, 0);
    }
} else if cfg!(all(not(target_feature = "avx512"), target_feature = "avx")) {
    for ch in init.chunks_mut(8) {
        let v = f32x8::load(ch, 0);
        let scalar_abs_mask = unsafe { transmute::<u32, f32>(0x7fffffff) };
        let abs_mask = f32x8::splat(scalar_abs_mask);
        // There isn't actually an absolute value intrinsic for floats - you
        // have to look at the IEEE 754 spec and do some bit flipping
        v = unsafe { _mm256_and_ps(v, abs_mask) };
        v = unsafe { _mm256_sqrt_ps(v) };
        v = unsafe { _mm256_rsqrt_ps(v) };
        v = unsafe { _mm256_ceil_ps(v) };
        v = unsafe { _mm256_sqrt_ps(v) };
        v = unsafe { _mm256_mul_ps(v, 9.0) };
        v = unsafe { _mm256_sub_ps(v, 4.0) };
        v = unsafe { _mm256_sub_ps(v, 2.0) };
        f32x8::store(ch, 0);
    }
}

Even with all of that boilerplate, this still only supports x86-64 machines with SSE or AVX.

Upcoming Features

Zero-overhead support for uneven collections which don’t fit entirely into a vector is upcoming. 6265 Also, zero-allocation collects are coming, to be called fill.

By 0.2.0, this code will compile:

let some_u8s = [0u8; 100];
let filled_u8s = (&[0u8; 100][..]).simd_iter()
    .uneven_map(|vector| vector * splat(2),
                |scalar| scalar * 2)
    .scalar_fill(&mut some_u8s);

More intrinsic traits are also coming; feel free to open an issue or pull request if you have one you’d like to see.

Compatibility

Faster currently supports x86 machines with SSE and above, although AVX-512 support isn’t working in rustc yet. Support for non-x86 architectures is currently blocked by stdsimd and rustc.

Of course, once those issues are resolved, adding support ARM, MIPS, or any other intrinsics and vector lengths will be trivial.

Performance

Here are some extremely unscientific benchmarks which, at least, prove that this isn’t any worse than scalar iterators.

running 4 tests
test tests::bench_nop_scalar  ... bench:          51 ns/iter (+/- 1)
test tests::bench_nop_simd    ... bench:          51 ns/iter (+/- 1)
test tests::bench_work_scalar ... bench:       1,276 ns/iter (+/- 39)
test tests::bench_work_simd   ... bench:         251 ns/iter (+/- 0)

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
src		src
.gitignore		.gitignore
Cargo.toml		Cargo.toml
README.org		README.org

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

faster

SIMD for Humans

Upcoming Features

Compatibility

Performance

About

Uh oh!

Releases

Packages

Uh oh!

Languages

sailfish009/faster

Folders and files

Latest commit

History

Repository files navigation

faster

SIMD for Humans

Upcoming Features

Compatibility

Performance

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages