Wormhole

The Wormhole index structure was introduced in paper "Wormhole: A Fast Ordered Index for In-memory Data Management" by Xingbo Wu, Fan Ni, and Song Jiang (ACM DL).

This repository maintains a reference implementation of the Wormhole index structure on x86_64 Linux/FreeBSD with SSE 4.2. The implementation has been well tuned on Xeon E5-26xx v4 CPUs with some aggressive optimizations.

Experimental ARM64(AArch64) support has been added. The code has not been optimized for ARM64.

NEWS

The whsafe API is a worry-free thread-safe wormhole API. At a small cost on each operation, users no longer need to call the wormhole_refresh_qstate in any circumstances.
merge (Merge a new kv with existing kv) and delr (delete range) operations have been added. They are all thread-safe.

Highlights:

Thread-safety: all operations, including get, set, inplace-update (inp), del, iter-seek, iter-peek, iter-skip etc., are thread-safe. See stresstest.c for more thread-safe operations.
Keys can contain any value, including binary zeros ('\0'). Their sizes are always explicitly specified in struct kv.
Long keys are welcome! The key-length field (klen in struct kv) is a 32-bit unsigned integer and the maximum size of a key is 4294967295.
No background threads or global status. Wormhole uses a mix of user-space rwlocks and QSBR RCU to synchronize between readers and writers. See below for more details.

Build

x86_64

Wormhole supports 64-bit x86_64 CPUs running Linux, FreeBSD, or Darwin (Mac OS X). Clang is the default compiler. It can be changed to gcc in Makefile ($ make CCC=gcc). On our testbed, Clang usually produces faster code than GCC.

To build:

$ make

Alternatively, you may use O=0g to enable debug info and disable optimizations:

$ make O=0g

Read Makefile.common for options on optimization and debugging levels (O=).

AArch64

Wormhole supports 64-bit ARM CPUs (AArch64). The currect implementation requires NEON SIMD and the crc features on the target CPU. The code has been tested on a Raspberry PI 4 running 64-bit ArchlinuxArm, and a Jetson Nano running Ubuntu Groovy.

Sample programs

To run the demo code:

$ ./demo1.out <a text file>

Each line in the text file becomes a key. Duplicates are allowed. You may use "wh.c" for a quick test drive.

concbench.out is an example benchmarking tool of only 150 LoC. See the helper messages for more details. It generates six-word keys based on a word list (words.txt). See sprintf in concbench.c.

$ wget https://github.com/dwyl/english-words/raw/master/words.txt
$ ./concbench.out words.txt 10000000 4
$ numactl -N 0 ./concbench.out words.txt 10000000 4

stresstest.out tests all thread-safe operations.

libwh.so can be linked to any C/C++ program with the help of wh.h.

The code

`struct kv` and `struct kref`

Please refer to demo1.c for quick examples of how to manipulate the key-value (struct kv) and the key-reference (struct kref). There are a handful of helper functions (kv_* and kref_* functions) at the beginning of wh.h.

It's worth noting that the key's hash (hash of struct kv and hash32 of struct kref) must be up-to-date before passed to wormhole. The kv_refill* helper functions internally update the hash after filling the kv contents. In a more general case, kv_update_hash directly updates a struct kv's hash. Similarly, kref_refill_hash32() calculates the 32-bit hash for struct kref.

The Wormhole API

The Wormhole functions are listed near the bottom of wh.h (see the wormhole_* functions). demo1.c and concbench.c are examples of how to use a Wormhole index. There are three sets of Wormhole API: whsafe, wormhole, and whunsafe.

whsafe: The worry-free thread-safe API. If you use Wormhole in a concurrent environment and want minimal complexity in your code, you should use whsafe.
wormhole: The standard thread-safe API. It offers better efficiency than whsafe but requires some extra effort for deadlock prevention.
whunsafe: the thread-unsafe API. It offers the best efficiency but does not perform internal concurrency control. External synchronization should be employed when running whunsafe in a concurrent environment.

The functions of each API can be found near the end of wh.c (search kvmap_api_whsafe, kvmap_api_wormhole, and kvmap_api_whunsafe). Note that each API contains a mix of whsafe_*, wormhole_*, and whunsafe_* functions.

The `whsafe` API

The whsafe API functions are listed in the kvmap_api_whsafe structure in wh.c. The API consists of a mix of wormhole_* and whsafe_* functions.

The index operations (GET, SET, UPDATE, DEL, PROBE, INPLACE, MERGE, and SCAN (wormhole_iter_* functions)) are all thread safe. A thread needs to hold a reference of the index (wormref) to perform safe index operations.

An example of using point-query operations using the whsafe API.

index = wormhole_create(NULL); // use NULL here unless you want to change the allocator.
ref = whsafe_ref(index);
for (...) {
  whsafe_set(ref, ...);
  whsafe_get(ref, ...);
  whsafe_del(ref, ...);
  ... // other safe operations
}
... // other safe operations
wormhole_unref(ref);
wormhole_destroy(index);

An example of range-query operations:

ref = whsafe_ref(index);
// ... assume we already have a valid ref
iter = wormhole_iter_create(ref);
for (...) {
  whsafe_iter_seek(iter, key);
  wormhole_iter_peek(iter, buf);
  wormhole_iter_skip(iter, 1);
  wormhole_iter_peek(iter, buf);
  wormhole_iter_skip(iter, 3);
  wormhole_iter_inp(iter, uf, priv);
  // other iter operations
}
// An active iterator is likely holding a lock.
whsafe_iter_park(iter); // Release resources to avoid blocking other threads
// it's now safe to do something such as sleep() or waitpid()
// ... start using the iterator again
whsafe_iter_seek(iter, key2);
// ... other iter operations
whsafe_iter_destroy(iter);
// ... do something
// must destroy iterators before unref()
wormhole_unref(ref);

The `wormhole` API

Similar to whsafe, wormhole is also thread safe. It's often faster than whsafe but requires extra caution when using it.

An example of using point-query operations using the wormhole API.

index = wormhole_create(NULL); // use NULL here unless you want to change the allocator.
ref = wormhole_ref(index);
for (...) {
  wormhole_set(ref, ...);
  wormhole_get(ref, ...);
  wormhole_del(ref, ...);
  ... // other safe operations
}
... // other safe operations
wormhole_unref(ref);
wormhole_destroy(index);

An example of range-query operations:

ref = wormhole_ref(index);
// ... assume we already have a valid ref
iter = wormhole_iter_create(ref);
for (...) {
  wormhole_iter_seek(iter, key);
  wormhole_iter_peek(iter, buf);
  wormhole_iter_skip(iter, 1);
  wormhole_iter_peek(iter, buf);
  wormhole_iter_skip(iter, 3);
  wormhole_iter_inp(iter, uf, priv);
  // other iter operations
}
// An active iterator is likely holding a lock.
wormhole_iter_park(iter); // Release resources to avoid blocking other threads
while (condition not met) { // See below for explanation
    wormhole_refresh_qstate(ref);
}
// ... start using the iterator again
wormhole_iter_seek(iter, key2);
// ... other iter operations
wormhole_iter_destroy(iter);
// ... do something
// must destroy iterators before unref()
wormhole_unref(ref);

Avoid blocking writers when using the `wormhole` API

Wormhole internally uses QSBR RCU to synchronize readers/writers so every holder of a reference (ref) needs to actively perform index operations. An ref-holder, if not actively performing index operations, may block a writer thread that is performing split/merge operations. (because of not periodically announcing its quiescent state). If a ref-holder is about to become inactive from Wormhole's perspective (doing something else or just sleeping), it is recommended that the holder temporarily releases the ref before entering the inactive status (such as calling sleep(10)), and obtains a new ref before performing the next index operation.

// holding a ref
wormhole_unref(ref);  // warning: this can slowdown your program
sleep(10);
ref = wormhole_ref(map);  // warning: this can slowdown your program
// perform index operations with (the new) ref

However, frequently calling wormhole_ref() and wormhole_unref() can be expensive because they acquire locks internally. A better solution can be used if the ref-holder thread can periodically update its quiescent state by calling wormhole_refresh_qstate(). This method has negligible cost (only two instructions). For example:

// holding a ref
while (wait_for_client_with_timeout_10us(...)) {
  wormhole_refresh_qstate(ref);  // only two mov instructions on x86_64
}
// perform index operations with ref

A common scenario of dead-locking is when acquiring locks while holding a wormhole reference, since lock() functions will block-wait and the calling thread won't be able to update the quiescent state. It is recommanded to always use trylock() in a loop and keep updating the qstate:

while (pthread_mutex_trylock(&some_lock) == EBUSY) {
    wormhole_refresh_qstate(ref);
}

The above issues with QSBR are all gone with whsafe.

The `whunsafe` API

A set of thread-unsafe functions are also provided. See the functions with prefix whunsafe. The thread-unsafe functions don't use the reference (wormref). Simply feed them with the pointer to the wormhole index:

index = whunsafe_create(NULL);
for (...) {
  whunsafe_set(index, ...);
  whunsafe_get(index, ...);
  whunsafe_del(index, ...);
  ... // other unsafe operations
}
... // other unsafe operations
wormhole_destroy(index);

In-place update with user-defined function

wormhole_inp executes a user-defined function on an existing key-value item. If the key does not exist, a NULL pointer will be passed to the user-defined function. A simple example would be incrementing a counter stored in a key-value pair.

// user-defined in-place update function
void myadd1(struct kv * kv, void * priv) {
  if (kv != NULL) {
    assert(kv->vlen >= sizeof(u64));
    u64 * pvalue = kv_vptr(kv);
    (*pvalue)++;
  }
}

// create the counter
u64 zero = 0;
struct kv * tmp = kv_create("counter", 7, &zero, 8); // malloc-ed
wormhole_set(ref, tmp);

// perform +1 on the stored value
struct kref kref = kv_ref(tmp); // create a kref of tmp
wormhole_inp(ref, &kref, myadd1, NULL);

Note that the user-defined function should ONLY change the value's content, and nothing else. Otherwise, the index can be corrupted. A similar mechanism is also provided for iterators (wormhole_iter_inp).

The inplace function can also be used to retrieve key-value data. For example:

void inplace_getu64(struct kv * kv, void * priv) {
  if (kv != NULL) {
    assert(kv->vlen >= sizeof(u64));
    u64 * pvalue = kv_vptr(kv);
    *(u64 *)priv = *pvalue;
  } else {
    *(u64 *)priv = 0;
  }
}
...
struct kref kref = ...
u64 val;
wormhole_inp(ref, &kref, inplace_getu64, &val);

`merge`: atomic Read-Modify-Write

The wormhole_merge and whsafe_merge functions perform atomic Read-Modify-Write (RMW) operations. In a RMW operation, if the search key is found, the KV pair will be passed to a user-defined callback function uf (short for user function). Otherwise, a NULL pointer is passed to uf. uf could update the KV in-place if it does not require any memory reallocation. In such a case, uf should return the KV's pointer back and the merge function will do nothing else. If uf want to replace the KV with something new, it should return a pointer that is different than the original KV pointer. The uf should not make memory allocation by itself. Instead, the merge function will copy the returned KV and replace the existing KV with the newly created one. uf should not return NULL unless the key was not found.

Iterator

The wormhole_iter_{seek,peek,skip,next,inp} functions provide range-search functionalities. If the search key does not exist, the seek operation will put the cursor on the item that is greater than the search-key. next will return the item under the current cursor and move the cursor forward. peek is similar but does not move the cursor. For example, with keys {1,3,5}, seek(2); r = next() will see r == 3.

Currently Wormhole does not provide seek_for_less_equal() and prev() for backward scanning. This feature will be added in the future.

Memory management

By default, Wormhole manages all the key-value data internally and only copies to or from a user-supplied buffer (a struct kv object). This draws a clear boundary in the memory space between the index structure and its users. After a call to any of the index operations, the caller can immediately free the buffer holding the key-reference or the key-value data. This also allows users to use stack-allocated variables to interact with Wormhole.

The memory manager of the internal key-value objects can be customized when creating a new Wormhole (see wormhole_create). The customization will only affect the internal struct kv objects. Actually, the memory manager can be configured to directly use the caller's struct kv object and store it in Wormhole. This struct kvmap_mm structure shows an example:

const struct kvmap_mm kvmap_mm_ualloc {
  .in = kvmap_mm_in_noop, // in wormhole_set(), store caller's kv in wh
  .out = kvmap_mm_out_dup, // but still make a copy in wormhole_get()
  .free = kvmap_mm_free_free, // call free() for delete/update
};
...
struct wormhole * wh = wormhole_create(&kvmap_mm_ualloc);
struct wormref * ref = wormhole_ref(wh);
...
struct kv * newkv = malloc(size);
...
wormhole_set(ref, newkv);
// Don't free newkv! it's now managed by wh

Each of the in/out/free functions can be freely customized. A few kvmap_mm_* functions are already provided for common scenarios.

Hugepages

Wormhole uses hugepages when available. To reserve some hugepages in Linux (10000 * 2MB):

# echo 10000 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Tuning

A few macros in wh.c can be tuned.

WH_SLABLEAF_SIZE controls the slab size for leaf node allocation. The default is ((1lu << 21)) (2MB slabs). If 1GB hugepages are available, WH_SLABLEAF_SIZE can be set to ((1lu << 30)) to utilize those 1GB hugepages.
WH_KPN controls "Keys Per (leaf-)Node". The default value is 128. Setting it to 256 can increase search speed by roughly 10% but slows down internal split/merge operations (but not every insertion/deletion).
QSBR_STATES_NR and QSBR_SHARDS_NR control the capacity (number of references) of the QSBR RCU. The product of the two values is the capacity. For efficiency, QSBR_STATES_NR can be set to 22, 38, and 54, and QSBR_SHARDS_NR must be 2^n. The defaults are set to 38 and 8, respectively. This QSBR implementation uses sharding so wormhole_ref() will block (busy-waiting) if the target shard is full.

Limitations

Key Patterns

The Wormhole index works well with real-world keys. A split operation may fail with one of the following (almost impossible) conditions:

The maximum anchor-key length is 65535 bytes (represented by a 16-bit value), which is shorter than the maximum key-length (32-bit). Split will fail if all cut-points in the target leaf node require longer anchor-keys. In such case, at least 129 (WH_KPN + 1) keys must share a common prefix of 65535+ bytes.
Two anchor-keys cannot be identical after removing their trailing zeros. To be specific, "W" and "Worm" can be anchor-keys at the same time, but "W" and "W\0\0" cannot (while these two keys can co-exist as regular keys). If there are at least 129 (WH_KPN + 1) keys shareing the same prefix but having ONLY different numbers of trail zeros (having "W", "W\0", "W\0\0", "W\0\0\0" ... and finally a 'W' with at least 128 trailing zeros), the split will fail.

Memory Allocation

Insertions can fail when there is not enough memory. The current implementation can safely return after any failed memory allocation, except for hash-table expansion (resizing). On memory-allocation failure, the expansion function will block and wait for available memory to proceed. In the future, this behavior will be changed to returning with an insertion failure.

Performance

Some benchmarking results with some real-world datasets: See this page for more information.

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
Makefile.common		Makefile.common
README.md		README.md
concbench.c		concbench.c
ctypes.h		ctypes.h
demo1.c		demo1.c
kv.c		kv.c
kv.h		kv.h
lib.c		lib.c
lib.h		lib.h
stresstest.c		stresstest.c
wh.c		wh.c
wh.h		wh.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Wormhole

NEWS

Highlights:

Build

x86_64

AArch64

Sample programs

The code

`struct kv` and `struct kref`

The Wormhole API

The `whsafe` API

The `wormhole` API

Avoid blocking writers when using the `wormhole` API

The `whunsafe` API

In-place update with user-defined function

`merge`: atomic Read-Modify-Write

Iterator

Memory management

Hugepages

Tuning

Limitations

Key Patterns

Memory Allocation

Performance

About

Uh oh!

Releases

Packages

Languages

License

Shirosakirukia/wormhole

Folders and files

Latest commit

History

Repository files navigation

Wormhole

NEWS

Highlights:

Build

x86_64

AArch64

Sample programs

The code

struct kv and struct kref

The Wormhole API

The whsafe API

The wormhole API

Avoid blocking writers when using the wormhole API

The whunsafe API

In-place update with user-defined function

merge: atomic Read-Modify-Write

Iterator

Memory management

Hugepages

Tuning

Limitations

Key Patterns

Memory Allocation

Performance

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`struct kv` and `struct kref`

The `whsafe` API

The `wormhole` API

Avoid blocking writers when using the `wormhole` API

The `whunsafe` API

`merge`: atomic Read-Modify-Write

Packages