The VGT methodology is designed to alleviate the high computational cost of constructing Voronoi diagrams and Delaunay tessellations in high dimensions, enabling new applications of geometric methods that rely on these structures to more complex data.
The project builds on the idea of a graph traversal, or marching, over a Voronoi graph (i.e. the 1-skeleton of a Voronoi diagram) without its explicit computation, while extracting the required information about the duals of the visited vertices.
We demonstrate the first applications of this methodology in our KDD'20 paper
*Voronoi Graph Traversal in High Dimensions with Applications to Topological Data Analysis and Piecewise Linear Interpolation*.
Currently planned features are also included below, in their preliminary order of appearance:
- General interface for interactions with the Voronoi graph (`VoronoiGraph.{cpp|h}`)
- Delaunay k-skeleton extraction (`approx_delaunay.cpp`)
- Piecewise-linear interpolation (`interpolate.cpp`)
- Geometric density estimation
- Power diagrams (i.e. weighted Voronoi diagrams)
- Non-Euclidean/Riemannian metric spaces (e.g. hyperspherical)
Also, `vgt/extra` contains a few helper scripts for some of the experiments from the paper above.
- VoroCrust (https://vorocrust.sandia.gov/)

  A way of obtaining a Delaunay graph approximation via a non-decreasing subgraph sequence in arbitrarily high dimensions, without a full explicit construction, was first introduced by Scott A. Mitchell, Mohamed S. Ebeida et al. in *Spoke-Darts for High-Dimensional Blue-Noise Sampling*. In the referenced work, the edges of the graph approximations are sampled according to their importance, i.e. their angular sizes, while VGT performs a close-to-uniform sampling of simplices, independent of the Voronoi diagram geometry.
- Qhull (http://www.qhull.org/)

  Qhull has long been the standard tool for computing Voronoi and Delaunay tessellations quickly and explicitly in arbitrary dimensions. The exact computations of Qhull will likely remain more advantageous than VGT when the dimensionality of the data is small (e.g. up to 6 dimensions). The main benefit of the VGT methodology comes with medium-dimensional data, where full reconstructions are no longer feasible or require an excessive amount of time.
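For a low-dimensional sanity check against the exact construction, one can compute the Delaunay tessellation with Qhull through SciPy's wrapper. A minimal sketch, assuming SciPy is installed (the point set is illustrative):

```python
import numpy as np
from scipy.spatial import Delaunay  # SciPy wraps Qhull for exact computation

# A triangle with one strictly interior point: its Delaunay
# triangulation consists of exactly 3 triangles
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [1.0, 0.5]])
tri = Delaunay(pts)
print(tri.simplices.shape)  # (3, 3): 3 simplices, 3 vertex indices each
```

In this regime the exact answer is cheap; VGT targets the dimensionalities where such explicit constructions stop being feasible.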
- C++17 (g++ compiler), cmake, make
- zlib (required by cnpy, tested on version 1.2.11)
- Eigen3 (linear algebra, tested on version 3.3.7)
- (Recommended) OpenMP (enable parallel computations)
As an example, the following line should install all dependencies on a clean Ubuntu 20.04 system:

```
sudo apt install g++ make cmake zlib1g-dev libeigen3-dev libomp-dev
```
The following are already included and do not require installation:
- cnpy to read numpy files in C++
- argparse for command line argument parsing
- tqdm for a progress bar during the computations
A "standard cmake routine" can be applied to compile the code from the sources:

```
mkdir build && cd build
cmake ..
make
```
You can pass additional preprocessor parameters to cmake to optimize the code via a string `-DCMAKE_CXX_FLAGS="-D<key>=<value>"`. The following keys are available:
`DIMENSIONALITY`

If your data has a fixed dimensionality, then, following the suggestions from Eigen about fixed-size matrices, you can compile the sources for that specific dimensionality to potentially speed up the computations. The default dimensionality is `Eigen::Dynamic`.

`FTYPE`

By default, all computations are performed in `long double`. If you want to use another floating-point type, you can pass the corresponding flag to cmake, for example `-DFTYPE=double`. Be aware that with `double` or `float`, numeric instability may increase drastically in high-dimensional spaces. You can monitor it by checking the number of failed validations after running an algorithm.

You may also opt to use multiple-precision arithmetic libraries, such as MPIR or GMP. These libraries are not included in this package, so you would need to add them manually and update the line `using ftype = long double;` in `vgt/utils.h` with your preferred floating-point type. Such libraries allow one to achieve the desired numeric stability at the cost of a slower running time.
All executables have a `-h` command line argument that shows up-to-date information about the available command line arguments.
```
vgt/approx_delaunay [options] data.npy
```
| Argument | Description |
|---|---|
| `data.npy` | npy NxD data matrix of 32-bit floats |
| `--dim <int:2>` | maximum dimensionality of extracted Delaunay simplices |
| `--steps <int:1000>` | number of steps to perform in a random walk from each starting vertex; a non-positive value corresponds to a full walk / complete graph search |
| `--noclosure` | flag to output only k-dimensional Delaunay simplices without their closure |
| `--out <str:"output.txt">` | output filename for a txt file containing Delaunay simplices; each line describes a simplex in the form `d v_0 v_1 ... v_d [f]`, where `v_i` are indices of data points that form the simplex and `[f]` is the filtration value of the simplex, if it was computed |
| `--vout <str:"">` | optional output filename for a txt file containing visited Voronoi vertices; each line describes a vertex in the form of its dual `d v_0 v_1 ... v_d` |
| `--seed <int:239>` | random seed |
| `--njobs <int:1>` | number of parallel threads (requires OpenMP) |
| `--strategy <str:"brute_force">` | ray sampling strategy; available values: `{"brute_force", "bin_search"}` |
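As a sketch of the input and output conventions above, the following Python snippet creates a toy `data.npy` in the expected format and parses one line of the simplex output (file names and shapes are illustrative):

```python
import numpy as np

# Toy input: an NxD matrix of 32-bit floats, saved in npy format
rng = np.random.default_rng(0)
data = rng.standard_normal((100, 5)).astype(np.float32)
np.save("data.npy", data)

# Each output line has the form "d v_0 v_1 ... v_d [f]",
# where the trailing filtration value [f] is optional
def parse_simplex(line):
    parts = line.split()
    d = int(parts[0])
    vertices = [int(v) for v in parts[1 : d + 2]]
    filtration = float(parts[d + 2]) if len(parts) > d + 2 else None
    return d, vertices, filtration

print(parse_simplex("2 0 1 2 0.57"))  # (2, [0, 1, 2], 0.57)
```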
```
vgt/interpolate [options] data.npy query.npy
```
| Argument | Description |
|---|---|
| `data.npy` | npy NxD data matrix of 32-bit floats |
| `query.npy` | npy MxD data matrix of 32-bit floats |
| `--values <str:"">` | npy Nx1 matrix of 32-bit floats |
| `--out <str:"interpolated.npz">` | output filename for an npz data file; format described below |
| `--seed <int:239>` | random seed |
| `--njobs <int:1>` | number of parallel threads (requires OpenMP) |
| `--strategy <str:"brute_force">` | ray sampling strategy; available values: `{"brute_force", "bin_search"}` |
The output npz file contains the following matrices:
- `indices`, shape=(L,): indices of the query points for which the interpolation was successful (such points must lie inside the convex hull of the data)
- `simplices`, shape=(L, D+1): each row contains the (D+1) indices of the data points that are the vertices of the Delaunay simplex containing the corresponding query point
- `coefficients`, shape=(L, D+1): barycentric coordinates of the query points with respect to the data points from `simplices`
- `estimates`, shape=(L,): (if values are provided) interpolated values at the query points
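To sketch how these matrices fit together, the snippet below builds a small synthetic npz in the same format and reconstructs the estimates from `simplices` and `coefficients`. The numbers are illustrative, not actual solver output:

```python
import numpy as np

# Synthetic stand-in for the solver output (file name matches the default
# --out; shapes follow the description above with L=2 and D=2)
np.savez("interpolated.npz",
         indices=np.array([0, 1]),
         simplices=np.array([[0, 1, 2], [1, 2, 3]]),
         coefficients=np.array([[0.2, 0.3, 0.5], [0.5, 0.25, 0.25]]))

out = np.load("interpolated.npz")
# Barycentric coordinates of every query point sum to 1
assert np.allclose(out["coefficients"].sum(axis=1), 1.0)

# Given per-point values, each interpolated estimate is the barycentric
# combination of the values at the vertices of the containing simplex
values = np.array([1.0, 2.0, 3.0, 4.0])
estimates = (values[out["simplices"]] * out["coefficients"]).sum(axis=1)
print(estimates)  # estimates for the two query points: 2.3 and 2.75
```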
```
@inproceedings{polianskii2020voronoi,
  title={Voronoi Graph Traversal in High Dimensions with Applications to Topological Data Analysis and Piecewise Linear Interpolation},
  author={Polianskii, Vladislav and Pokorny, Florian T},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  pages={2154--2164},
  year={2020}
}
```
The project is available under the MIT license.