An interface to Knockoffs.jl from the Python programming language. knockoffspy
provides unique high performance methods for sampling various model-X knockoffs and ships with built-in routines for variable selection. Much of the functionality are unique and allow for orders of magnitude speedup over conventional methods. 'knockoffspy' attaches a Python interface onto the package, allowing seamless use of this tooling by Python users.
To install knockoffspy
, use pip
pip install knockoffspy
Then in the python interpreter,
>>> import knockoffspy
>>> knockoffspy.install()
This will install the Knockoffs.jl
package and all the Julia dependencies that it needs.
Import the package as
from knockoffspy import ko
The general flow for using the package is to follow exactly as would be done in Julia, except add ko.
in front of function calls. Most of the commands will work without any modification. Thus the Knockoffs.jl documentation is the main in-depth documentation for this package. Below we will show how to translate these docs to Python code.
Most of the commands of Knockoffs.jl
will work in python without any modification, just add ko.
in front of function calls. Thus the Knockoffs.jl documentation is the main in-depth documentation for this package. Below we will show how to translate these docs to Python code.
Lets simulate X ~ N(0, Sigma)
8000
where Sigma
is a symmetric Toesplitz matrix. Here we assume every 5 variables form a group
from knockoffspy import ko
from scipy import linalg
import numpy as np
# generate data
n = 1000 # number of samples
p = 1000 # number of covariates
m = 1 # number of knockoffs to generate per feature
groups = np.repeat(np.arange(0,200,1), 5)
Sigma = linalg.toeplitz([0.7**i for i in range(1, p+1)])
mu = np.zeros(p)
X = np.random.multivariate_normal(mean=mu, cov=Sigma, size=(n,))
We generate model-X group knockoffs as follows
solver = "maxent" # Maximum entropy solver, other choices include "mvr", "sdp", "equi"
result = ko.modelX_gaussian_group_knockoffs(X, solver, groups, mu, Sigma, verbose=True)
Maxent initial obj = -2087.6929364666807
Iter 1 (PCA): obj = -1607.4371641119483, δ = 0.4806840533479427, t1 = 0.28, t2 = 0.46
Iter 2 (CCD): obj = -1589.951838589172, δ = 0.046537146748581976, t1 = 0.42, t2 = 1.28, t3 = 0.0
Iter 3 (PCA): obj = -1570.44910152802, δ = 0.32338244109034703, t1 = 0.67, t2 = 1.74
Iter 4 (CCD): obj = -1562.5471454507458, δ = 0.028462155141072386, t1 = 0.81, t2 = 2.56, t3 = 0.0
Iter 5 (PCA): obj = -1557.1393033537286, δ = 0.124560844581473, t1 = 1.04, t2 = 2.99
Iter 6 (CCD): obj = -1552.4489159484508, δ = 0.020754156607442897, t1 = 1.18, t2 = 3.81, t3 = 0.01
Iter 7 (PCA): obj = -1549.810615656943, δ = 0.07012194156368799, t1 = 1.43, t2 = 4.27
Iter 8 (CCD): obj = -1547.020766696055, δ = 0.015614065719368509, t1 = 1.56, t2 = 5.09, t3 = 0.01
Iter 9 (PCA): obj = -1545.531088575216, δ = 0.0511701859313534, t1 = 1.82, t2 = 5.58
Iter 10 (CCD): obj = -1543.8817717502163, δ = 0.013019011861975537, t1 = 1.95, t2 = 6.4, t3 = 0.01
Iter 11 (PCA): obj = -1542.9960966431295, δ = 0.04029486159842148, t1 = 2.23, t2 = 6.87
Iter 12 (CCD): obj = -1542.0192637205867, δ = 0.011386766045749418, t1 = 2.36, t2 = 7.69, t3 = 0.01
Iter 13 (PCA): obj = -1541.4898664708514, δ = 0.03310222247410438, t1 = 2.61, t2 = 8.17
Iter 14 (CCD): obj = -1540.9005704168408, δ = 0.010234115029284592, t1 = 2.74, t2 = 8.99, t3 = 0.01
Iter 15 (PCA): obj = -1540.5789961008802, δ = 0.027573434751233035, t1 = 3.5, t2 = 9.5
Iter 16 (CCD): obj = -1540.221696231743, δ = 0.009352305423352147, t1 = 3.62, t2 = 10.33, t3 = 0.01
Iter 17 (PCA): obj = -1540.0257796341718, δ = 0.02345771973192861, t1 = 4.15, t2 = 10.83
Iter 18 (CCD): obj = -1539.806801955503, δ = 0.008629474061316174, t1 = 4.28, t2 = 11.68, t3 = 0.02
Iter 19 (PCA): obj = -1539.689234490581, δ = 0.020162878516809316, t1 = 4.81, t2 = 12.21
Iter 20 (CCD): obj = -1539.5543495056934, δ = 0.007951906306609491, t1 = 4.93, t2 = 13.03, t3 = 0.02
To extract the knockoffs, S matrix, and the final objective as
Xko = result.Xko
S = result.S
obj = result.obj
If you use this software in a research paper, please cite our paper. Scripts to reproduce the results featured in our paper can be found here.