This package provides online estimation of distributional regression models. The main contribution is an online/incremental implementation of the generalized additive models for location, scale and shape (GAMLSS, see Rigby & Stasinopoulos, 2005) developed in Hirsch, Berrisch & Ziel, 2024.
Please have a look at the documentation or the example notebook.
We're actively working on the package and welcome contributions from the community. Have a look at the Release Notes and the Issue Tracker.
The main idea of distributional regression (also called regression beyond the mean or multiparameter regression) is that the response variable $Y$ is distributed according to a specified distribution $\mathcal{F}(\theta)$,

$$Y \sim \mathcal{F}(\theta),$$

where $\theta$ is the vector of distribution parameters (e.g. location, scale and shape), each of which can be modeled as a function of explanatory variables.
This allows us to specify very flexible models that capture the conditional behaviour of the response's volatility, skewness and tails. A simple example from electricity markets is wind-production forecasts, which are skewed depending on the production level: intuitively, if production is already high, there is a higher risk of ending up lower, since production cannot rise much above "full load" and, in strong winds, the turbines might cut off. Modelling these conditional probabilistic behaviours is the key strength of distributional regression models.
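To make this concrete, here is a minimal, hypothetical simulation (not part of `ondil`; the data-generating process and all parameter values are illustrative assumptions) in which both the scale and the skewness of the response depend on a covariate, loosely mimicking the wind-production example:

```python
import numpy as np
from scipy import stats

# Hypothetical illustration: simulate a response whose scale and skewness
# depend on a covariate x (think of x as a normalized production level).
rng = np.random.default_rng(42)
n = 5000
x = rng.uniform(0, 1, n)

loc = 2.0 * x            # conditional location
scale = 0.2 + 0.8 * x    # conditional scale grows with x
skew = -4.0 * x          # increasingly left-skewed as x grows

y = stats.skewnorm.rvs(a=skew, loc=loc, scale=scale, random_state=rng)

# A mean-only regression misses this: the conditional spread differs strongly
# between the low- and high-production regimes.
low, high = y[x < 0.2], y[x > 0.8]
print(np.std(low), np.std(high))
```

A model of the conditional mean alone would treat both regimes identically; distributional regression lets the scale and shape parameters depend on `x` as well.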
Basic estimation and updating procedure:
```python
import ondil
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

# Model coefficients: which covariates enter each distribution parameter
equation = {
    0: "all",  # can also use "intercept" or an np.ndarray of integers / booleans
    1: "all",
    2: "all",
}

# Create the estimator
online_gamlss_lasso = ondil.OnlineGamlss(
    distribution=ondil.DistributionT(),
    method="lasso",
    equation=equation,
    fit_intercept=True,
    ic="bic",
)

# Initial fit on all but the last 11 observations
online_gamlss_lasso.fit(
    X=X[:-11, :],
    y=y[:-11],
)
print("Coefficients for the first N-11 observations \n")
print(online_gamlss_lasso.beta)

# Update call with one new observation
online_gamlss_lasso.update(
    X=X[[-11], :],
    y=y[[-11]],
)
print("\nCoefficients after update call \n")
print(online_gamlss_lasso.beta)

# Prediction for the last 10 observations
prediction = online_gamlss_lasso.predict(
    X=X[-10:, :],
)
print("\nPredictions for the last 10 observations")
# Location, scale and shape (degrees of freedom)
print(prediction)
```
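The predicted distribution parameters can be turned into full predictive distributions. The sketch below is an assumption-laden illustration: the parameter values are hypothetical, and the column layout (location, scale, degrees of freedom per row for `DistributionT`) is an assumption — check the documentation for the exact output format. Only `scipy` is used:

```python
import numpy as np
from scipy import stats

# Assumed layout for DistributionT predictions: each row holds the location
# (mu), scale (sigma) and degrees of freedom (nu) of a Student's t
# distribution. The values below are hypothetical, for illustration only.
prediction = np.array([
    [150.0, 50.0, 8.0],
    [120.0, 40.0, 6.0],
])
mu, sigma, nu = prediction[:, 0], prediction[:, 1], prediction[:, 2]

# 90% predictive interval from the fitted t distribution
lower = stats.t.ppf(0.05, df=nu, loc=mu, scale=sigma)
upper = stats.t.ppf(0.95, df=nu, loc=mu, scale=sigma)
interval = np.column_stack([lower, upper])
print(interval)
```

This is precisely where distributional regression pays off: because the full conditional distribution is modeled, quantiles, intervals and densities come for free once the parameters are estimated.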
The package is available on PyPI; install it via `pip install ondil` and enjoy.
`ondil` is designed to have minimal dependencies. We rely on reasonably up-to-date versions of `python>=3.10`, `numpy`, `numba` and `scipy`.
- Simon Hirsch, University of Duisburg-Essen & Statkraft
- Jonathan Berrisch, University of Duisburg-Essen
- Florian Ziel, University of Duisburg-Essen
`rolch` (Regularized Online Learning for Conditional Heteroskedasticity) was the original name of this package. We renamed it to `ondil` (Online Distributional Learning) to better reflect its purpose and functionality, since conditional heteroskedasticity (i.e. non-constant variance) is just one of the many applications of the distributional regression models that can be estimated with this package.
We welcome every contribution from the community. Feel free to open an issue if you find bugs or want to propose changes.
We're still in an early phase and welcome feedback, especially on the usability and "look and feel" of the package. Secondly, we're working to port distributions from the R GAMLSS package and welcome corresponding PRs.
To get started, just create a fork and get going. We will modularize the code over the next versions and increase our testing coverage. We use `ruff` and `black` as formatters.
Simon is employed at Statkraft and gratefully acknowledges support received from Statkraft for his PhD studies. This work contains the author's opinion and does not necessarily reflect Statkraft's position.
- Clone this repo.
- Install the necessary dependencies from `requirements.txt` using `conda create --name <env> --file requirements.txt`.
- Run `pip install .`, optionally with `--force` or `--force --no-deps`, to ensure the package is built from the updated wheels. If you want to be 100% sure that no cached wheels are used, or you need the tarball, run `python -m build` before installing.
- Enjoy.