GradientBoost

This package is a massive simplification of the original GradientBoost.jl package. I decided to strip it down to the bare minimum by removing all the goodies. Why? I just needed bare boosting for experiments where I want to boost some algorithm (like NNs). I also updated the algorithm to be compatible with MLUtils.jl and LossFunctions.jl, to use Optim.jl, and to use Zygote as a fallback for custom loss functions (ForwardDiff might be better here).

An example of use

The package is designed to sprinkle boosting on top of your ML algorithm. As such, it does not implement any algorithm to learn classifiers inside boosting. A simple example with decision stumps can be found in test_ml.jl. A more sophisticated example is in example/mutagenesis.jl, where we show how to boost a Mill classifier for classifying structured data. In the rest of this readme, the example with decision stumps is walked through.

Let's start by importing libraries and defining some training data.

using Test
using GradientBoost
using LossFunctions
using Statistics: mean   # used by the stump learner below

# Each column of x is one observation; y holds the ±1 labels.
x = Float64[
  1 0 -1  0;
  0 1  0 -1]
y = [-1, 1, -1, 1]

The classifier we want to boost is a decision tree of depth 1, called a decision stump. The decision stump is simple, implementing a variant of the rule xᵢ ≥ τ ? +1 : -1, where xᵢ is the value of the i-th feature. We define the decision stump as a simple callable struct.

struct Stump{T}
  dim::Int   # index of the feature to threshold
  τ::T       # threshold
  s::Int     # prediction when x[dim] ≥ τ (+1 or -1); negated otherwise
end

function (s::Stump)(x::AbstractVector)
  x[s.dim] ≥ s.τ ? s.s : -s.s
end

function (s::Stump)(x::AbstractMatrix)
  vec(mapslices(s, x, dims = 1))
end
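
As a quick sanity check, the stump can be applied to a single observation (a column) or to a whole matrix. The stump below, thresholding the first feature at 0, is purely illustrative:

s = Stump(1, 0.0, 1)
s(x[:, 1])  # single observation: returns 1, since x[1, 1] = 1 ≥ 0
s(x)        # whole matrix: returns [1, 1, -1, 1]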

To use Stump as a learner inside the boosting algorithm, we need to overload the learner_fit and learner_predict functions. Using multiple dispatch, we can specialize fitting for different loss functions and different learners. For the purpose of dispatch, we define StumpLearner to signal that we want to learn a Stump and overload learner_fit as

struct StumpLearner end 

# Fits a stump to weighted pseudo-labels wy: the magnitude is the weight and
# the sign is the target label. The loss function lf is unused by this learner.
function GradientBoost.learner_fit(lf, learner::StumpLearner, x::AbstractMatrix, wy::Vector{<:Real})
  w = abs.(wy)
  y = sign.(wy)
  best_stump = Stump(1, mean(x[1,:]), 1)
  best_err = mean(w .* (y .!= best_stump(x)))
  for dim in axes(x,1)
    # candidate thresholds: midpoints between consecutive samples in this dimension
    τs = 0.5(x[dim,2:end] + x[dim, 1:end-1])
    for τ in τs
      for s in [-1, +1]
        e = mean(w .* (y .!= Stump(dim, τ, s)(x)))
        if e < best_err
          best_stump = Stump(dim, τ, s)
          best_err = e
        end
      end
    end
  end
  best_stump
end
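
Called in isolation, a single stump already gives a weak fit of the toy data. The loss argument is ignored by this particular learner, so ExpLoss is simply passed through unused:

stump = GradientBoost.learner_fit(ExpLoss, StumpLearner(), x, Float64.(y))
mean(stump(x) .== y)  # a single stump cannot separate this data perfectly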

Next, we overload learner_predict to provide the prediction as

function GradientBoost.learner_predict(::Loss, ::StumpLearner, s::Stump, x)
  s(x)
end

Finally, the boosting is called as

gbl = GBBL(StumpLearner(); loss_function = ExpLoss, num_iterations = 4, learning_rate = 1, sampling_rate = 1)
model = fit(gbl, x, y)
predictions = GradientBoost.predict(model, x)
@assert 2(predictions .> 0) .- 1 == y
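
Predictions on unseen data follow the same one-observation-per-column convention. The scores returned by predict are real-valued; their sign gives the class. The test matrix x_new below is a hypothetical example:

x_new = Float64[0.5 -0.5; 0.0 0.0]  # two new observations as columns
scores = GradientBoost.predict(model, x_new)
2(scores .> 0) .- 1                 # map scores to ±1 labels, as in the assert above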

Some notes

  • I got rid of the ML API, as it did not serve my purpose.
  • The loss function has the signature loss(prediction, true_labels); a sketch of a custom loss follows below.
  • I would like to thank the author of the original GradientBoost.jl library. I just needed something super simple.
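
A minimal sketch of such a custom loss, assuming GBBL accepts a plain Julia function as loss_function (my_loss below is hypothetical and not part of the package; its gradients would come from the Zygote fallback mentioned above):

# Hypothetical custom loss with the documented signature loss(prediction, true_labels).
my_loss(prediction, true_labels) = mean(abs2, prediction .- true_labels)

gbl_custom = GBBL(StumpLearner(); loss_function = my_loss, num_iterations = 4)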
