
Scoring rules like the Brier Score (Mean Squared Error, Quadratic Score) and Log Loss (Cross-Entropy, Negative Log-Likelihood, Logarithmic Score) can favor incorrect predictions. To address this limitation, the Penalized Brier Score (PBS) and Penalized Logarithmic Loss (PLL) have been introduced for probabilistic classifiers.


Ruhallah93/superior-scoring-rules


Superior Scoring Rules: Enhanced Calibrated Scoring Rules for Probabilistic Evaluation


📊 PBS and PLL are strictly proper, consistent scoring rules for evaluating probabilistic classifiers. They fix a flaw in the Brier Score (MSE) and Log Loss (Cross-Entropy) and are better suited to model selection, early stopping, and checkpointing.


Table of Contents

  1. Motivation

  2. Limitations of Traditional Metrics

  3. Penalized Scoring Rules

  4. Quick Start

  5. Project Structure

  6. Paper & Citation

  7. Contributing

  8. License


Motivation

In many high-stakes applications, confidence calibration is critical. Traditional accuracy-based metrics (Accuracy, F1) ignore prediction confidence. Consider:

  • Cancer Diagnosis: Differentiating 51% vs. 99% confidence in malignancy
  • ICU Triage: Overconfident mispredictions risk patient safety
  • Autonomous Vehicles: Handling uncertainties about obstacles
  • Financial Risk Modeling: Pricing and investment decisions
  • Security Threat Detection: High-confidence false negatives

Accuracy or F1 score alone cannot capture this nuance.

Limitations of Traditional Metrics

While Brier Score (Mean Squared Error, MSE, Quadratic Score) and Log Loss (Cross-Entropy, Negative Log-Likelihood, NLL, Logarithmic Score) are strictly proper scoring rules, they can still favor incorrect, overconfident predictions over more calibrated, correct ones.

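| Case | True Label (one-hot) | Prediction | Brier Score | Log Loss (base-10 log) | Notes |
|------|----------------------|------------|-------------|------------------------|-------|
| A | [0,1,0] | [0.33,0.34,0.33] | 0.6534 | 0.4685 | ✅ Correct, but low confidence |
| B | [0,1,0] | [0.51,0.49,0.00] | 0.5202 | 0.3098 | ❌ Incorrect, but "better" score |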

Both traditional scores rank B ahead of A (lower is better), violating the principle that a correct prediction should always score better than an incorrect one.
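
The table values can be reproduced with a few lines of plain Python. This is a minimal sketch, not code from the package: the helper names `brier_sum` and `log_loss10` are hypothetical, the Brier score is taken as the sum of squared errors over classes, and the logarithm is assumed to be base 10 because that is what matches the reported numbers.

```python
import math

def brier_sum(y, p):
    """Sum of squared differences between the one-hot label y and prediction p."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p))

def log_loss10(y, p):
    """Negative base-10 log of the probability assigned to the true class."""
    true_idx = y.index(1)
    return -math.log10(p[true_idx])

y = [0, 1, 0]
case_a = [0.33, 0.34, 0.33]   # correct, low confidence
case_b = [0.51, 0.49, 0.00]   # incorrect

for name, p in (("A", case_a), ("B", case_b)):
    print(name, round(brier_sum(y, p), 4), round(log_loss10(y, p), 4))
```

Both scores come out lower for B than for A, which is the ranking problem the penalized rules are meant to fix.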

Penalized Scoring Rules

We introduce a penalty term that ensures any incorrect prediction is scored worse than any correct one.

Definitions

Let y be the one‑hot true vector, p the predicted probability vector, and c the number of classes. Define the set of incorrect predictions:

$$\xi = \{\,p \mid \arg\max p \neq \arg\max y\}\quad\text{(incorrect predictions)}$$

Formulas

Then the Penalized Brier Score (PBS) is:

$$S_{PBS}(p,y) = \sum_{i=1}^{c}(y_i-p_i)^2 + \begin{cases} \frac{c-1}{c} & p \in \xi\\ 0 & \text{otherwise} \end{cases}$$

And the Penalized Logarithmic Loss (PLL) is:

$$S_{PLL}(p,y) = - \sum_{i=1}^{c} y_i \log(p_i) - \begin{cases} \log\left(\frac{1}{c}\right) & p \in \xi\\ 0 & \text{otherwise} \end{cases}$$

Since $-\log(1/c) = \log c$, the PLL penalty adds $\log c$ to the loss of every incorrect prediction.
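
To see the effect of the penalty, the two formulas can be evaluated for a single sample in plain Python. This is only a sketch under stated assumptions: the function names (`is_incorrect`, `pbs_single`, `pll_single`) are hypothetical, the natural logarithm is used, and a prediction counts as incorrect only when some wrong class receives strictly more probability than the true class (the definition of ξ does not say how ties in the arg max are broken).

```python
import math

def is_incorrect(y, p):
    """True when a wrong class gets strictly more probability than the true class."""
    true_idx = y.index(1)
    return any(pj > p[true_idx] for j, pj in enumerate(p) if j != true_idx)

def pbs_single(y, p):
    """Brier sum plus (c - 1) / c when the prediction is incorrect."""
    c = len(y)
    brier = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    return brier + ((c - 1) / c if is_incorrect(y, p) else 0.0)

def pll_single(y, p):
    """Log loss minus log(1 / c), i.e. plus log(c), when the prediction is incorrect."""
    c = len(y)
    log_loss = -math.log(p[y.index(1)])
    return log_loss - (math.log(1 / c) if is_incorrect(y, p) else 0.0)

y = [0, 1, 0]
case_a = [0.33, 0.34, 0.33]   # correct, low confidence
case_b = [0.51, 0.49, 0.00]   # incorrect

print("PBS A, B:", pbs_single(y, case_a), pbs_single(y, case_b))
print("PLL A, B:", pll_single(y, case_a), pll_single(y, case_b))
```

Under these assumptions, case A from the table scores roughly 0.65 (PBS) and 1.08 (PLL), while case B scores roughly 1.19 and 1.81, so the correct prediction now ranks ahead of the incorrect one under both rules.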

Implementation

Penalized Brier Score (PBS)

import tensorflow as tf

def pbs(y, q):
    """
    Computes Penalized Brier Score.
    
    Args:
        y: Ground truth (one-hot encoded), shape [batch_size, num_classes]
        q: Predicted probabilities, shape [batch_size, num_classes]
        
    Returns:
        Mean PBS across batch
    """
    y = tf.cast(y, tf.float32)
    q = tf.cast(q, tf.float32)
    c = y.get_shape()[1]

    # Calculate the payoff term: q_j - q_true for every class j
    ST = tf.math.subtract(q, tf.reduce_sum(tf.where(y == 1, q, y), axis=1)[:, None])
    # Keep only classes that receive more probability than the true class
    ST = tf.where(ST < 0, tf.constant(0, dtype=tf.float32), ST)
    # ceil maps every positive difference to 1, so payoff > 0 marks an incorrect prediction
    payoff = tf.reduce_sum(tf.math.ceil(ST), axis=1)
    M = (c - 1) / (c)
    payoff = tf.where(payoff > 0, tf.constant(M, dtype=tf.float32), payoff)
    
    # Brier score (averaged over classes here, i.e. the sum in the formula divided by c) + penalty
    brier = tf.math.reduce_mean(tf.math.square(tf.math.subtract(y, q)), axis=1)
    return tf.math.reduce_mean(brier + payoff)
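
The payoff steps in `pbs` can be traced for one sample with plain Python lists. The sketch below mirrors those steps without TensorFlow; `payoff_flag` is a hypothetical helper, not part of the package. It also shows one consequence of the ceil-based test: an exact tie between the true class and another class is not penalized, because the difference is 0 and `ceil(0)` is 0.

```python
import math

def payoff_flag(y_row, q_row):
    """Mirror the payoff steps of pbs for one sample using plain Python lists."""
    # Probability assigned to the true class (the only position where y == 1)
    q_true = sum(q for y, q in zip(y_row, q_row) if y == 1)
    # q_j - q_true for every class j
    diffs = [q - q_true for q in q_row]
    # Negative differences are set to 0
    clipped = [max(d, 0.0) for d in diffs]
    # ceil turns each remaining positive difference into 1
    count = sum(math.ceil(d) for d in clipped)
    # Any class beating the true class triggers the penalty
    return count > 0

print(payoff_flag([0, 1, 0], [0.51, 0.49, 0.00]))  # a wrong class leads
print(payoff_flag([0, 1, 0], [0.33, 0.34, 0.33]))  # the true class leads
print(payoff_flag([0, 1, 0], [0.50, 0.50, 0.00]))  # exact tie
```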

Penalized Logarithmic Loss (PLL)

import math

import tensorflow as tf

def pll(y, q):
    """
    Computes Penalized Logarithmic Loss.
    
    Args:
        y: Ground truth (one-hot encoded), shape [batch_size, num_classes]
        q: Predicted probabilities, shape [batch_size, num_classes]
        
    Returns:
        Mean PLL across batch
    """
    y = tf.cast(y, tf.float32)
    q = tf.cast(q, tf.float32)
    c = y.get_shape()[1]

    # Calculate the payoff term: positive entries mark classes that beat the true class
    ST = tf.math.subtract(q, tf.reduce_sum(tf.where(y == 1, q, y), axis=1)[:, None])
    ST = tf.where(ST < 0, tf.constant(0, dtype=tf.float32), ST)
    payoff = tf.reduce_sum(tf.math.ceil(ST), axis=1)
    # M = log(1/c) is negative, so subtracting it below adds log(c) to the loss
    M = math.log(1 / c)
    payoff = tf.where(payoff > 0, tf.constant(M, dtype=tf.float32), payoff)
    log_loss = tf.keras.losses.categorical_crossentropy(y, q)

    # Cross-entropy - penalty
    ce_loss = tf.cast(log_loss, tf.float32)
    return tf.math.reduce_mean(ce_loss - payoff)

Quick Start

Installation

Install via PyPI:

pip install superior-scoring-rules

Basic Usage

import tensorflow as tf
from superior_scoring_rules import pbs, pll

# Sample data (batch_size=3, num_classes=4)
y_true = tf.constant([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]])
y_pred = tf.constant([[0.9, 0.05, 0.05, 0], 
                     [0.1, 0.8, 0.05, 0.05],
                     [0.1, 0.1, 0.1, 0.7]])

print("PBS:", pbs(y_true, y_pred).numpy())
print("PLL:", pll(y_true, y_pred).numpy())
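
As a rough cross-check of the example above, the same batch can be scored by hand in plain Python. This sketch assumes the installed package matches the implementation shown in this README (Brier term averaged over classes, natural-log cross-entropy); `row_scores` is a hypothetical helper, and the TensorFlow results may differ slightly because of float32 arithmetic and clipping inside the cross-entropy.

```python
import math

y_true = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
y_pred = [[0.9, 0.05, 0.05, 0.0],
          [0.1, 0.8, 0.05, 0.05],
          [0.1, 0.1, 0.1, 0.7]]

def row_scores(y, q):
    """Return (PBS, PLL) for one sample, following the README implementation."""
    c = len(y)
    q_true = q[y.index(1)]
    wrong = any(qj > q_true for qj in q)                       # some class beats the true class
    brier = sum((yi - qi) ** 2 for yi, qi in zip(y, q)) / c    # mean over classes, as in pbs
    ce = -math.log(q_true)                                     # natural-log cross-entropy
    pbs_row = brier + ((c - 1) / c if wrong else 0.0)
    pll_row = ce - (math.log(1 / c) if wrong else 0.0)
    return pbs_row, pll_row

rows = [row_scores(y, q) for y, q in zip(y_true, y_pred)]
pbs_mean = sum(r[0] for r in rows) / len(rows)
pll_mean = sum(r[1] for r in rows) / len(rows)
print("PBS:", round(pbs_mean, 4))
print("PLL:", round(pll_mean, 4))
```

All three predictions in this batch are correct, so no penalty is applied and the hand-computed means are about 0.0158 for PBS and 0.2284 for PLL.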

Callbacks for Early Stopping & Checkpointing

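class PBSCallback(tf.keras.callbacks.Callback):
    """Adds 'val_pbs' to the epoch logs so later callbacks can monitor it."""

    def __init__(self, validation_data):
        super().__init__()
        # TF2 Keras does not populate self.validation_data, so the data is passed in explicitly
        self.x_val, self.y_val = validation_data

    def on_epoch_end(self, epoch, logs=None):
        if logs is None:
            logs = {}
        y_prob = self.model.predict(self.x_val, verbose=0)
        logs['val_pbs'] = float(pbs(self.y_val, y_prob))

# x_val and y_val are placeholders for your validation inputs and one-hot labels.
# PBSCallback must come before the callbacks that monitor 'val_pbs'.
model.fit(...,
    callbacks=[
        PBSCallback((x_val, y_val)),
        tf.keras.callbacks.EarlyStopping(monitor='val_pbs', patience=5, mode='min'),
        tf.keras.callbacks.ModelCheckpoint('best.h5', monitor='val_pbs', mode='min', save_best_only=True)
    ]
)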

Project Structure

Below is an overview of the main files and folders:

├── Superior_Scoring_Rules.ipynb   # Implementation & analysis  
├── superior_scoring_rules.py      # PBS & PLL functions  
├── README.md                      # This file  
├── history/                       # Statistical analysis plots  
└── hyperparameters-tuning/        # Tuning results  

Paper & Citation

@article{ahmadian2025superior,
  title={Superior scoring rules for probabilistic evaluation of single-label multi-class classification tasks},
  author={Ahmadian, Rouhollah and Ghatee, Mehdi and Wahlstr{\"o}m, Johan},
  journal={International Journal of Approximate Reasoning},
  pages={109421},
  year={2025},
  publisher={Elsevier}
}

Contributing

  • 🐛 Report bugs via Issues

  • 💡 Suggest improvements via Pull Requests

  • ⭐️ Star the repository if you find it useful!

License

This project is licensed under the BSD License.
