Issue Description
While using the pandas dtype `Int64`, which supports `NA`s (unlike NumPy `int64`), I get an error in `TreeExplainer` with background datasets. The type of error displayed seems to depend on the model and the data.

In the example provided, commenting out the line `X = X.convert_dtypes()`, which converts the columns to pandas nullable dtypes, fixes the issue. However, this does not allow passing integer features that also contain NAs.
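A possible workaround, assuming both the model and `TreeExplainer` accept `float64` input with `np.nan` as the missing-value marker (XGBoost treats NaN as missing by default; this is not verified here for shap's interventional path), is to cast the nullable columns to `float64` before fitting and explaining. `to_float_frame` is a hypothetical helper, not part of shap or pandas:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_extension_array_dtype, is_numeric_dtype


def to_float_frame(X: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: cast nullable numeric columns (Int64, Float64, ...) to float64.

    pd.NA becomes np.nan, so the frame converts cleanly to a float64 NumPy array.
    Column names are kept, so feature names survive.
    """
    nullable_cols = {
        col: "float64"
        for col, dtype in X.dtypes.items()
        if is_extension_array_dtype(dtype) and is_numeric_dtype(dtype)
    }
    return X.astype(nullable_cols)


X = pd.DataFrame({"x1": [0.5, -1.0, 2.0], "x2": [10, 60, -5]}).convert_dtypes()
X.loc[X["x2"] > 50, "x2"] = None  # introduces pd.NA into the Int64 column
Xf = to_float_frame(X)
print(Xf.dtypes)
```

With the reproducer above, this would mean calling `model.fit(to_float_frame(X), y)` and `shap.TreeExplainer(model, to_float_frame(X), feature_perturbation="interventional")`. Integer values stay exact in `float64` up to 2**53, which covers the `x2` range used in the example.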
Minimal Reproducible Example
```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from xgboost import XGBClassifier
import shap

rng = np.random.default_rng(42)
x1 = rng.standard_normal(100_000)
x2 = rng.integers(-100, 100, 100_000)
X = pd.DataFrame({"x1": x1, "x2": x2})

# Convert to pandas nullable dtypes (x1 -> Float64, x2 -> Int64).
# Commenting out this line makes the error go away.
X = X.convert_dtypes()

# Uncomment to introduce NAs into the Int64 column (traceback case 1).
# X.loc[X["x2"] > 50, "x2"] = None
X.info()

y = (X.sum(axis=1) > 0) * 1

# Swap in CatBoost instead of XGBoost for traceback case 3.
# model = CatBoostClassifier(verbose=False)
# model.fit(X, y)
model = XGBClassifier(verbose=False)
model.fit(X, y)

explainer = shap.TreeExplainer(model, X, feature_perturbation="interventional")
```
Traceback
1. With `NA`s (line `X.loc[X["x2"]>50,"x2"] = None` uncommented), I get:
   `TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'`
2. For `XGBClassifier`, I get:
   `TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'`
3. For `CatBoostClassifier`, I get:
   `AttributeError: 'TreeEnsemble' object has no attribute 'values'`
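The `dtype('O')` in the `TypeError` suggests the frame reaches NumPy as an object array. The sketch below reproduces the same cast failure with pandas and NumPy alone; it is an assumption about where the conversion goes wrong, not a trace through shap's source. `object_cast_error` is a hypothetical helper for illustration:

```python
import numpy as np
import pandas as pd


def object_cast_error(X: pd.DataFrame):
    """Return (dtype of X.values, message of the float64 'safe' cast error, or None)."""
    arr = X.values  # mixed nullable extension columns interleave to an object array
    try:
        arr.astype(np.float64, casting="safe")  # object -> float64 is never a "safe" cast
    except TypeError as exc:
        return arr.dtype, str(exc)
    return arr.dtype, None


X = pd.DataFrame({"x1": [0.5, -1.0, 2.0], "x2": [10, 60, -5]}).convert_dtypes()
X.loc[X["x2"] > 50, "x2"] = None
dtype, message = object_cast_error(X)
print(dtype)
print(message)
```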
Bug report checklist
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest release of shap.
- I have confirmed this bug exists on the master branch of shap.
- I'd be interested in making a PR to fix this bug
Installed Versions
shap 0.47.0
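Only the shap version is listed; since the failure involves pandas nullable dtypes and two different model libraries, the pandas, NumPy, XGBoost and CatBoost versions are likely relevant too. A minimal sketch for collecting them with the standard library's `importlib.metadata` (`installed_versions` is a hypothetical helper):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("shap", "pandas", "numpy", "xgboost", "catboost")):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```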