BUG: Incompatibility with pandas Int64 dtype (int64 with NAs) · Issue #4011 · shap/shap · GitHub
BUG: Incompatibility with pandas Int64 dtype (int64 with NAs) #4011
Open
@kpatucha

Description

Issue Description

When using the pandas Int64 dtype, which supports NAs (unlike NumPy int64), I get an error in TreeExplainer when a background dataset is passed. The type of error seems to depend on the model and the data.

In the example provided, commenting out the line X = X.convert_dtypes(), which converts the columns to pandas dtypes, fixes the issue. However, that workaround cannot be used for features that are integers but also contain NAs.
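A possible stopgap for integer features with NAs is to cast the nullable columns to NumPy float64 before fitting and explaining, so that NAs become NaN. The sketch below is only an illustration: `to_float_frame` is a hypothetical helper, not part of shap or pandas, and it has not been confirmed by the maintainers or tested against TreeExplainer here.

```python
import numpy as np
import pandas as pd


def to_float_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy df with every column as NumPy float64.

    pd.NA entries in nullable columns become np.nan.
    """
    return pd.DataFrame(
        {col: df[col].to_numpy(dtype="float64", na_value=np.nan) for col in df.columns},
        index=df.index,
    )


# Small stand-in for the frame in the example: nullable float and int columns
X_small = pd.DataFrame(
    {
        "x1": pd.array([0.5, -1.2, 3.0], dtype="Float64"),
        "x2": pd.array([1, 2, None], dtype="Int64"),
    }
)
X_float = to_float_frame(X_small)
print(X_float.dtypes)
```

The converted frame would then be passed to both model.fit and shap.TreeExplainer in place of X. Whether this avoids the errors in every case is an assumption, and integers with magnitude above 2**53 lose precision when stored as float64.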

Minimal Reproducible Example

import numpy as np
import pandas as pd
from catboost import CatBoostClassifier
from xgboost import XGBClassifier
import shap

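# Synthetic data: one float feature and one integer feature
rng = np.random.default_rng(42)
x1 = rng.standard_normal(100_000)
x2 = rng.integers(-100, 100, 100_000)
X = pd.DataFrame({"x1": x1, "x2": x2})
# Convert to pandas nullable dtypes (e.g. Int64); commenting this line out avoids the error
X = X.convert_dtypes()
# Uncomment to introduce NAs into the integer column (traceback 1)
# X.loc[X["x2"]>50,"x2"] = None
X.info()

# Binary target: 1 where the row sum is positive
y = (X.sum(axis=1) > 0) * 1

# Uncomment to reproduce with CatBoost instead (traceback 3)
# model = CatBoostClassifier(verbose=False)
# model.fit(X, y)

model = XGBClassifier(verbose=False)
model.fit(X, y)

# X is passed as the background dataset
explainer = shap.TreeExplainer(model, X, feature_perturbation="interventional")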

Traceback

1. With `NA`s (line `X.loc[X["x2"]>50,"x2"] = None` uncommented) I get:

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

2. For `XGBClassifier` I get:

TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

3. For `CatBoostClassifier` I get:

AttributeError: 'TreeEnsemble' object has no attribute 'values'
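One plausible reading of the TypeError is that the nullable data reaches NumPy as an object array, which NumPy refuses to cast to float64 under the 'safe' rule. The snippet below only illustrates that NumPy casting rule on a standalone nullable column; it makes no claim about where in shap the conversion happens.

```python
import numpy as np
import pandas as pd

# A nullable integer column with one missing value
col = pd.array([1, 2, None], dtype="Int64")

# Exported as an object array, the missing value stays as pd.NA
as_object = col.to_numpy(dtype=object)
print(as_object.dtype)

# NumPy does not treat object -> float64 as a 'safe' cast
safe_ok = np.can_cast(as_object.dtype, np.float64, casting="safe")
print(safe_ok)
```

Under this reading, any code path that requests a safe cast of such an array to float64 would raise the message shown above.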

Expected Behavior

No response

Bug report checklist

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest release of shap.
  • I have confirmed this bug exists on the master branch of shap.
  • I'd be interested in making a PR to fix this bug

Installed Versions

0.47.0

Labels

bug: Indicates an unexpected problem or unintended behaviour
help wanted: Indicates that a maintainer wants help on an issue or pull request
