Description
Problem: Error in multi-target regression with text data
Version: 1.1.1
Operating System: Windows
CPU: Intel Core-i7 x86/64
GPU: None
I am doing multi-target regression with CatBoost. All the target columns hold continuous numeric values. Two columns in the training data are free text (participants' responses to a survey). The code is as follows:
from catboost import CatBoostRegressor, Pool
from sklearn.model_selection import train_test_split

# `data` is a pandas DataFrame holding the survey data (loading not shown)
target_col = ['SISU_LP', 'SISU_AM', 'SISU_EP', 'SISU_HR', 'SISU_HS', 'SISU_HO']
text_cols = ['surrender', 'feelings']
X = data.drop(columns=target_col)
y = data[target_col]

params = {'learning_rate': 0.1, 'depth': 6, 'loss_function': 'MultiRMSE', 'eval_metric': 'MultiRMSE'}
model = CatBoostRegressor(**params)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=1)

# Declare the two free-text columns as text features in both pools
pool_train = Pool(data=X_train, label=y_train, text_features=text_cols)
pool_test = Pool(data=X_test, label=y_test, text_features=text_cols)
model.fit(pool_train, eval_set=pool_test, use_best_model=True)  # the error is raised here
I am getting the following error: Attempt to use multi-dimensional target as one-dimensional.
The source of the error is model.fit().
If I drop the text columns, the error goes away. Similarly, if I replace text_features in Pool with cat_features, the error goes away. But I am not sure whether using cat_features instead of text_features is appropriate here: the text columns contain sentences, not categorical data.
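To illustrate why I doubt that cat_features fits, here is a minimal sketch using only the standard library. The sample responses are made up for illustration (the real survey data is not shown), and column_profile is a hypothetical helper, not part of CatBoost. It reports how many values in a column are distinct and how long they are on average. My assumption is that a column with mostly distinct, multi-word values behaves like free text, because as a categorical feature nearly every row would be its own category.

```python
# Hypothetical sample responses, invented for illustration only.
responses = [
    "I kept going even when I wanted to stop",
    "I felt calm after a while",
    "I kept going even when I wanted to stop",
    "It was hard but I did not give up",
]


def column_profile(values):
    """Return (unique_ratio, mean_word_count) for a list of strings.

    unique_ratio is the share of distinct values; mean_word_count is the
    average number of whitespace-separated tokens per value.
    """
    unique_ratio = len(set(values)) / len(values)
    mean_words = sum(len(v.split()) for v in values) / len(values)
    return unique_ratio, mean_words


unique_ratio, mean_words = column_profile(responses)
print(f"unique ratio: {unique_ratio:.2f}, mean words per value: {mean_words:.1f}")
```

On my real columns I would expect a unique ratio close to 1 and several words per value, which is why treating them as categories seems questionable to me.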