CrossEntropyLoss fail to detect the negative index · Issue #4552 · pytorch/xla · GitHub

CrossEntropyLoss fail to detect the negative index #4552

Open
YangFei1990 opened this issue Feb 2, 2023 · 8 comments
Assignees
Labels
pytorch divergence XLA behavior doesn't match Pytorch eager frontend tracing Lazy Tensor tracing

Comments

@YangFei1990
Contributor

🐛 Bug

torch.nn.CrossEntropyLoss should only accept targets whose values lie in [0, num_class). However, on an XLA device it accepts negative values and still produces a result.

To Reproduce

See the script below:

import torch
from torch_xla.core import xla_model as xm
loss = torch.nn.CrossEntropyLoss()
labels = torch.LongTensor([-1, -1, 3])
preds = torch.ones(3, 5)
# print(loss(preds, labels)) # IndexError: Target -1 is out of bounds.

xla_device = xm.xla_device()
labels = labels.to(xla_device)
preds = preds.to(xla_device)
output = loss(preds, labels)
xm.mark_step()
print(output)  # expected index error, but printed: tensor(1.6094, device='xla:1')

On CPU it throws IndexError: Target -1 is out of bounds.; on an XLA device, however, it silently outputs a tensor.

Expected behavior

Same error on XLA device
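Until the divergence is fixed, the eager-style range check can be enforced on the caller side before the loss is traced. A minimal sketch in plain Python (the helper name `validate_targets` is hypothetical, not part of torch or torch_xla):

```python
def validate_targets(targets, num_classes):
    """Raise the same IndexError that eager PyTorch raises for an
    out-of-range target, before the loss is traced on the XLA device."""
    for t in targets:
        if not 0 <= t < num_classes:
            raise IndexError(f"Target {t} is out of bounds.")

validate_targets([0, 3, 4], num_classes=5)  # OK: all targets in [0, 5)
```

In the repro above, calling `validate_targets(labels.tolist(), 5)` before `loss(preds, labels)` would raise for the -1 entries just as CPU eager mode does.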

Environment

  • Reproducible on XLA backend [CPU/TPU]: GPU
  • torch_xla version: 1.13
@YangFei1990
Contributor Author

I was trying to look into this issue myself. The cross_entropy_loss op defined in torch's native_functions.yaml is not defined in xla_native_functions.yaml or aten_xla_type.cpp, so by default it should fall back to the CPU implementation. However, it behaves differently on CPU and on the XLA device. How should I map this op to its XLA implementation?

@JackCaoG
Collaborator
JackCaoG commented Feb 2, 2023

I think cross_entropy_loss was dispatched to nll_loss or something. You can print a metrics report following https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#get-a-metrics-report after executing a CrossEntropyLoss, then check the xla:: counters, which indicate which pytorch/xla IR it got dispatched to.

@YangFei1990
Contributor Author

Thanks Jack. It seems cross_entropy_loss is dispatched to multiple ops, including _log_softmax and nll_loss_forward:

Counter: xla::_copy_from
  Value: 3
Counter: xla::_log_softmax
  Value: 1
Counter: xla::empty_symint
  Value: 2
Counter: xla::nll_loss_forward
  Value: 1
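For scripting, the counters in a report like the one above (obtained from `torch_xla.debug.metrics.metrics_report()`) can be filtered with a few lines of plain Python. The parser below assumes the `Counter: name` / `Value: N` layout shown here:

```python
def xla_counters(report: str) -> dict:
    """Extract 'Counter: xla::...' entries and their values
    from a torch_xla metrics report string."""
    counters = {}
    lines = report.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("Counter: xla::"):
            name = stripped[len("Counter: "):]
            # The value follows on the next line as "Value: N"
            counters[name] = int(lines[i + 1].split(":")[1])
    return counters

report = """\
Counter: xla::_copy_from
  Value: 3
Counter: xla::_log_softmax
  Value: 1
Counter: xla::empty_symint
  Value: 2
Counter: xla::nll_loss_forward
  Value: 1
"""
print(xla_counters(report))
```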

How can I find this mapping relationship in the code?

@JackCaoG
Collaborator
JackCaoG commented Feb 2, 2023

I think it is _log_softmax then nll_loss_forward; the dispatcher logic lives upstream. You can add prints in https://github.com/pytorch/xla/blob/master/torch_xla/csrc/aten_xla_type.cpp for the corresponding ops to inspect input shapes, calling order, etc., if you can compile pytorch/xla.
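For reference, cross_entropy_loss has no XLA kernel of its own because upstream PyTorch decomposes it into log_softmax followed by nll_loss, which is consistent with the two counters reported above. A plain-Python sketch of that decomposition (illustrating the math, not the actual dispatcher code):

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the log-sum-exp
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def nll(log_probs, target):
    # Negative log-likelihood of the target class
    return -log_probs[target]

def cross_entropy(logits, target):
    # cross_entropy decomposes into log_softmax followed by nll,
    # matching the xla::_log_softmax and xla::nll_loss_forward counters
    return nll(log_softmax(logits), target)

print(cross_entropy([1.0, 1.0, 1.0, 1.0, 1.0], 3))  # log(5) ≈ 1.6094
```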

@YangFei1990
Contributor Author

Could you elaborate on how the dispatcher logic lives upstream? I was reading the op-lowering-guide, and my impression was that ops defined in native_functions.yaml are mapped to xla_native_functions.yaml, and otherwise fall back to CPU.

@JackCaoG
Collaborator
JackCaoG commented Feb 2, 2023

@ymwangg
Contributor
ymwangg commented Feb 2, 2023

@YangFei1990 It's related to how this op is lowered, and I'm not sure XLA can provide a similar assertion at runtime. In this case, the label is converted to a one-hot vector, and a label with an invalid index becomes a vector of all 0s.

// Converts "indices" into a one-hot representation. "depth" is the size of the
// new axis to add. "axis" is the position at which to add the new axis.
// "on_value" and "off_value" represent the values to use for the on and off
// positions, respectively. If "ignore_index" is a valid class, it'll be
// considered off.
xla::XlaOp LabelsToOneHot(xla::XlaBuilder* builder, int64_t depth, int axis,
                          xla::XlaOp indices, xla::XlaOp on_value,
                          xla::XlaOp off_value, int ignore_index) {
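The effect of that conversion can be illustrated in plain Python: a negative index matches no class, so its one-hot row is all off_value and its loss contribution is silently zero rather than an error. A sketch of the semantics (not the actual lowering):

```python
import math

def one_hot(index, depth, on_value=1.0, off_value=0.0):
    # Mirrors LabelsToOneHot's semantics: on_value where the class
    # equals the index; a negative index matches no class at all.
    return [on_value if c == index else off_value for c in range(depth)]

def nll_from_one_hot(log_probs, index):
    # NLL as a dot product with the one-hot row; an all-zero row
    # (invalid index) contributes zero instead of raising IndexError.
    return -sum(h * lp for h, lp in zip(one_hot(index, len(log_probs)), log_probs))

log_probs = [-math.log(5.0)] * 5  # uniform predictions over 5 classes
print(nll_from_one_hot(log_probs, 3))   # log(5) ≈ 1.6094 for the valid label
print(nll_from_one_hot(log_probs, -1))  # zero for the invalid label, no error
```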

@ysiraichi ysiraichi added pytorch divergence XLA behavior doesn't match Pytorch eager frontend tracing Lazy Tensor tracing labels May 7, 2025
@ysiraichi
Collaborator
ysiraichi commented May 7, 2025

I was able to reproduce this on: c4b45a9 (Apr 28, 2025)

4 participants