CrossEntropyLoss fail to detect the negative index · Issue #4552 · pytorch/xla · GitHub

CrossEntropyLoss fail to detect the negative index #4552

Open
YangFei1990 opened this issue Feb 2, 2023 · 8 comments
Assignees
Labels
pytorch divergence XLA behavior doesn't match Pytorch eager frontend tracing Lazy Tensor tracing

Comments

@YangFei1990
Contributor

🐛 Bug

torch.nn.CrossEntropyLoss should only accept targets whose values lie in [0, num_class). However, on an XLA device it accepts negative values and still produces a result.

To Reproduce

See the script below:

import torch
from torch_xla.core import xla_model as xm
loss = torch.nn.CrossEntropyLoss()
labels = torch.LongTensor([-1, -1, 3])
preds = torch.ones(3, 5)
# print(loss(preds, labels)) # IndexError: Target -1 is out of bounds.

xla_device = xm.xla_device()
labels = labels.to(xla_device)
preds = preds.to(xla_device)
output = loss(preds, labels)
xm.mark_step()
print(output)  # expected index error, but printed: tensor(1.6094, device='xla:1')

On CPU it throws IndexError: Target -1 is out of bounds.; on an XLA device, however, it silently outputs a tensor.

Expected behavior

Same error on XLA device
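Until the divergence is fixed, the eager-style range check can be enforced on the caller side before the loss is traced. A minimal sketch in plain Python (the helper name `validate_targets` is hypothetical, not part of torch or torch_xla):

```python
def validate_targets(targets, num_classes):
    """Raise the same IndexError that eager PyTorch raises for an
    out-of-range target, before the loss is traced on the XLA device."""
    for t in targets:
        if not 0 <= t < num_classes:
            raise IndexError(f"Target {t} is out of bounds.")

validate_targets([0, 3, 4], num_classes=5)  # OK: all targets in [0, 5)
```

In the repro above, calling `validate_targets(labels.tolist(), 5)` before `loss(preds, labels)` would raise for the -1 entries just as CPU eager mode does.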

Environment

  • Reproducible on XLA backend [CPU/TPU]: GPU
  • torch_xla version: 1.13
@YangFei1990
Contributor Author

I was trying to look into this issue myself. The cross_entropy_loss op defined in torch's native_functions.yaml is not defined in xla_native_functions.yaml or aten_xla_type.cpp, so by default it should fall back to the CPU implementation. However, it behaves differently on CPU and on the XLA device. How should I map this op to its XLA implementation?

@JackCaoG
Collaborator
JackCaoG commented Feb 2, 2023

I think cross_entropy_loss was dispatched to nll_loss or something. You can print a metrics report following https://github.com/pytorch/xla/blob/master/TROUBLESHOOTING.md#get-a-metrics-report after executing a CrossEntropyLoss, then check the xla:: counters, which indicate which pytorch/xla IR it got dispatched to.

@YangFei1990
Contributor Author

Thanks Jack. It seems cross_entropy_loss is dispatched to multiple ops, including _log_softmax and nll_loss_forward:

Counter: xla::_copy_from
  Value: 3
Counter: xla::_log_softmax
  Value: 1
Counter: xla::empty_symint
  Value: 2
Counter: xla::nll_loss_forward
  Value: 1
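For scripting, the counters in a report like the one above (obtained from `torch_xla.debug.metrics.metrics_report()`) can be filtered with a few lines of plain Python. The parser below assumes the `Counter: name` / `Value: N` layout shown here:

```python
def xla_counters(report: str) -> dict:
    """Extract 'Counter: xla::...' entries and their values
    from a torch_xla metrics report string."""
    counters = {}
    lines = report.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("Counter: xla::"):
            name = stripped[len("Counter: "):]
            # The value follows on the next line as "Value: N"
            counters[name] = int(lines[i + 1].split(":")[1])
    return counters

report = """\
Counter: xla::_copy_from
  Value: 3
Counter: xla::_log_softmax
  Value: 1
Counter: xla::empty_symint
  Value: 2
Counter: xla::nll_loss_forward
  Value: 1
"""
print(xla_counters(report))
```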

How can I find this mapping relationship in the code?

@JackCaoG
Collaborator
JackCaoG commented Feb 2, 2023

I think it is _log_softmax then nll_loss_forward; the dispatcher logic lives upstream. You can add prints in https://github.com/pytorch/xla/blob/master/torch_xla/csrc/aten_xla_type.cpp for the corresponding ops to inspect input shapes, calling order, etc., if you can compile pytorch/xla.
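For reference, cross_entropy_loss has no XLA kernel of its own because upstream PyTorch decomposes it into log_softmax followed by nll_loss, which is consistent with the two counters reported above. A plain-Python sketch of that decomposition (illustrating the math, not the actual dispatcher code):

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the log-sum-exp
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def nll(log_probs, target):
    # Negative log-likelihood of the target class
    return -log_probs[target]

def cross_entropy(logits, target):
    # cross_entropy decomposes into log_softmax followed by nll,
    # matching the xla::_log_softmax and xla::nll_loss_forward counters
    return nll(log_softmax(logits), target)

print(cross_entropy([1.0, 1.0, 1.0, 1.0, 1.0], 3))  # log(5) ≈ 1.6094
```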

@YangFei1990
Contributor Author

Could you elaborate on how the dispatcher logic lives upstream? I was reading the op-lowering-guide, and my impression was that ops defined in native_functions.yaml are mapped to xla_native_functions.yaml, and otherwise fall back to CPU.

@JackCaoG
Collaborator
JackCaoG commented Feb 2, 2023

@ymwangg
Contributor
ymwangg commented Feb 2, 2023

@YangFei1990 It's related to how this op is lowered, and I'm not sure XLA can provide a similar assertion at runtime. In this case, the label is converted to a one-hot vector, and a label with an invalid index becomes a vector of all 0s.

// Converts "indices" into a one-hot representation. "depth" is the size of the
// new axis to add. "axis" is the position at which to add the new axis.
// "on_value" and "off_value" represent the values to use for the on and off
// positions, respectively. If "ignore_index" is a valid class, it'll be
// considered off.
xla::XlaOp LabelsToOneHot(xla::XlaBuilder* builder, int64_t depth, int axis,
                          xla::XlaOp indices, xla::XlaOp on_value,
                          xla::XlaOp off_value, int ignore_index) {
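The effect of that conversion can be illustrated in plain Python: a negative index matches no class, so its one-hot row is all off_value and its loss contribution is silently zero rather than an error. A sketch of the semantics (not the actual lowering):

```python
import math

def one_hot(index, depth, on_value=1.0, off_value=0.0):
    # Mirrors LabelsToOneHot's semantics: on_value where the class
    # equals the index; a negative index matches no class at all.
    return [on_value if c == index else off_value for c in range(depth)]

def nll_from_one_hot(log_probs, index):
    # NLL as a dot product with the one-hot row; an all-zero row
    # (invalid index) contributes zero instead of raising IndexError.
    return -sum(h * lp for h, lp in zip(one_hot(index, len(log_probs)), log_probs))

log_probs = [-math.log(5.0)] * 5  # uniform predictions over 5 classes
print(nll_from_one_hot(log_probs, 3))   # log(5) ≈ 1.6094 for the valid label
print(nll_from_one_hot(log_probs, -1))  # zero for the invalid label, no error
```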

@ysiraichi ysiraichi added pytorch divergence XLA behavior doesn't match Pytorch eager frontend tracing Lazy Tensor tracing labels May 7, 2025
@ysiraichi
Collaborator
ysiraichi commented May 7, 2025

I was able to reproduce this on: c4b45a9 (Apr 28, 2025)

4 participants