Replace custom `Softmax*` `Op`s with Aesara graphs · Issue #682 · aesara-devs/aesara · GitHub
Open
@brandonwillard

Description


Both the softmax and log_softmax graphs are easy to identify and replace with the numerically stable versions that shift the input by its maximum.
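For reference, here is a minimal NumPy sketch of the naive graphs next to the max-shifted forms such a rewrite would produce. It is not Aesara code and not how the existing `Softmax*` Ops are implemented; the function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def naive_softmax(x, axis=-1):
    # Direct exp(x) / sum(exp(x)); exp overflows to inf for inputs above ~709.
    # Silence NumPy's overflow/invalid warnings so the nan result is visible.
    with np.errstate(over="ignore", invalid="ignore"):
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)


def stable_softmax(x, axis=-1):
    # Shift by the max along `axis`, so the largest exponent is exp(0) == 1.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def stable_log_softmax(x, axis=-1):
    # log_softmax(x) = (x - m) - log(sum(exp(x - m))), with m = max(x).
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


x = np.array([[1000.0, 1001.0, 1002.0]])
print(naive_softmax(x))       # inf / inf, so every entry is nan
print(stable_softmax(x))      # finite probabilities summing to 1
print(stable_log_softmax(x))  # finite log-probabilities
```

The shift changes nothing mathematically, because softmax is invariant to adding a constant along the reduction axis, but it keeps every exponent at or below 1.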

The issues I found concern the gradients of both Ops (as well as the gradient of SoftmaxGrad), which introduce new softmax terms that would also need to be shifted by the max to be stable. These are difficult to match because the resulting graph patterns differ depending on which gradients are actually requested.
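To illustrate why the gradients bring in new softmax terms, the sketch below writes out the standard vector-Jacobian products in NumPy, assuming an upstream gradient `g`: `s * (g - sum(g * s))` for softmax and `g - softmax(x) * sum(g)` for log_softmax. It is an illustration under those assumptions, not the graph that Aesara's gradient machinery or `SoftmaxGrad` actually builds, and the helper names are hypothetical. Both products depend on `softmax(x)` again, so that term needs the max shift as well; a central finite-difference check confirms the formulas.

```python
import numpy as np


def stable_softmax(x, axis=-1):
    # Max-shifted softmax, so the largest exponent is exp(0) == 1.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def stable_log_softmax(x, axis=-1):
    # Max-shifted log-softmax: (x - m) - log(sum(exp(x - m))).
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def softmax_vjp(x, g, axis=-1):
    # d/dx sum(g * softmax(x)) = s * (g - sum(g * s)), with s = softmax(x).
    s = stable_softmax(x, axis)
    return s * (g - (g * s).sum(axis=axis, keepdims=True))


def log_softmax_vjp(x, g, axis=-1):
    # d/dx sum(g * log_softmax(x)) = g - softmax(x) * sum(g).
    # A new softmax term appears here; it is computed with the max shift too.
    s = stable_softmax(x, axis)
    return g - s * g.sum(axis=axis, keepdims=True)


def numeric_grad(f, x, eps=1e-6):
    # Central finite differences of the scalar-valued function f at x.
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
g = rng.normal(size=(2, 4))

# Max absolute error of the closed-form gradients against finite differences.
print(np.abs(softmax_vjp(x, g) - numeric_grad(lambda v: np.sum(g * stable_softmax(v)), x)).max())
print(np.abs(log_softmax_vjp(x, g) - numeric_grad(lambda v: np.sum(g * stable_log_softmax(v)), x)).max())

# With large inputs the shifted gradient stays finite.
print(np.isfinite(softmax_vjp(x + 1000.0, g)).all())
```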

You can see that the existing rewrites mostly concern the gradients. The old Theano issue I linked (Theano/Theano#4452) was about the lack of a rewrite that matches the softmax gradient when the specialized Op was not used from the beginning.

I also checked what would happen if softmax and log_softmax returned the numerically stable graph immediately, but the gradients Aesara generated for them were still unstable.

Originally posted by @ricardoV94 in #673 (comment)

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
