8000 New, extensible Differential implementation (autodiff #4) by sritchie · Pull Request #221 · sicmutils/sicmutils · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

New, extensible Differential imp 8000 lementation (autodiff #4) #221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 35 commits into from
Jan 31, 2021

Conversation

sritchie
Copy link
Member
@sritchie sritchie commented Dec 28, 2020

This new implementation of Differential is quite similar to the old, with some performance improvements born of lots of staring.

I was also able to get the bug fix in that I mentioned to you, @littleredcomputer - I describe this at length in the tests for #223 , but this was a newer case of perturbation confusion that's handled in the extract-tangent implementation here.

The protocol that this PR adds sets us up for the "Amazing Bug" fix in #223.

Additional notes from the CHANGELOG:

  • New, literate Differential implementation lives at at
    sicmutils.differential (New, extensible Differential implementation (autodiff #4) #221) (see this
    page
    for a
    readable version.) Notable changes to the original impl at
    sicmutils.calculus.derivative include:

    • We've changed our terminology dlefrom GJS's finite-part,
      infinitesimal-part, make-x+dx to the more modern primal-part,
      tangent-part, bundle-element that the Automatic Differentiation community has
      now adopted.

    • A new sicmutils.differential.IPerturbed protocol makes it possible to
      extend the Automatic Differentiation (AD) system to be able to handle
      different Functor-shaped return values, like Java or JS lists and objects.
      See the cljdoc page on Automatic
      Differentiation

      for more detail.

    • sicmutils.differential/{lift-1,lift-2,lift-n} allow you to make custom
      operations differentiable, provided you can supply a derivative.

    • Differential implements sicmutils.function/arity, IFn, and can be
      applied to arguments if its coefficients are function values.

    • New compare and equiv implementations allow Differential instances to
      compare themselves with other objects using only their primal parts; this
      makes it possible to use functions like <=, >, = to do control flow
      during automatic differentiation. (Use compare-full and eq if you want
      to do full equality comparisons on primal and tangent components.)

    • related, g/abs is now implemented for Differential instances, making
      this function available in functions passed to D.

    • proper numerical?, one? and identity? implementations. The latter two
      only respond true if there are NO tangent components; This means that
      one? and (= % 1) will not agree.

    • The new implementation fixes a subtle bug with nested, higher order
      automatic differentiation - it's too subtle for the CHANGELOG, so please the
      "amazing" bug sections in sicmutils.calculus.derivative-test for proper
      exposition.

  • sicmutils.generic/partial-derivative gains a Keyword extension, so it can
    respond properly to :name and :arity calls (New, extensible Differential implementation (autodiff #4) #221).

Notes

In theory, lift-1, lift-2 and lift-n are general enough that you could use the whole package to augment ANY arithmetic, not just the sicmutils arithmetic, with forward mode AD. In reality, I realize that we've baked in reliance on g/+ and g/* to the implementations of Differential addition and multiplication. If we let the user bind those, or pass those in to lift-1 and friends and threaded them through, you could augment just the core functions, for example, without relying on the whole generic tower we've built here.

Some of the resources I used, studying up for this implementation:

The bug in the paper arises from the question, addressed in my exposition a bit - How do you extend derivative so that its range can include functions? IE, can I take the derivative of a higher-order function? This is a more complicated problem than allowing functors like Vector or Dict into the range, since the function expects an input too.

What if you wanted to take derivatives of fns that returned core.async channels?

@sritchie sritchie changed the base branch from master to sritchie/diff_three December 28, 2020 18:52
@codecov-io
Copy link
codecov-io commented Dec 28, 2020

Codecov Report

❗ No coverage uploaded for pull request base (sritchie/compare_fixes@8d2e72d). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@                    Coverage Diff                    @@
##             sritchie/compare_fixes     #221   +/-   ##
=========================================================
  Coverage                          ?   83.96%           
=========================================================
  Files                             ?       80           
  Lines                             ?     8470           
  Branches                          ?      456           
=========================================================
  Hits                              ?     7112           
  Misses                            ?      902           
  Partials                          ?      456           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d2e72d...53ba17a. Read the comment docs.

@sritchie sritchie added the Automatic Differentiation Bugs and issues related to Automatic Differentiation label Jan 4, 2021
@sritchie sritchie changed the title New, extensible Differential implementation, properly handles "amazing bug" (autodiff #4) New, extensible Differential implementation (autodiff #4) Jan 8, 2021
@sritchie
Copy link
Member Author

@littleredcomputer I've exported this PR as a literate document, if you prefer reading it that way: https://samritchie.io/dual-numbers-and-automatic-differentiation/

@sritchie sritchie changed the base branch from sritchie/diff_three to sritchie/compare_fixes January 13, 2021 16:29
Or, what is described as "Alexey's Amazing Bug" in the Scmutils source
code (and further described by Oleksandr Manzyuk [here][OM]). This should
be straightforward to fix but hasn't been necessary for the work so far.
Most of the code in the early sections of FDG will work, but this cannot be
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're getting there!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\o/

;; The [[IPerturbed]] protocol accomplishes this, along with two other functions
;; that we'll use later:

(defprotocol IPerturbed
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@littleredcomputer: this is going to make it easy to extend D to be able to handle fns that returns java lists, js objects and arrays, quaternions or any other container someone wants to handle.

;; If you want to compare two instances using their full term lists,
;; See [[eq]].
#?(:clj (equals [a b] (equiv a b)))
#?(:cljs (valueOf [this] (.valueOf (finite-term this))))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a big discovery for me, and triggered #236. This is all we need in CLJS for differentials to compare properly with non-differentials, using <, > and friends.

@sritchie sritchie changed the base branch from sritchie/compare_fixes to master January 31, 2021 14:07
@sritchie sritchie merged commit efb342d into master Jan 31, 2021
Copy link
Collaborator
@littleredcomputer littleredcomputer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is monumental

Or, what is described as "Alexey's Amazing Bug" in the Scmutils source
code (and further described by Oleksandr Manzyuk [here][OM]). This should
be straightforward to fix but hasn't been necessary for the work so far.
Most of the code in the early sections of FDG will work, but this cannot be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\o/

;; by returning `(mapv extract-tangent v)`
;;
;; The way that [[Differential]] instances interact with container types in
;; SICMUtils makes it easy for these captures to occur all over. Whenever we
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only use of the word "capture" in this (lovely) essay so I don't think it's obvious why this is good. What is wrong with having V + εW where V and W are vectors we may wonder? Could it be argued that that would make extract trivial and we wouldn't need IPerturbed? Earlier you do define dual numbers as having real components though and so that could be reason enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm glad you called this out. two thoughts this triggers:

  1. There's nothing wrong with what you're suggesting in principle, especially if we added a clear directional derivative operator (you'd kick the whole process off with V + εW). BUT you still can't stop someone from taking a differential like 5 + ε1 and putting it inside a collection! So you still need to handle both. (Concretely I'm going to remove the paragraph you're flagging, since the example above is enough to motivate the implementation! If someone makes V + εW, the Differential IPerturbed implementation will catch it. Otherwise we need a separate impl for vectors, etc.)

  2. Anything that we allow into the coefficient slots of Differential needs all of its protocols implemented too, so that a function receiving a V + εW can do vector-ish things and not know it's received a Differential.

The second point is why I think d/bundle-element has to become generic. We do want users to be able to attempt to form V + εW, and we want to catch that and aggressively push Differential into the elements.

I DID implement IFn for Differential, because I thought it would make it easier to implement the Manzyuk et al. 2019 suggestion for how to allow the domain of (D f) to include functions (ie g is a function in ((D f) g)), but lately I think that this is bad style too and we should rip out the whole arity, IFn etc mess and always use a wrapping function.

I can explain all that next time we talk!

;; that we'll use later:

(defprotocol IPerturbed
(perturbed? [this]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure I will find out when I look ahead, but I note now just because I'm thinking about it that in principle one could omit this method and instead have replace-tag do nothing and extract-tangent return 0 in the case that nothing further is known

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wanted to ask you about this!

What you suggest is indeed the default implementation of extract-tangent and replace-tag in IPerturbed. The only place perturbed? is ever used is in sicmutils.abstract.function/literal-apply, when we're detecting whether or not we're to take a derivative.

Before my changes here you performed this check with sicmutils.calculus.derivative/differential?

GJS checks for differential? in literal-apply in litfun.scm, but his differential?only checks for specific instances of the Differential type (implementation here).

I haven't stared at the literal function implementation yet; but it's worth documenting this more clearly! But that was the background for this method.

;; `(g/* x y)` will return `y` if `(v/one? x)` is true... but to propagate the
;; derivative through we need this multiplication to occur. The compromise is:
;;
;; - [[v/one?]] and [[v/zero?]] return true only when ALL [[tangent-part]]s are
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This almost makes me prefer the names unity and nullity again (almost), since as you point out they're used to avoid operations like * (now I wonder if it's worth it). Nobody said computer algebra was easy

(defbinary g/add d:+)

(defunary g/negate
(lift-1 g/negate (fn [_] -1)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You like constantly so I'm surprised you didn't use it here :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happening to me!!

@sritchie sritchie deleted the sritchie/diff_four branch February 3, 2021 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Automatic Differentiation Bugs and issues related to Automatic Differentiation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0