-
-
Notifications
You must be signed in to change notification settings - Fork 69
New, extensible Differential imp 8000 lementation (autodiff #4) #221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## sritchie/compare_fixes #221 +/- ##
=========================================================
Coverage ? 83.96%
=========================================================
Files ? 80
Lines ? 8470
Branches ? 456
=========================================================
Hits ? 7112
Misses ? 902
Partials ? 456 Continue to review full report at Codecov.
|
@littleredcomputer I've exported this PR as a literate document, if you prefer reading it that way: https://samritchie.io/dual-numbers-and-automatic-differentiation/ |
Or, what is described as "Alexey's Amazing Bug" in the Scmutils source | ||
code (and further described by Oleksandr Manzyuk [here][OM]). This should | ||
be straightforward to fix but hasn't been necessary for the work so far. | ||
Most of the code in the early sections of FDG will work, but this cannot be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're getting there!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\o/
;; The [[IPerturbed]] protocol accomplishes this, along with two other functions | ||
;; that we'll use later: | ||
|
||
(defprotocol IPerturbed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@littleredcomputer: this is going to make it easy to extend D
to be able to handle fns that returns java lists, js objects and arrays, quaternions or any other container someone wants to handle.
;; If you want to compare two instances using their full term lists, | ||
;; See [[eq]]. | ||
#?(:clj (equals [a b] (equiv a b))) | ||
#?(:cljs (valueOf [this] (.valueOf (finite-term this)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a big discovery for me, and triggered #236. This is all we need in CLJS for differentials to compare properly with non-differentials, using <
, >
and friends.
…ing (#241) * tidy up tex rendering * changelog
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is monumental
Or, what is described as "Alexey's Amazing Bug" in the Scmutils source | ||
code (and further described by Oleksandr Manzyuk [here][OM]). This should | ||
be straightforward to fix but hasn't been necessary for the work so far. | ||
Most of the code in the early sections of FDG will work, but this cannot be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\o/
;; by returning `(mapv extract-tangent v)` | ||
;; | ||
;; The way that [[Differential]] instances interact with container types in | ||
;; SICMUtils makes it easy for these captures to occur all over. Whenever we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only use of the word "capture" in this (lovely) essay so I don't think it's obvious why this is good. What is wrong with having V + εW where V and W are vectors we may wonder? Could it be argued that that would make extract trivial and we wouldn't need IPerturbed? Earlier you do define dual numbers as having real components though and so that could be reason enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm glad you called this out. two thoughts this triggers:
-
There's nothing wrong with what you're suggesting in principle, especially if we added a clear directional derivative operator (you'd kick the whole process off with
V + εW
). BUT you still can't stop someone from taking a differential like5 + ε1
and putting it inside a collection! So you still need to handle both. (Concretely I'm going to remove the paragraph you're flagging, since the example above is enough to motivate the implementation! If someone makesV + εW
, theDifferential
IPerturbed
implementation will catch it. Otherwise we need a separate impl for vectors, etc.) -
Anything that we allow into the coefficient slots of
Differential
needs all of its protocols implemented too, so that a function receiving aV + εW
can do vector-ish things and not know it's received aDifferential
.
The second point is why I think d/bundle-element
has to become generic. We do want users to be able to attempt to form V + εW
, and we want to catch that and aggressively push Differential
into the elements.
I DID implement IFn
for Differential
, because I thought it would make it easier to implement the Manzyuk et al. 2019 suggestion for how to allow the domain of (D f)
to include functions (ie g
is a function in ((D f) g)
), but lately I think that this is bad style too and we should rip out the whole arity, IFn etc mess and always use a wrapping function.
I can explain all that next time we talk!
;; that we'll use later: | ||
|
||
(defprotocol IPerturbed | ||
(perturbed? [this] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sure I will find out when I look ahead, but I note now just because I'm thinking about it that in principle one could omit this method and instead have replace-tag do nothing and extract-tangent return 0 in the case that nothing further is known
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I wanted to ask you about this!
What you suggest is indeed the default implementation of extract-tangent
and replace-tag
in IPerturbed
. The only place perturbed?
is ever used is in sicmutils.abstract.function/literal-apply
, when we're detecting whether or not we're to take a derivative.
Before my changes here you performed this check with sicmutils.calculus.derivative/differential?
GJS checks for differential?
in literal-apply
in litfun.scm
, but his differential?
only checks for specific instances of the Differential
type (implementation here).
I haven't stared at the literal function implementation yet; but it's worth documenting this more clearly! But that was the background for this method.
;; `(g/* x y)` will return `y` if `(v/one? x)` is true... but to propagate the | ||
;; derivative through we need this multiplication to occur. The compromise is: | ||
;; | ||
;; - [[v/one?]] and [[v/zero?]] return true only when ALL [[tangent-part]]s are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This almost makes me prefer the names unity and nullity again (almost), since as you point out they're used to avoid operations like * (now I wonder if it's worth it). Nobody said computer algebra was easy
(defbinary g/add d:+) | ||
|
||
(defunary g/negate | ||
(lift-1 g/negate (fn [_] -1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You like constantly
so I'm surprised you didn't use it here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happening to me!!
This new implementation of
Differential
is quite similar to the old, with some performance improvements born of lots of staring.I was also able to get the bug fix in that I mentioned to you, @littleredcomputer - I describe this at length in the tests for #223 , but this was a newer case of perturbation confusion that's handled in the
extract-tangent
implementation here.The protocol that this PR adds sets us up for the "Amazing Bug" fix in #223.
Additional notes from the CHANGELOG:
New, literate
Differential
implementation lives at atsicmutils.differential
(New, extensible Differential implementation (autodiff #4) #221) (see thispage for a
readable version.) Notable changes to the original impl at
sicmutils.calculus.derivative
include:We've changed our terminology dlefrom GJS's
finite-part
,infinitesimal-part
,make-x+dx
to the more modernprimal-part
,tangent-part
,bundle-element
that the Automatic Differentiation community hasnow adopted.
A new
sicmutils.differential.IPerturbed
protocol makes it possible toextend the Automatic Differentiation (AD) system to be able to handle
different Functor-shaped return values, like Java or JS lists and objects.
See the cljdoc page on Automatic
Differentiation
for more detail.
sicmutils.differential/{lift-1,lift-2,lift-n}
allow you to make customoperations differentiable, provided you can supply a derivative.
Differential
implementssicmutils.function/arity
,IFn
, and can beapplied to arguments if its coefficients are function values.
New
compare
andequiv
implementations allowDifferential
instances tocompare themselves with other objects using only their primal parts; this
makes it possible to use functions like
<=
,>
,=
to do control flowduring automatic differentiation. (Use
compare-full
andeq
if you wantto do full equality comparisons on primal and tangent components.)
related,
g/abs
is now implemented forDifferential
instances, makingthis function available in functions passed to
D
.proper
numerical?
,one?
andidentity?
implementations. The latter twoonly respond
true
if there are NO tangent components; This means thatone?
and(= % 1)
will not agree.The new implementation fixes a subtle bug with nested, higher order
automatic differentiation - it's too subtle for the CHANGELOG, so please the
"amazing" bug sections in
sicmutils.calculus.derivative-test
for properexposition.
sicmutils.generic/partial-derivative
gains aKeyword
extension, so it canrespond properly to
:name
and:arity
calls (New, extensible Differential implementation (autodiff #4) #221).Notes
In theory,
lift-1
,lift-2
andlift-n
are general enough that you could use the whole package to augment ANY arithmetic, not just the sicmutils arithmetic, with forward mode AD. In reality, I realize that we've baked in reliance ong/+
andg/*
to the implementations ofDifferential
addition and multiplication. If we let the user bind those, or pass those in tolift-1
and friends and threaded them through, you could augment just the core functions, for example, without relying on the whole generic tower we've built here.Some of the resources I used, studying up for this implementation:
lift-1
and friends recurse on differential inputs, so that the lifted function didn't have to ALREADY be lifted (ie be the same generic you're extending!)IPerturbed
protocol makes it possible to fix the "Amazing Bug" reported in Manzyuk et al. 2019 that GJS fixed in the 2019 release of scmutils. Here's a talk on that paper if you want a clearer walkthrough... or look at the detailed tests in Switch derivative over to new Differential impl, solve "amazing bug" (autodiff #6) #223.The bug in the paper arises from the question, addressed in my exposition a bit - How do you extend
derivative
so that its range can include functions? IE, can I take the derivative of a higher-order function? This is a more complicated problem than allowing functors likeVector
orDict
into the range, since the function expects an input too.What if you wanted to take derivatives of fns that returned core.async channels?