Separate planning from optimization #1731

johnynek · 2017-10-05T03:22:27Z

This PR totally separates the planning portion from optimization.

When we are converting to a cascading Pipe, we just literally translate. Before we do that we apply a number of phases of optimizations.

We want to have more optimizations in the future, but right now the priority is getting tests to pass. Some of our tests were actually relying on existing optimizations, and certainly performance will regress if we don't do the optimizations we used to do.

In a future PR, I want to make the optimizations more configurable. That will also make testing easier: if we can run a TypedPipe without doing any optimizations, we can make sure the optimizations never change the logic of the job.

cc @ianoc @non

ianoc · 2017-10-05T15:49:19Z

scalding-core/src/main/scala/com/twitter/scalding/typed/OptimizationRules.scala

+   * a.map(f).flatMap(g) == a.flatMap { x => g(f(x)) }
+   * a.flatMap(f).map(g) == a.flatMap { x => f(x).map(g) }
+   *
+   * This is a rule you may want to apply after having


Does Dagon's rule optimization support this notion? I.e., do you run each rule to stability, or do you run all the rules in a loop?

Rules don’t, but you can apply a sequence of rules to do this (although I forgot to use that API in this PR; using it will clean things up a bit).
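
To make the "run to stability" question concrete, here is a minimal sketch of applying a list of rewrite rules until nothing changes. It does not use Dagon's API: `Expr`, `Rule`, `pass`, `toFixedPoint` and `applyAll` are hypothetical names over a toy expression tree, standing in for a TypedPipe graph.

```scala
object RuleLoopSketch {
  // Toy expression tree standing in for a pipe graph (an assumption, not Dagon's types).
  sealed trait Expr
  final case class Lit(n: Int) extends Expr
  final case class Add(a: Expr, b: Expr) extends Expr
  final case class Mul(a: Expr, b: Expr) extends Expr

  // A rule rewrites the nodes it recognizes and is undefined elsewhere.
  type Rule = PartialFunction[Expr, Expr]

  val addZero: Rule = { case Add(e, Lit(0)) => e }
  val mulOne: Rule = { case Mul(e, Lit(1)) => e }

  // One bottom-up pass: rewrite the children first, then try the rule on the node.
  def pass(rule: Rule)(e: Expr): Expr = {
    val rewritten: Expr = e match {
      case Add(a, b) => Add(pass(rule)(a), pass(rule)(b))
      case Mul(a, b) => Mul(pass(rule)(a), pass(rule)(b))
      case lit => lit
    }
    rule.applyOrElse(rewritten, (x: Expr) => x)
  }

  // Repeat passes until a pass leaves the expression unchanged.
  @annotation.tailrec
  def toFixedPoint(rule: Rule)(e: Expr): Expr = {
    val next = pass(rule)(e)
    if (next == e) e else toFixedPoint(rule)(next)
  }

  // Combine the rules (earlier rules win) and run the combination to stability.
  def applyAll(rules: List[Rule])(e: Expr): Expr =
    toFixedPoint(rules.foldLeft(PartialFunction.empty[Expr, Expr])(_ orElse _))(e)
}
```

Running each rule to stability on its own would instead call `toFixedPoint` once per rule; the combined loop above corresponds to the "all the rules in a loop" option.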

ianoc · 2017-10-05T15:50:46Z

scalding-core/src/main/scala/com/twitter/scalding/typed/OptimizationRules.scala

+  object ComposeMapFlatMap extends PartialRule[TypedPipe] {
+    def applyWhere[T](on: Dag[TypedPipe]) = {
+      case FlatMapped(Mapped(in, f), g) =>
+        FlatMapped(in, FlatMappedFn(g).runAfter(FlatMapping.Map(f)))


Out of curiosity, how type-safe is writing these rules? If you swapped f and g here, would this still compile?

It’s type-safe. It would not compile if you swapped the functions since the result type wouldn’t be a TypedPipe[T]

That’s the exciting part of Dagon!
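
A small sketch of why the swap is rejected, using a toy `Pipe` type rather than scalding's `TypedPipe`. The names `Pipe`, `Source`, `fuse` and the `run` interpreter are assumptions made for illustration; the real rule pattern matches on the graph instead of taking explicit arguments.

```scala
object TypedRuleSketch {
  // Toy typed pipe with a List-based interpreter (hypothetical, not scalding's TypedPipe).
  sealed trait Pipe[T] { def run: List[T] }
  final case class Source[T](items: List[T]) extends Pipe[T] {
    def run: List[T] = items
  }
  final case class Mapped[A, B](in: Pipe[A], f: A => B) extends Pipe[B] {
    def run: List[B] = in.run.map(f)
  }
  final case class FlatMapped[A, B](in: Pipe[A], g: A => List[B]) extends Pipe[B] {
    def run: List[B] = in.run.flatMap(g)
  }

  // a.map(f).flatMap(g) == a.flatMap { x => g(f(x)) }
  // The types force the order: f consumes A, g consumes B.
  // Writing inner.f(g(a)) instead does not compile, because g expects a B and a is an A.
  def fuse[A, B, C](inner: Mapped[A, B], g: B => List[C]): Pipe[C] =
    FlatMapped(inner.in, (a: A) => g(inner.f(a)))
}
```

The result type `Pipe[C]` is what pins the composition down: only `g` after `f` produces values of type `C` from an input of type `A`.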

ianoc · 2017-10-05T15:53:51Z

scalding-core/src/main/scala/com/twitter/scalding/typed/OptimizationRules.scala

+  }
+
+  /**
+   * In scalding 0.17 and earlier, descriptions were automatically pushdown below


For this one, I think we need to do this to enable function coalescing in Dagon/scalding, right?

Yeah. That’s why I added it. Actually, traps seem to require function coalescing due to the strange way they work. Traps are not totally safe currently it seems and can still throw runtime exceptions if the user confuses cascading.

I think we can make them safe with optimization rules, but they are still confusing since the trap only reaches down to the next write barrier which is not always obvious in the graph. I’m not really a fan of traps by the way.

Ha, I remember you not being a huge fan of them when they were added. I believe we could implement them in scalding land instead at some point -- and actually just make them work on spark too in roughly the same way? Some sort of withSideEffectingTry(), which is a mix of a flatMap and a try + catch that dumps the bad records into a file. But I looked through the comments related to traps in this PR, and it doesn't seem like we have particularly great semantics now, and it's likely buggy. So that's all just sad. But making traps great again is out of scope for this change.
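
For illustration only, the "flatMap plus try/catch" idea could look roughly like the sketch below. No such method exists in scalding as far as this thread shows; `flatMapWithTrap` is a hypothetical name, it works on an in-memory `List` rather than a pipe, and it returns the bad records instead of writing them to a file.

```scala
import scala.util.{Failure, Success, Try}

object TrapSketch {
  // Apply fn to every record; records whose fn call throws a non-fatal
  // exception are collected separately instead of failing the whole run.
  def flatMapWithTrap[A, B](in: List[A])(fn: A => Iterable[B]): (List[B], List[A]) = {
    val good = List.newBuilder[B]
    val bad = List.newBuilder[A]
    in.foreach { a =>
      // toList forces evaluation inside the Try, so lazy results cannot escape it.
      Try(fn(a).toList) match {
        case Success(bs) => good ++= bs
        case Failure(_) => bad += a
      }
    }
    (good.result(), bad.result())
  }
}
```

Unlike a cascading trap, the scope here is explicit: only exceptions thrown by `fn` are caught.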

ianoc · 2017-10-05T16:13:54Z

...ding-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/CascadingBackend.scala

+      // TODO, this may be identity if the setter is the inverse of the
+      // converter. If we can identify this we will save allocations
+      val resFd = new RichFlowDef(fd)
+      resFd.mergeFrom(localFlowDef)


mergeFrom doesn't mutate localFlowDef, I imagine?

that's right. We have been using that for a while. It mutates the left, but not the right.

ianoc · 2017-10-05T17:32:16Z

...ding-core/src/main/scala/com/twitter/scalding/typed/cascading_backend/CascadingBackend.scala

+            rec(FlatMapped(input, FlatMappedFn.fromMap(fn)))
+
+          case (m@MergedTypedPipe(_, _), rec) =>
+            OptimizationRules.unrollMerge(m) match {


This seems odd here. Maybe add a comment explaining why it's not just in the optimizer flow and why we need it here too?

The reason is that Merge has arity 2, but cascading merge can take a list. I don't think cascading is happy if you pass it a merge with no items, and I didn't want to do one with a single item, but that is maybe a judgment call.

So, we really have to unroll or we will make a giant linked list of Merge nodes, so, this is a case where some transformation is needed.

Or did you just mean the special casing for 0 and 1 item? Again, I think that is so cascading won't blow up.

The other optimizations should definitely be in a rule (x ++ x == x.flatMap { y => List(y, y) } for instance).

So it might be leaky, but I'd have thought that in an ideal world we would have the invariant for this operator that if we see a MergedTypedPipe post-optimization, it is always just 2 pipes. But that does mean we would need to introduce a new node type for a 'many pipe' merge, which is maybe not worth it then?
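
A minimal sketch of the unrolling discussed in this thread: flattening a tree of binary merge nodes into one list, so the backend can issue a single n-ary merge. This is not the PR's `OptimizationRules.unrollMerge`; `Pipe`, `Source` and `Merged` are toy stand-ins, and the sketch only flattens (it does no special handling of empty pipes or duplicates).

```scala
object UnrollMergeSketch {
  // Toy graph: a binary merge node, as in the thread, plus named sources.
  sealed trait Pipe
  final case class Source(name: String) extends Pipe
  final case class Merged(left: Pipe, right: Pipe) extends Pipe

  // Flatten nested binary merges into a left-to-right list of non-merge pipes.
  // An explicit work list keeps this stack-safe on long chains of merges.
  def unrollMerge(p: Pipe): List[Pipe] = {
    @annotation.tailrec
    def loop(todo: List[Pipe], acc: List[Pipe]): List[Pipe] = todo match {
      case Nil => acc.reverse
      case Merged(l, r) :: rest => loop(l :: r :: rest, acc)
      case other :: rest => loop(rest, other :: acc)
    }
    loop(p :: Nil, Nil)
  }
}
```

The caller can then branch on the length of the result, which is where the special cases for 0 and 1 items mentioned above would live.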

ianoc · 2017-10-05T17:40:06Z

Obviously a huge change, so it's kind of hard to be sure it's all right. But scalding has pretty extensive tests, so between that and the read I did, I think this is a 👍

Good to get more convergence into the one flow. The separation continues to look better and better. Nice work!

johnynek · 2017-10-07T17:35:40Z

@ianoc I had to make a couple of minor changes to make the tests pass.

A line number check was testing not only for line numbers, but also that they were applied to the very last pipe, which is a very strong contract to enforce.
I went ahead and moved the force-before-hashjoin rule into the optimizer, since without it the tests didn't pass (the old rules were a bit brittle and tied to how we planned to cascading; they also get clearer, if you ask me).
There was a bug in filter composition: rather than f(x) && g(x) I was doing f(x) && f(x), which typechecks but is of course wrong.
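
The filter composition bug is easy to reproduce in isolation. The sketch below is not the PR's code: `composeBuggy`, `composeFixed` and `agrees` are hypothetical names, and plain `List` filters stand in for pipes. It shows the kind of equivalence check that separates the two versions, since the compiler cannot.

```scala
object FilterCompositionSketch {
  type Pred = Int => Boolean

  // Both versions typecheck; only the second is a.filter(f).filter(g).
  def composeBuggy(f: Pred, g: Pred): Pred = x => f(x) && f(x)
  def composeFixed(f: Pred, g: Pred): Pred = x => f(x) && g(x)

  // The optimized form must select exactly what the two sequential filters select.
  def agrees(compose: (Pred, Pred) => Pred, f: Pred, g: Pred, inputs: List[Int]): Boolean =
    inputs.filter(f).filter(g) == inputs.filter(compose(f, g))
}
```

The check only exposes the bug when `f` and `g` disagree on at least one input that passes `f`, so the test predicates need to be genuinely different.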

Can I merge? I am eager to keep working on the board:
https://github.com/twitter/scalding/projects/1

johnynek · 2017-10-09T02:42:47Z

I'm going to take the 👍 given here: #1731 (comment) as a shipit.

I have another PR in the pipe that I want to stage. Any concerns I'll fix in a subsequent PR.

johnynek added 16 commits September 22, 2017 23:32

Implement Dagon.toLiteral

52eae5e

reduce stack depth

b86a42a

merge with master, sort TypedPipe cases

8cc51ea

Add generic TypedPipe optimization rules

6e8029f

fix compilation error, add a few more rules

13b0b79

fix serialization issue with 2.12

4e12145

merge with master

c8bd451

Add tests of correctness to optimization rules

f0488b0

add comments, improve some rules

70f5bca

checkpoint

f1851bb

fix bug with outerjoin

4ae54cf

Merge branch 'oscar/use-dagon-2' into oscar/use-dagon-3

af93e5b

Cut over the the compiler approach

cd6894d

add a comment

f13436d

merge with develop

447580e

Use optimization rules to get the tests to pass

bc2ce20

johnynek assigned ianoc Oct 5, 2017

ianoc reviewed Oct 5, 2017

johnynek added 3 commits October 5, 2017 20:12

fixes to make the tests pass

11ff338

update comment about dagon post 0.2.2

a39df55

fix a bug in the filter composition rule, go tests\!

c25d8e5

johnynek mentioned this pull request Oct 6, 2017

remove use of anonymous function literals in TypedPipe/Grouped #1734

Open

johnynek merged commit bdd5dcc into develop Oct 9, 2017
