Consider switching to the Jawn parser on JS · Issue #1941 · circe/circe


Closed
armanbilge opened this issue Apr 8, 2022 · 8 comments

Comments

@armanbilge (Contributor)

tl;dr The circe-jawn parser on JS has better semantics and in many cases out-performs the JSON.parse method currently used by circe-parser.

Benchmarks: https://armanbilge.github.io/jsoniter-scala/index-scalajs.html

Note that the above benchmark results were extracted from https://plokhotnyuk.github.io/jsoniter-scala/index-scalajs.html; all credit goes to @plokhotnyuk.


In #1791, the circe-jawn module was cross-built for JS. However, the circe-parser module continued to use the standard JS API JSON.parse for parsing and this was "unlikely to change" according to typelevel/jawn#351 (comment). Presumably this was for performance-related reasons:

  1. JSON.parse is provided by the runtime, so no parsing code has to be bundled in applications' size-sensitive JS artifacts.
  2. Runtimes are free to implement JSON.parse with highly-optimized native code, so it should be fast.

However, as noted in the circe docs, the use of JSON.parse also has some serious caveats:

## Warnings and known issues
When using the Scala.js version of circe, numerical values like `Long` may [lose
precision][#393] when decoded. For example `decode[Long]("767946224062369796")`
will return `Right(767946224062369792L)`. This is not a limitation of how
Scala.js represents `scala.Long`s nor circe's decoders for numerical values but
due to [`JSON.parse`][json.parse] converting numerical values to JavaScript
numbers. If precision is required consider representing numerical values as
strings and convert them to their final value via the JSON AST.

This (often surprising) difference in semantics can cause problems such as #393 and #1911 (comment) and has "raised eyebrows".

I'm surprised circe considers it better to be fast than correct, but it's their call. Longs themselves are correct in Scala.js. Apparently it's not correct to serialize Longs using the default mechanism of circe if you actually use their full range. Instead, using strings or a pair of Ints may work.

@sjrd in https://discord.com/channels/632150470000902164/635668814956068864/905900410517204992

Note that the circe-jawn.js parser does not have these issues and instead has semantics that match the JVM, since it is parsing from String to the Json AST with exactly the same code.

With much gratitude to @plokhotnyuk who maintains comprehensive browser benchmarks for Scala.js JSON libraries, we now have concrete numbers comparing circe-jawn.js to circe-parser.js. I've trimmed down @plokhotnyuk's webpage to just the relevant benchmarks for this comparison. For easiest viewing I recommend selecting a specific browser to focus on.

https://armanbilge.github.io/jsoniter-scala/index-scalajs.html

It's also possible to run the benchmarks yourself at:
https://plokhotnyuk.github.io/jsoniter-scala/scala-2.13-fullopt.html

Here's my rough summary/analysis from looking through the Chrome results, but please draw your own conclusions :)

  • Jawn.js is overall very competitive with JSON.parse
  • Jawn.js consistently out-performed JSON.parse when parsing API responses (GH, Twitter, Google Maps)
  • In some benchmarks, Jawn.js is up to 4x faster than JSON.parse
  • Jawn.js was up to 5x slower when parsing certain numerics (BigDecimal, Double, Float) ... but this is precisely the situation in which JSON.parse may return incorrect results, so it's not really a fair comparison

Besides raw performance, I had previously investigated how circe-jawn affects the size of the JS artifact. In http4s/http4s-dom#10 (comment) I estimated it contributes roughly 15 KB (fullOptJS+gcc, pre-gzip). I don't think this is a big deal, and definitely not in the Typelevel-stack Scala.js apps I've seen 😆


In summary, I think the circe-parser module should switch to use the Jawn parser on JS (although maybe not until the next breaking release). This gets us:

  1. Semantics that match the JVM
  2. No gotchas/surprises around parsing of numerics
  3. Competitive or improved performance in most situations

Of course, the JSON.parse-based parser should continue to be provided in the circe-scalajs module. Users who are specifically concerned about artifact size and/or the performance of numerics parsing, and who are willing to accept the shortcomings of JSON.parse, can use this parser instead.

Thanks for reading :)

@isomarcte (Contributor)

I'm in favor of switching to Jawn as the default parser on 0.15.x.

@armanbilge (Contributor, Author)

There's an argument to make this change in the 0.14.x line as a bug fix.

@zarthross (Member)

This does feel like the right move, but there are areas besides numbers where JSON.parse performs better: for instance StringOfAsciiCharsReading, StringOfEscapedCharsReading, ArrayOfUUIDsReading, ArrayOfEnumADTsReading, and ArrayOfCharsReading, with StringOfEscapedCharsReading being 3.6x faster. Moreover, the difference on BigDecimalReading is 6.5x between circe and circe-jawn... that's not trivial for some of our users.

I'm in favor of switching in principle, but I wonder what we can do to bring Jawn closer to parity in these cases before switching.

I want to try avoiding as best I can anybody coming out to complain about a sudden performance change they didn't opt into 😅 .

@armanbilge (Contributor, Author) commented Jul 13, 2022

@zarthross thanks for taking a look into this!

that's not trivial for some of our users.

Genuine question: do you have some specific users in mind? The reason I ask is, although I 100% agree that in some of these microbenchmarks Jawn definitely leaves something to be desired, it also makes improvements in a number of areas.

For example ADTReading, ArrayOfIntsReading, ArrayOfBigIntsReading, ArrayOfLongsReading, StringOfAsciiCharsReading, StringOfNonAsciiCharsReading. Hell, even vanilla IntReading is faster on Jawn. Who's to say whether these are more or less important than the areas in which there are performance regressions? It's tradeoffs either way, but at least Jawn is correct.

Big picture, looking at API responses (rather than microbenchmarks of very specific things) Jawn seems to be more performant than JSON.parse. At the end of the day, we are talking about JSON parsing primarily in browsers where some of these performance concerns may actually not be important at all for the majority of users.

I want to try avoiding as best I can anybody coming out to complain about a sudden performance change they didn't opt into

Fair :) I mean, every change has the possibility to upset somebody 😆. The nice thing about this change is that anyone who is unhappy can just swap to the circe-scalajs module and continue using JSON.parse. But in that case, they are opting in to the caveats of that parser. Meanwhile, we are fixing parsing bugs for, like, everyone else.

@isomarcte (Contributor)

@zarthross we could write a parser which uses Jawn for some cases and JSON.parse for others...(probably an insane idea? (maybe not?))

@zarthross (Member)

@armanbilge I think it's the right move for 0.15. I'll think about it for 0.14.x and talk with @zmccoy. I wish we had some benchmarks for Node.js as well, to compare and make sure that meshes with what we are seeing in browsers.

As for your question about specific users, no... I don't know any specific user which would have an issue.

@isomarcte It would be better if we could do some WASM magic or something to hyperoptimize our js json parser 😆

@armanbilge (Contributor, Author)

Thanks.

I wish we had some benchmarks for nodejs as well to compare and make sure that meshes with what we are seeing with browsers.

I'll see if I can look into this. Note that Chrome and Node.js both use the same V8 JavaScript engine, so I would expect the Chrome results to be representative of Node.js.

@armanbilge (Contributor, Author)

Done in #2042.
