Consider switching to the Jawn parser on JS

@plokhotnyuk

tl;dr The circe-jawn parser on JS has better semantics and in many cases out-performs the JSON.parse method currently used by circe-parser.

Benchmarks: https://armanbilge.github.io/jsoniter-scala/index-scalajs.html

Note that the above benchmark results were extracted from https://plokhotnyuk.github.io/jsoniter-scala/index-scalajs.html; all credit goes to @plokhotnyuk.

In #1791, the circe-jawn module was cross-built for JS. However, the circe-parser module continued to use the standard JS API JSON.parse for parsing and this was "unlikely to change" according to typelevel/jawn#351 (comment). Presumably this was for performance-related reasons:

- JSON.parse is provided by the runtime, so no parsing code has to be bundled in applications' size-sensitive JS artifacts.
- Runtimes are free to implement JSON.parse with highly-optimized native code, so it should be fast.

However, as noted in the circe docs, the use of JSON.parse also has some serious caveats:

circe/docs/src/main/tut/parsing.md

Lines 61 to 69 in a015255

    ## Warnings and known issues

    When using the Scala.js version of circe, numerical values like `Long` may [lose
    precision][#393] when decoded. For example `decode[Long]("767946224062369796")`
    will return `Right(767946224062369792L)`. This is not a limitation of how
    Scala.js represents `scala.Long`s nor circe's decoders for numerical values but
    due to [`JSON.parse`][json.parse] converting numerical values to JavaScript
    numbers. If precision is required consider representing numerical values as
    strings and convert them to their final value via the JSON AST.

This (often surprising) difference in semantics can cause problems such as #393 and #1911 (comment) and has "raised eyebrows".

> I'm surprised circe considers it better to be fast than correct, but it's their call. Longs themselves are correct in Scala.js. Apparently it's not correct to serialize Longs using the default mechanism of circe if you actually use their full range. Instead, using strings or a pair of Ints may work.

@sjrd in https://discord.com/channels/632150470000902164/635668814956068864/905900410517204992

Note that the circe-jawn.js parser does not have these issues and instead has semantics that match the JVM, since it is parsing from String to the Json AST with exactly the same code.
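To make the precision issue concrete, here is a minimal sketch that uses only the Scala standard library. It does not call JSON.parse or circe; it assumes that reading a JSON integer as a JavaScript number behaves like a round trip through an IEEE 754 double. The names `LongPrecisionSketch`, `MaxSafeInteger` and `viaJsNumber` are hypothetical and exist only for this illustration.

```scala
object LongPrecisionSketch {
  // Largest integer magnitude that an IEEE 754 double (and therefore a
  // JavaScript number) can represent without gaps: 2^53 - 1.
  val MaxSafeInteger: Long = (1L << 53) - 1

  // Hypothetical helper (not part of circe or jawn): emulates a JSON integer
  // being read as a JavaScript number and only then converted to a Long.
  def viaJsNumber(digits: String): Long = digits.toDouble.toLong

  def main(args: Array[String]): Unit = {
    val input = "767946224062369796" // the value from the circe docs
    val exact = input.toLong         // parsed directly, as on the JVM
    val lossy = viaJsNumber(input)   // rounded to the nearest double first

    println(s"exact = $exact")
    println(s"lossy = $lossy")
    println(s"inside the safe range: ${math.abs(exact) <= MaxSafeInteger}")
  }
}
```

Under this assumption, any integer whose magnitude exceeds `MaxSafeInteger` may come back changed, which is the behaviour the circe docs warn about.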

With much gratitude to @plokhotnyuk who maintains comprehensive browser benchmarks for Scala.js JSON libraries, we now have concrete numbers comparing circe-jawn.js to circe-parser.js. I've trimmed down @plokhotnyuk's webpage to just the relevant benchmarks for this comparison. For easiest viewing I recommend selecting a specific browser to focus on.

https://armanbilge.github.io/jsoniter-scala/index-scalajs.html

It's also possible to run the benchmarks yourself at:
https://plokhotnyuk.github.io/jsoniter-scala/scala-2.13-fullopt.html

Here's my rough summary/analysis from looking through the Chrome results, but please draw your own conclusions :)

- Jawn.js is overall very competitive with JSON.parse
- Jawn.js consistently out-performed JSON.parse when parsing API responses (GH, Twitter, Google Maps)
- In some benchmarks, Jawn.js is up to 4x faster than JSON.parse
- Jawn.js was up to 5x slower when parsing certain numerics (BigDecimal, Double, Float) ... but this is precisely the situation in which JSON.parse may return incorrect results, so it's not really a fair comparison
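As a rough illustration of why the numerics comparison is not like-for-like, the sketch below contrasts parsing a decimal literal directly with passing it through a double first. It uses only the Scala standard library and assumes that a JavaScript number behaves like an IEEE 754 double; it does not exercise JSON.parse, circe, or Jawn. `DecimalPrecisionSketch`, `viaJsNumber` and `direct` are hypothetical names.

```scala
object DecimalPrecisionSketch {
  // Hypothetical helper (not part of circe or jawn): emulates a JSON number
  // being read as a JavaScript number, i.e. a double, before it is turned
  // into a BigDecimal.
  def viaJsNumber(literal: String): BigDecimal = BigDecimal(literal.toDouble)

  // Parses the literal straight into an arbitrary-precision decimal.
  def direct(literal: String): BigDecimal = BigDecimal(literal)

  def main(args: Array[String]): Unit = {
    // More significant digits than a double can carry.
    val literal = "3.141592653589793238462643383279"

    println(s"direct     = ${direct(literal)}")
    println(s"via double = ${viaJsNumber(literal)}")
    println(s"same value = ${direct(literal) == viaJsNumber(literal)}")
  }
}
```

A parser that goes through a double does less work per number, but for a literal like this one it also returns a different value.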

Besides raw performance, I had previously investigated how circe-jawn affects the size of the JS artifact. In http4s/http4s-dom#10 (comment) I estimated it contributes roughly 15 KB (fullOptJS+gcc, pre-gzip). I don't think this is a big deal, and definitely not in the Typelevel-stack Scala.js apps I've seen 😆

In summary, I think the circe-parser module should switch to use the Jawn parser on JS (although maybe not until the next breaking release). This gets us:

- Semantics that match the JVM
- No gotchas/surprises around parsing of numerics
- Competitive or improved performance in most situations

Of course, the JSON.parse-based parser should continue to be provided in the circe-scalajs module. Users who are specifically concerned about artifact size and/or the performance of numerics parsing, and who are willing to accept the shortcomings of JSON.parse, can use this parser instead.

Thanks for reading :)
