forked from Zulu-Inuoe/jzon

jzon

A correct and safe(er) JSON RFC 8259 parser with batteries included.


Type Mappings

jzon canonically maps types per the following chart:

JSON      CL
true      symbol t
false     symbol nil
null      symbol null
number    integer or double-float
string    simple-string
array     simple-vector
object    hash-table (equal)

Note the use of the symbol cl:null as a sentinel for JSON null.

These are the values returned by the reading functions, though when writing, other values are supported.

Reading

jzon:parse will parse JSON and produce a CL value:

(jzon:parse "
{
  \"name\": \"Rock\",
  \"coords\": {
    \"x\": 5,
    \"y\": 7
  },
  \"attributes\": [\"fast\", \"hot\"],
  \"physics\": true,
  \"item\": false,
  \"parent\": null
}")
; => #<HASH-TABLE :TEST EQUAL :COUNT 6 {1003ECDC93}>

(defparameter *ht* *)

(string= (gethash "name" *ht*) "Rock")                  ; => t
(= (gethash "x" (gethash "coords" *ht*)) 5)             ; => t
(= (gethash "y" (gethash "coords" *ht*)) 7)             ; => t
(equalp (gethash "attributes" *ht*) #("fast" "hot"))    ; => t
(eq (gethash "physics" *ht*) t)                         ; => t
(eq (gethash "item" *ht*) nil)                          ; => t
(eq (gethash "parent" *ht*) 'null)                      ; => t

jzon:parse in - Reads input from in and returns a parsed value per Type Mappings.

in can be any of the following:

  • string
  • (vector (unsigned-byte 8)) - octets in utf-8
  • stream - character or binary in utf-8
  • pathname - jzon:parse will open the file for reading in utf-8

jzon:parse also accepts the following keyword arguments:

  • :max-depth controls the maximum depth allowed when nesting arrays or objects.
  • :allow-comments controls if we allow single-line // comments and /**/ multiline block comments.
  • :allow-trailing-comma controls if we allow a single comma , after all elements of an array or object.
  • :max-string-length controls the maximum length allowed when reading a string key or value.
  • :key-fn is a function of one value which 'pools' object keys, or null for the default pool.
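As a minimal sketch of the two leniency options above, both can be enabled in a single call. It assumes jzon is already loaded and reachable under the jzon package name used throughout this README.

```lisp
;; Assumption: jzon is loaded and available as the JZON package.
;; Strict RFC 8259 parsing would reject both the comment and the
;; trailing comma; the two keyword arguments opt in to accepting them.
(jzon:parse "
// a single-line comment before the value
[1, 2, 3,]"
            :allow-comments t
            :allow-trailing-comma t)
;; Per the Type Mappings chart, the result is a simple-vector of 1, 2, 3.
```

Leaving both options unset keeps the strict behaviour described in the Correctness section.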

Tip: key-fn can be supplied as #'identity in order to disable key pooling:

(jzon:parse "[ { \"x\": 1, \"y\": 2 }, { \"x\": 3, \"y\": 4 } ]" :key-fn #'identity)

(defparameter *v* *)

(gethash "x" (aref *v* 0)) ; => 1
(gethash "y" (aref *v* 0)) ; => 2

This may speed up parsing on certain JSON as we do not have to build the string lookup table.

Tip: alexandria:make-keyword or equivalent can be used to make object keys into symbols:

(jzon:parse "[ { \"x\": 1, \"y\": 2 }, { \"x\": 3, \"y\": 4 } ]" :key-fn #'alexandria:make-keyword)

(defparameter *v* *)

(gethash :|x| (aref *v* 0)) ; => 1
(gethash :|y| (aref *v* 0)) ; => 2

Or, to follow the default CL reader convention, string-upcase the key before interning:

(jzon:parse "[ { \"x\": 1, \"y\": 2 }, { \"x\": 3, \"y\": 4 } ]" :key-fn (lambda (k) (alexandria:make-keyword (string-upcase k))))

(defparameter *v* *)

(gethash :x (aref *v* 0)) ; => 1
(gethash :y (aref *v* 0)) ; => 2

Incremental Parser

In addition to jzon:parse, jzon:with-parser exposes an incremental parser for reading JSON in parts:

(jzon:with-parser (parser "{\"x\": 1, \"y\": [2, 3], \"live\": false}")
  (jzon:parse-next parser)  ; => :begin-object
  (jzon:parse-next parser)  ; => :object-key, "x"
  (jzon:parse-next parser)  ; => :value, 1
  (jzon:parse-next parser)  ; => :object-key, "y"
  (jzon:parse-next parser)  ; => :begin-array
  (jzon:parse-next parser)  ; => :value, 2
  (jzon:parse-next parser)  ; => :value, 3
  (jzon:parse-next parser)  ; => :end-array
  (jzon:parse-next parser)  ; => :object-key, "live"
  (jzon:parse-next parser)  ; => :value, nil
  (jzon:parse-next parser)  ; => :end-object
  (jzon:parse-next parser)) ; => nil

Both jzon:with-parser and jzon:make-parser receive the same arguments as jzon:parse.

Note: jzon:make-parser is akin to cl:open and jzon:close-parser is akin to cl:close. Prefer jzon:with-parser when you do not need indefinite extent for the parser.

The relevant functions for the incremental parser are:

jzon:make-parser in - Construct a parser from in, which may be any of the inputs applicable to jzon:parse

jzon:parse-next parser - Parse the next token from parser. Returns two values, depending on the token:

  • :value, <value> - The parser encountered a json-atom, <value> is the value of the atom
  • :begin-object, nil - The parser encountered an object opening
  • :object-key, <key> - The parser encountered an object key, <key> is that key
  • :end-object, nil - The parser encountered an object closing
  • :begin-array, nil - The parser encountered an array opening
  • :end-array, nil - The parser encountered an array closing
  • nil - The parser is complete

jzon:close-parser parser - Close a parser, closing any opened streams and releasing allocated objects
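When the parser needs to outlive a single with-parser form, the open/close analogy suggests pairing jzon:make-parser with jzon:close-parser inside unwind-protect. The following is a sketch under that assumption; collect-events is a hypothetical helper name, not part of jzon.

```lisp
;; Assumption: jzon is loaded and available as the JZON package.
;; COLLECT-EVENTS is a hypothetical helper for illustration only.
(defun collect-events (in)
  "Collect the primary value of every JZON:PARSE-NEXT call on IN."
  (let ((parser (jzon:make-parser in)))
    (unwind-protect
         ;; PARSE-NEXT returns nil once the parser is complete.
         (loop for event = (jzon:parse-next parser)
               while event
               collect event)
      ;; Runs on normal exit and on non-local exit, like CL:CLOSE would.
      (jzon:close-parser parser))))

(collect-events "[1, 2]")
;; Expected, given the event names above:
;; (:begin-array :value :value :end-array)
```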

Incremental Parser Example

jzon:parse could be approximately defined as follows:

(defun my/jzon-parse (in)
  (jzon:with-parser (parser in)
    (let (top stack key)
      (flet ((finish-value (value)
                (typecase stack
                  (null                 (setf top value))
                  ((cons list)          (push value (car stack)))
                  ((cons hash-table)    (setf (gethash (pop key) (car stack)) value)))))
        (loop
          (multiple-value-bind (evt value) (jzon:parse-next parser)
            (ecase evt
              ((nil)          (return top))
              (:value         (finish-value value))
              (:begin-array   (push (list) stack))
              (:end-array     (finish-value (coerce (the list (nreverse (pop stack))) 'simple-vector)))
              (:begin-object  (push (make-hash-table :test 'equal) stack))
              (:object-key    (push value key))
              (:end-object    (finish-value (pop stack))))))))))

Writing

jzon:stringify will serialize an object to JSON:

(jzon:stringify #("Hello, world!" 5 2.2 #(null)) :stream t :pretty t)
; =>
[
  "Hello, world!",
  5,
  2.2,
  [
    null
  ]
]

jzon:stringify accepts the following keyword arguments:

  • :stream A destination like in format, or a pathname. Like format, returns a string if nil.
  • :pretty If true, output pretty-formatted JSON.
  • :coerce-key A function for coercing key values to strings. See Custom Serialization
  • :replacer A function which takes a key and a value as arguments, and returns t or nil, indicating whether the KV pair should be written.
    • Optionally returns a second value, indicating the value to be stringified in place of the given value.
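As a hedged illustration of :replacer, the sketch below drops one property from a hash-table before it is written. It relies only on the behaviour described above (return nil to skip a pair, t to keep it); stringify-without-password is a hypothetical helper name.

```lisp
;; Assumption: jzon is loaded and available as the JZON package.
;; STRINGIFY-WITHOUT-PASSWORD is a hypothetical helper for illustration.
(defun stringify-without-password (table)
  "Serialize TABLE, omitting any property whose key is \"password\"."
  (jzon:stringify table
                  :replacer (lambda (key value)
                              (declare (ignore value))
                              ;; nil drops the pair, t keeps it.
                              (not (equal key "password")))))

(let ((ht (make-hash-table :test 'equal)))
  (setf (gethash "user" ht) "mary"
        (gethash "password" ht) "hunter2")
  (stringify-without-password ht))
;; The resulting JSON should contain "user" but not "password".
```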

jzon:stringify will make use of the jzon:write-value generic function, so in addition to Type Mappings, jzon:stringify accepts the following types of values:

CL                  JSON
symbol              string (symbol-name), but see Symbol key case
pathname            string (uiop:native-namestring)
real                number
alist*              object
plist*              object
cons†               array
list                array
sequence            array
standard-object     object
structure-object‡   object

*: Heuristic depending on the key values - Detects alists/plists by testing each key to be a character, string, or symbol.

†: Heuristic based on a cons with a non-list cdr - (jzon:stringify (cons 1 2)) => [1,2].

‡: On supported implementations where structure slots are available via the MOP.

Please see Custom Serialization for more details.

Symbol key case

When symbols are used as keys in objects, their names will be downcased, unless they contain mixed-case characters.

For example:

(let ((ht (make-hash-table :test 'equal)))
  (setf (gethash 'all-upper ht) 0)
  (setf (gethash '|mixedCase| ht) 0)
  (setf (gethash "ALL UPPER" ht) 0)

  (jzon:stringify ht :pretty t :stream t))

result:

{
  "all-upper": 0,
  "mixedCase": 0,
  "ALL UPPER": 0
}

This is particularly important when serializing CLOS objects per Custom Serialization.

Incremental Writer

In addition to jzon:stringify, jzon:make-writer exposes an incremental writer for writing JSON in parts.

(jzon:with-writer* (:stream *standard-output* :pretty t)
  (jzon:with-object*
    (jzon:write-properties* :age 24 "colour" "blue")
    (jzon:write-key* 42)
    (jzon:write-value* #(1 2 3))

    (jzon:write-key* "an-array")
    (jzon:with-array*
      (jzon:write-values* :these :are :array :elements))

    (jzon:write-key* "another array")
    (jzon:write-array* :or "you" "can" "use these" "helpers")))
; =>
{
  "age": 24,
  "colour": "blue",
  "42": [
    1,
    2,
    3
  ],
  "an-array": [
    "THESE",
    "ARE",
    "ARRAY",
    "ELEMENTS"
  ],
  "another array": [
    "OR",
    "you",
    "can",
    "use these",
    "helpers"
  ]
}

jzon:make-writer and jzon:with-writer* accept the same arguments as jzon:stringify, except :stream must be an open cl:stream.

Note all writer functions have *-suffixed variants which use the jzon:*writer* variable and omit the first writer parameter.

For example, instead of

(let ((writer (jzon:make-writer)))
  (jzon:write-value writer "foo"))

we can use

(jzon:with-writer* ()
  (jzon:write-value* "foo"))

Incremental Writer Functions

The relevant functions for the incremental writer are:

jzon:write-value writer value - Writes any value to the writer. Usable when writing a toplevel value, array element, or object property value.

(jzon:write-value* "Hello, world")

Note: This is a generic-function you can specialize your values on. See custom serialization for more information.

jzon:with-array writer - Open a block to begin writing array values.

(jzon:with-array*
  (jzon:write-value* 0)
  (jzon:write-value* 1)
  (jzon:write-value* 2))

jzon:begin-array writer - Begin writing an array

(jzon:begin-array*)
(jzon:write-value* 0)
(jzon:write-value* 1)
(jzon:write-value* 2)
(jzon:end-array*)

jzon:write-values writer &rest values* - Write several array values.

(jzon:with-array*
  (jzon:write-values* 0 1 2))

jzon:end-array writer - Finish writing an array.

(jzon:begin-array*)
(jzon:write-value* 0)
(jzon:write-value* 1)
(jzon:write-value* 2)
(jzon:end-array*)

jzon:write-array writer &rest values* - Open a new array, write its values, and close it.

(jzon:write-array* 0 1 2)

jzon:with-object writer - Open a block where you can begin writing object properties.

(jzon:with-object*
  (jzon:write-property* "age" 42))

jzon:begin-object writer - Begin writing an object.

(jzon:begin-object*)
(jzon:write-property* "age" 42)
(jzon:end-object*)

jzon:write-key writer key - Write an object key.

(jzon:with-object*
  (jzon:write-key* "age")
  (jzon:write-value* 42))

jzon:write-property writer key value - Write an object key and value.

(jzon:with-object*
  (jzon:write-property* "age" 42))

jzon:write-properties writer &rest key* value* - Write several object keys and values.

(jzon:with-object*
  (jzon:write-properties* "age" 42
                          "colour" "blue"
                          "x" 0
                          "y" 10))

jzon:end-object writer - Finish writing an object.

(jzon:begin-object*)
(jzon:write-property* "age" 42)
(jzon:end-object*)

jzon:write-object writer &rest key* value* - Open a new object, write its keys and values, and close it.

(jzon:write-object* "age" 42
                    "colour" "blue"
                    "x" 0
                    "y" 10)

Incremental Writer Example

jzon:stringify could be approximately defined as follows:

(defun my/jzon-stringify (value)
  (labels ((recurse (value)
             (etypecase value
               (jzon:json-atom
                 (jzon:write-value* value))
               (vector
                 (jzon:with-array*
                   (map nil #'recurse value)))
               (hash-table
                 (jzon:with-object*
                   (maphash (lambda (k v)
                              (jzon:write-key* k)
                              (recurse v))
                            value))))))
    (with-output-to-string (s)
      (jzon:with-writer* (:stream s)
        (recurse value)))))

Tip: Every function returns the jzon:writer itself for usage with arrow macros:

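;; -> is a threading macro from an arrow-macro library such as arrow-macros
(let ((writer (jzon:make-writer :stream *standard-output*)))
  (jzon:with-object writer
    (-> writer
        (jzon:write-key "key")
        (jzon:write-value "value")
        ;; Inside an object, an array value needs its own key first
        (jzon:write-key "array")
        (jzon:begin-array)
        (jzon:write-value 1)
        (jzon:end-array))))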

Custom Serialization

When using either jzon:stringify or jzon:write-value, you can customize writing of any values not covered in the Type Mappings in a few different ways.

standard-object

By default, if your object is a standard-object, it will be serialized as a JSON object, using each of its bound slots as keys.

Consider the following classes:

(defclass coordinate ()
  ((reference
    :initarg :reference)
   (x
    :initform 0
    :initarg :x
    :accessor x)
   (y
    :initform 0
    :initarg :y
    :accessor y)))

(defclass object ()
  ((alive
    :initform nil
    :initarg :alive
    :type boolean)
   (coordinate
    :initform nil
    :initarg :coordinate
    :type (or null coordinate))
   (children
    :initform nil
    :initarg :children
    :type list)))

If we stringify a fresh coordinate object via (jzon:stringify (make-instance 'coordinate) :pretty t :stream t), we'd end up with:

{
  "x": 0,
  "y": 0
}

And if we (jzon:stringify (make-instance 'coordinate :reference "Earth") :pretty t :stream t):

{
  "reference": "Earth",
  "x": 0,
  "y": 0
}

Similarly if we (jzon:stringify (make-instance 'object) :pretty t :stream t):

{
  "alive": false,
  "coordinate": null,
  "children": []
}

Note that here we have nil representing false, null, and []. This is done by examining the :type of each slot. If no type is provided, nil shall serialize as null.

coerced-fields

jzon:coerced-fields is a generic function which calculates the JSON object key/value pairs when writing and is a simple way to add custom serialization for your values.

Consider our previous coordinate class. If we always wanted to serialize only the x and y slots, and wanted to rename them, we could specialize jzon:coerced-fields as follows:

(defmethod jzon:coerced-fields ((coordinate coordinate))
  (list (list "coord-x" (x coordinate))
        (list "coord-y" (y coordinate))))

This results in:

{
  "coord-x": 0,
  "coord-y": 0
}

jzon:coerced-fields should return a list of 'fields', which are two (or three) element lists of the form:

(name value &optional type)
  • name can be any suitable key name. In particular, integers are allowed and are coerced to their decimal string representation.
  • value can be any value - it'll be coerced if necessary.
  • type is used as :type above, in order to resolve ambiguities with nil.
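The optional third element can be sketched as follows. The inventory class and its field names are hypothetical and exist only to show how a type might disambiguate nil, following the :type behaviour described in the standard-object section.

```lisp
;; Assumption: jzon is loaded and available as the JZON package.
;; INVENTORY is a hypothetical class used only for this illustration.
(defclass inventory () ())

(defmethod jzon:coerced-fields ((inv inventory))
  (list (list "items"  nil 'list)     ; nil typed as a list
        (list "locked" nil 'boolean)  ; nil typed as a boolean
        (list "owner"  nil)))         ; untyped nil

(jzon:stringify (make-instance 'inventory))
;; Following the rules above, "items" should be written as [],
;; "locked" as false, and "owner" as null.
```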

Example: Including only some slots

If the default jzon:coerced-fields gives you most of what you want, you can exclude/rename/add fields by specializing an :around method as follows:

(defmethod jzon:coerced-fields :around ((object object))
  (let* (;; Grab default fields
         (fields (call-next-method))
         ;; All fields except "children"
         (fields (remove 'children fields :key #'first))
         ;; Include a 'fake' field "name"
         (fields (cons (list 'name "Mary") fields)))
    fields))

This would result in the following:

{
  "name": "Mary",
  "alive": false,
  "coordinate": {
    "x": 0,
    "y": 0
  }
}

write-value

For more fine-grained control, you can specialize a method on jzon:write-value.

This allows you to emit whatever value you wish for a given object.

jzon:write-value writer value

writer is a writer on which any of the writer functions may be called to serialize your object in any desired way.

(defclass my-point () ())

(defmethod jzon:write-value (writer (value my-point))
  (jzon:write-array writer 1 2))

See writer for the available functions.

Features

In writing jzon, we prioritize the following properties, in order:

Safety

RFC 8259 allows setting limits on things such as:

  • Number values accepted
  • Nesting level of arrays/objects
  • Length of strings

We should be safe in the face of untrusted JSON and will error on 'unreasonable' input out-of-the-box, such as deeply nested objects or overly long strings.
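A small sketch of how such a limit might be exercised is shown below. try-parse is a hypothetical wrapper; it catches any error rather than a specific jzon condition type, since the exact condition class is not stated here.

```lisp
;; Assumption: jzon is loaded and available as the JZON package.
;; TRY-PARSE is a hypothetical wrapper for illustration only.
(defun try-parse (json &rest args)
  "Return the parsed value, or :rejected if JZON:PARSE signals an error."
  (handler-case (apply #'jzon:parse json args)
    (error () :rejected)))

;; Three levels of nesting against a limit of two: expected to be rejected.
(try-parse "[[[1]]]" :max-depth 2)

;; The same input with a generous limit: expected to parse normally.
(try-parse "[[[1]]]" :max-depth 16)
```

Catching the broad error class keeps the sketch independent of jzon's condition hierarchy; real code would likely handle a narrower condition.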

Type Safety

All of jzon's public APIs are type safe, issuing cl:type-error as appropriate.

Some other JSON parsers will make dangerous use of features like optimize (safety 0) (speed 3) without type-checking their public API:

CL-USER> (parse 2)
; Debugger entered on #<SB-SYS:MEMORY-FAULT-ERROR {1003964833}>

Such errors are unreasonable.

Avoid Infinite Interning

jzon chooses to (by default) keep object keys as strings. Some libraries choose to intern object keys in some package. This is dangerous in the face of untrusted JSON, as every unique key read will be added to that package and never garbage collected.
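The growth can be demonstrated in plain Common Lisp, without any JSON library. The package name and helper functions below are hypothetical and stand in for what a key-interning parser would do with each key it reads.

```lisp
;; A scratch package standing in for "the package keys get interned in".
;; The name JSON-KEYS-DEMO is hypothetical.
(defpackage #:json-keys-demo (:use))

(defun count-symbols (package)
  "Count the symbols accessible in PACKAGE."
  (let ((n 0))
    (do-symbols (s package n)
      (declare (ignorable s))
      (incf n))))

(defun intern-keys (keys)
  "Intern every string in KEYS, as a key-interning parser would."
  (dolist (k keys)
    (intern k '#:json-keys-demo)))

(intern-keys '("a1" "a2" "a3"))
(count-symbols '#:json-keys-demo) ; => 3, and it only grows with new keys
```

Each distinct key from untrusted input adds one more symbol, which stays reachable from the package until it is explicitly uninterned.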

Avoid Stack Exhaustion

jzon:parse is written in an iterative way which avoids exhausting the call stack. In addition, we provide :max-depth to guard against unreasonable inputs. For even more control, you can make use of the jzon:with-parser APIs to avoid consing large amounts of user-supplied data to begin with.
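As a sketch of that last point, the incremental parser can answer a question about a document without building its arrays and hash-tables. json-max-depth is a hypothetical helper built only from jzon:with-parser and jzon:parse-next as described in the Incremental Parser section.

```lisp
;; Assumption: jzon is loaded and available as the JZON package.
;; JSON-MAX-DEPTH is a hypothetical helper for illustration only.
(defun json-max-depth (in)
  "Return the deepest array/object nesting level found in IN."
  (jzon:with-parser (parser in)
    (let ((depth 0)
          (deepest 0))
      (loop
        (case (jzon:parse-next parser)
          ((nil) (return deepest))            ; parser is complete
          ((:begin-array :begin-object)
           (incf depth)
           (setf deepest (max deepest depth)))
          ((:end-array :end-object)
           (decf depth)))))))                 ; other events are ignored

(json-max-depth "{\"a\": [1, [2, 3]], \"b\": {}}") ; => 3
```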

Correctness

This parser is written against RFC 8259 and strives to adhere strictly for maximum compliance and few surprises.

It also has been tested against the JSONTestSuite. See the JSONTestSuite directory in this repo for making & running the tests.

In short, jzon is the only CL JSON library which correctly:

  • declines all invalid inputs per that suite
  • accepts all valid inputs per that suite

Additionally, jzon is one of a couple which never hard crash due to edge-cases like deeply nested objects/arrays.

Unambiguous values

Values are never ambiguous between [], false, {}, null, or a missing key.
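A short sketch of how the distinction might look in practice, using only the Type Mappings chart and the second return value of gethash. classify is a hypothetical helper name.

```lisp
;; Assumption: jzon is loaded and available as the JZON package.
;; CLASSIFY is a hypothetical helper for illustration only.
(defun classify (key table)
  "Describe what KEY maps to in TABLE, a hash-table returned by JZON:PARSE."
  (multiple-value-bind (value present-p) (gethash key table)
    (cond ((not present-p)      :missing)
          ((eq value 'null)     :null)    ; the cl:null sentinel
          ((null value)         :false)   ; nil only ever means false
          ((hash-table-p value) :object)
          ((stringp value)      :string)  ; checked before VECTORP
          ((vectorp value)      :array)
          (t                    :other))))

(let ((ht (jzon:parse "{\"a\": [], \"b\": false, \"c\": {}, \"d\": null}")))
  (mapcar (lambda (k) (classify k ht))
          '("a" "b" "c" "d" "e")))
;; Expected per Type Mappings: (:array :false :object :null :missing)
```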

Compatible Float IO

While more work is doubtlessly necessary to validate further, care has been taken to ensure floating-point values are not lost between (jzon:parse (jzon:stringify f)), even across CL implementations.

In particular, certain edge-case values such as subnormals shall parse to values identical (===) to those produced by JavaScript parsing libraries.

Simplicity

You call jzon:parse, and you get a reasonably standard CL object back. You call jzon:stringify with a reasonably standard CL object and you should get reasonable JSON.

  • No custom data structures or accessors required
  • No worrying about key case auto conversion on strings, nor hyphen/underscore replacement on symbols.
  • No worrying about what package symbols are interned in (no symbols).
  • No worrying about dynamic variables affecting a parse as in cl-json, jonathan, jsown. Everything affecting jzon:parse is given at the call-site.

jzon:parse also accepts either a string, octet vector, stream, or pathname for simpler usage over libraries requiring one or the other, or having separate parse functions.

Finally, all public APIs strive to have reasonable defaults so things 'Just Work'.

Performance

While parsing, jzon at worst performs at 50% the speed of jsown, while outperforming all other libraries.

And this is all while having the safety and correctness guarantees noted above.

Object key pooling

By default, jzon will keep track of object keys during each jzon:parse (or jzon:make-parser), causing string= keys in a nested JSON object to be shared (eq):

(jzon:parse "[{\"x\": 5}, {\"x\": 10}, {\"x\": 15}]")

In this example, the string x is shared (eq) between all 3 objects.

This optimizes for the common case of reading a JSON payload containing many duplicate keys.

Tip: This behaviour may be altered by supplying a different :key-fn to jzon:parse or jzon:make-parser.
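To make the idea of pooling concrete, here is a plain Common Lisp sketch of a pooling function. It is not jzon's actual implementation; make-key-pool is a hypothetical name, and whether a closure like this is a suitable :key-fn depends on how jzon hands keys to that function.

```lisp
;; A hypothetical key pool: not jzon's implementation, only the concept.
(defun make-key-pool ()
  "Return a function mapping each key string to one shared (eq) copy."
  (let ((pool (make-hash-table :test 'equal)))
    (lambda (key)
      (or (gethash key pool)
          ;; First sighting: remember a private copy and return it.
          (setf (gethash key pool) (copy-seq key))))))

(let ((pool (make-key-pool)))
  ;; Two distinct but STRING= strings come back as the same object.
  (eq (funcall pool (copy-seq "x"))
      (funcall pool (copy-seq "x")))) ; => t
```

The trade-off is the one noted in the Reading section: the lookup table costs time to build, which is why passing #'identity can be faster on JSON with few repeated keys.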

base-string coercion

When possible, strings will be coerced to cl:simple-base-string. This can reduce memory usage per string to as little as 1/4 on implementations like SBCL, which store strings internally as UTF-32, while base-string can be represented in 8 bits per char.
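The coercion itself can be sketched in portable Common Lisp. maybe-base-string is a hypothetical helper showing the general technique, not the code jzon uses.

```lisp
;; A hypothetical helper showing the technique, not jzon's own code.
(defun maybe-base-string (string)
  "Return STRING as a simple-base-string when every character allows it."
  (if (every (lambda (c) (typep c 'base-char)) string)
      (coerce string 'simple-base-string)
      string))

;; Standard characters are always base characters, so this coerces.
(typep (maybe-base-string "hello") 'simple-base-string) ; => t
```

Which characters count as base-char is implementation-defined, so the memory saving varies between implementations.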

Dependencies

License

See LICENSE.

jzon was originally a fork of st-json, but I ended up scrapping all of the code except for the function decoding Unicode.

Alternatives

There are many CL JSON libraries available, and I defer to Sabra Crolleton's definitive list and comparisons https://sabracrolleton.github.io/json-review.

But for posterity, included in this repository is a set of tests and results for the following libraries:

I believe jzon to be the superior choice and hope for it to become the new, true de-facto library in the world of JSON-in-CL once and for all.
