Ben Chuanlong Du's Blog

Tips on JSON

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Shortcomings of JSON

It is suggested that you avoid using the JSON format! TOML and YAML are better text-based alternatives. If readability is not a concern, a binary serialization format is preferred.

  1. Empty elements are not allowed in a list in JSON. For example, a trailing comma after the last element is a syntax error.

  2. Comments are NOT allowed in a JSON file. A hacky workaround is to put comments into a dedicated field. There are of course much better alternative serialization formats, such as TOML and YAML, which support comments.

  3. Keys in a JSON file must be strings (instead of numerical values)! Some languages/libraries IMPLICITLY convert keys to strings when serializing data to JSON format, which is error-prone.

  4. The JSON format is limited in expressing complicated objects.
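
A minimal sketch of some of these pitfalls, using only Python's built-in json module. The sample dictionaries and the helper name `rejects_comments` are made up for illustration.

```python
import json

# Non-string keys are silently converted to strings when serializing.
original = {1: "one", 2: "two"}
encoded = json.dumps(original)
decoded = json.loads(encoded)

# The round trip does not give back the original dict: the keys are now strings.
keys_changed = decoded != original

# An int key and a str key can even collapse into a duplicated JSON key.
ambiguous = json.dumps({1: "int key", "1": "str key"})


def rejects_comments(text):
    """Return True if the standard parser refuses to parse `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


# Neither comments nor trailing commas are accepted by the standard parser.
with_comment = '{"a": 1 // not valid JSON\n}'
with_trailing_comma = "[1, 2, ]"
print(encoded, keys_changed)
print(rejects_comments(with_comment), rejects_comments(with_trailing_comma))
```

Because the key conversion happens without any warning, it is worth checking the key types explicitly before serializing data whose keys are not strings.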

JSON Parsing Libraries in Python

  1. orjson is currently the best JSON parsing library for Python. It is very fast and is able to handle large (<10G) JSON files.

  2. simdjson is a C++ library for parsing JSON which supports on-demand APIs. pysimdjson is a Python wrapper for simdjson. It is another good choice of JSON parsing library for Python. However, simdjson is unable to handle JSON files larger than 10G.

orjson

orjson is a fast, correct JSON library for Python. It benchmarks as the fastest Python library for JSON and is more correct than the standard json library or other third-party libraries. It serializes dataclass, datetime, numpy, and UUID instances natively.
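
A small sketch of basic usage, assuming orjson has been installed (for example with `pip install orjson`). Note that `orjson.dumps` returns `bytes` rather than `str`. The helper names `to_json_bytes` and `from_json_bytes` and the sample record are hypothetical, and the fallback to the built-in json module is only there so that the snippet still runs when orjson is missing.

```python
import json

try:
    import orjson  # third-party package; assumed to be installed
except ImportError:
    orjson = None  # fall back to the standard library below


def to_json_bytes(obj):
    """Serialize `obj` to compact UTF-8 encoded JSON bytes."""
    if orjson is not None:
        return orjson.dumps(obj)  # orjson returns bytes, not str
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def from_json_bytes(data):
    """Parse JSON given as bytes (or str) back into Python objects."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)


record = {"id": 7, "tags": ["a", "b"], "score": 0.5, "note": None}
payload = to_json_bytes(record)
restored = from_json_bytes(payload)
print(payload, restored == record)
```

Keeping the library behind two small functions like this makes it easy to switch JSON backends later without touching the calling code.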

ujson

ujson is an ultra fast JSON encoder and decoder written in pure C with bindings for Python 3.6+.

python-rapidjson

python-rapidjson is a Python module that wraps rapidjson, an extremely fast C++ JSON parser and serialization library.

hyperjson

hyperjson is a hyper-fast, safe Python module to read and write JSON data. It works as a drop-in replacement for Python's built-in json module. It is alpha software and there will be bugs, so do not deploy it to production just yet.

JmesPath

JMESPath allows you to declaratively specify how to extract elements from a JSON document.
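
As a rough illustration, the sketch below filters a small, made-up document with a JMESPath expression, assuming the `jmespath` Python package is installed. The data, the expression, and the function names are hypothetical. A plain list comprehension with the same meaning is included for comparison and is used as a fallback when the package is not available.

```python
# Hypothetical sample document.
data = {
    "people": [
        {"name": "Ann", "age": 34},
        {"name": "Bob", "age": 27},
        {"name": "Cy", "age": 41},
    ]
}

# Select the names of all people older than 30.
# Literal JSON values in JMESPath are written between backticks.
EXPRESSION = "people[?age > `30`].name"


def names_over_30_plain(doc):
    """Imperative equivalent of EXPRESSION."""
    return [p["name"] for p in doc["people"] if p["age"] > 30]


def names_over_30(doc):
    """Evaluate EXPRESSION with jmespath if available, else fall back."""
    try:
        import jmespath  # third-party package; assumed to be installed
    except ImportError:
        return names_over_30_plain(doc)
    return jmespath.search(EXPRESSION, doc)


print(names_over_30(data))
```

The declarative expression stays a plain string, so it can live in configuration or be passed around without changing any code.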

json

simplejson

JavaScript

JSON5

JmesPath

Java

Below are Java libraries for parsing JSON.

google/gson

https://stackoverflow.com/questions/2779251/how-can-i-convert-json-to-a-hashmap-using-gson

A great JSON parsing library developed by Google.

stleary/JSON-java

Scala

circe/circe

json4s/json4s

https://stackoverflow.com/questions/29908297/how-can-i-convert-a-json-string-to-a-scala-map

There is a dependency issue when using json4s with Spark. This issue was somehow fixed in the plugin johnrengelman/shadow v4.0.3. However, I can confirm that there are issues with johnrengelman/shadow v5.1.0.

https://github.com/json4s/json4s/issues/316

https://github.com/json4s/json4s/issues/418

argonaut-io/argonaut

References

http://json-schema.org/implementations.html

https://medium.com/@djoepramono/how-to-parse-json-in-scala-c024cb44f66b
