Ben Chuanlong Du's Blog

It is never too late to learn.

Serialization and deserialization in Python

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

  1. JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is suggested that you avoid using it. Please refer to Shortcomings of JSON for a detailed discussion. TOML and YAML are better text-based alternatives to JSON. If serialization and deserialization are done in Python only, pickle is preferred. If you do want to use JSON in Python, please refer to JSON Parsing Libraries in Python for more discussion.

  2. TOML

  3. YAML

    • YAML is a superset of JSON.
    • YAML supports serialization and deserialization of sets, while JSON does not.
    • YAML is more human-readable than JSON.
  4. Pickle is the most popular serialization and deserialization tool in Python. It can serialize/deserialize instances of most (though not all) Python classes.

  5. Dill extends Python's pickle module to serialize and deserialize the majority of built-in Python types, including objects such as lambdas that pickle cannot handle. It also provides good diagnostic tools for pickling, the best of which is the pickle trace. For more discussion, please refer to How to check which detail of a complex object cannot be pickled.

  6. cloudpickle

  7. Use Parquet for pandas DataFrames.
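Items 1 and 3 above note that JSON, unlike YAML, has no representation for sets. The sketch below uses only the standard json module to show what that means in practice: json.dumps raises TypeError on a set. The workaround here (a hypothetical encode_set hook, passed through the default parameter, that writes sets as sorted lists) is just one illustrative choice, and it loses the type on the way back.

```python
import json

data = {"tags": {"b", "a"}}

# JSON has no set type, so encoding a set fails outright.
try:
    json.dumps(data)
    set_ok = True
except TypeError as err:
    set_ok = False
    print(f"TypeError: {err}")

# Hypothetical workaround: encode sets as sorted lists via the `default` hook.
def encode_set(obj):
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

text = json.dumps(data, default=encode_set)
print(text)  # {"tags": ["a", "b"]}

# Decoding gives back a list, not a set: the original type is lost.
restored = json.loads(text)
print(type(restored["tags"]))  # <class 'list'>
```

Any code that reads this JSON has to know to convert "tags" back into a set. YAML and pickle avoid that extra step.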
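Items 4 and 5 above say that pickle handles most, but not all, Python objects, and that dill fills some of the gaps. The stdlib-only sketch below shows both sides. A dict containing a set, a tuple, a datetime.date and a fractions.Fraction round-trips through pickle.dumps/pickle.loads with every type intact. A lambda fails, because pickle stores functions by module and qualified name and an anonymous lambda cannot be looked up that way. The exact exception type has varied across Python versions, so the sketch catches a few. Unpickling can execute arbitrary code, so only load pickles from sources you trust.

```python
import datetime
import pickle
from fractions import Fraction

obj = {
    "tags": {"a", "b"},                  # set: not representable in JSON
    "pair": (1, 2),                      # tuple: JSON would turn it into a list
    "when": datetime.date(2024, 1, 1),   # date: not representable in JSON
    "ratio": Fraction(1, 3),             # arbitrary stdlib class instance
}

# Round trip: bytes out, an equal object with the same types back in.
blob = pickle.dumps(obj)
restored = pickle.loads(blob)
print(restored == obj)  # True

# Functions are pickled by reference (module + qualified name),
# so a lambda cannot be found again by name and pickling fails.
try:
    pickle.dumps(lambda v: v + 1)
    lambda_ok = True
except (pickle.PicklingError, AttributeError, TypeError) as err:
    lambda_ok = False
    print(f"{type(err).__name__}: {err}")
```

This lambda case is the kind of object dill and cloudpickle are designed to handle, since they serialize functions by value rather than by name.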
