Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Parse TOML Files in Python

  1. There are 2 popular Python libraries tomlkit and toml for parsing TOML formatted files in Python. tomlkit is preferred to toml as it is more flexible and style-preserving.

  2. A TOML file always interpret a key (even a bare ASCII integer) as string. For this reason, a dict with numerical keys …

Process Big Data Using PySpark

  1. PySpark 2.4 and older does not support Python 3.8. You have to use Python 3.7 with PySpark 2.4 or older.

  2. It can be extremely helpful to run a PySpark application locally to detect possible issues before submitting it to the Spark cluster.

    #!/usr/bin/env bash …

Regular Expression Equivalent

  1. The order of precedence of operators in POSIX extended regular expression is as follows.

    1. Collation-related bracket symbols [==], [::], [..]
    2. Escaped characters \
    3. Character set (bracket expression) []
    4. Grouping ()
    5. Single-character-ERE duplication *, +, ?, {m,n}
    6. Concatenation
    7. Anchoring ^, $
    8. Alternation |
  2. Some regular expression patterns are defined using a single leading backslash, e.g., \s, \b, etc. However, since special …

Install Python Packages Behind Firewall

It is recommended that you use pip to install Python packages.

  1. If you don't already know the proxy in use (in your company), read the post Find out Proxy in Use to figure it out.

  2. Set proxy environment variables.

    set http_proxy=http://user:password@proxy_ip:port
    set https_proxy=https://user …