Ben Chuanlong Du's Blog

And let it direct your passion with reason.

The set Collection in Python

General Tips and Traps

  1. The set class is implemented on top of a hash table, which means that its elements must be hashable (i.e., they must implement __hash__ and __eq__). The set class implements the mathematical concept of a set, which means that its elements are unordered and insertion order is not preserved. Notice that this is different from the dict class, which is also implemented on top of a hash table but keeps the insertion order of its elements! The article Why don't Python sets preserve insertion order?
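
A minimal sketch of the points in this tip, assuming Python 3.7+ for the dict ordering guarantee. The sample values are made up for illustration.

```python
# Elements of a set must be hashable: immutable built-ins such as str and
# tuple work, while a mutable list does not.
s = {"b", "a", ("x", 1)}

try:
    s.add(["not", "hashable"])
    unhashable_error = None
except TypeError as err:
    unhashable_error = err  # raised because list defines no usable __hash__

# A set makes no promise about iteration order, so compare sets as sets
# (or sort them) instead of relying on the order elements were added in.
same_elements = {3, 1, 2} == {1, 2, 3}

# A dict, by contrast, iterates its keys in insertion order (Python 3.7+).
d = {"b": 1, "a": 2, "c": 3}
dict_keys = list(d)
```

If a stable order is needed from a set, sorting it (`sorted(s)`) is usually the simplest option, provided the elements are mutually comparable.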

Tips on pex

Steps to Build a pex Environment File

  1. Start a Python Docker image that has the required version of the Python interpreter installed. For example,

    docker run -it -v $(pwd):/workdir python:3.5-buster /bin/bash
    
  2. Install pex.

    pip3 install pex
    
  3. Build a pex environment file.

    pex --python=python3 -v pyspark findspark -o …

Hands on dict in Python

Tips and Traps

  1. Starting from Python 3.7, dict preserves insertion order (i.e., dict is ordered), so OrderedDict is rarely needed in Python 3.7+ (it still offers extras such as move_to_end and order-sensitive equality). However, set in Python is implemented as an unordered hash set and thus is neither ordered nor sorted. A trick to dedup an iterable values
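
The excerpt cuts off before showing the dedup trick, so the following is only a guess at one common approach, not necessarily the author's: build a dict from the values, since dict keys are unique and keep insertion order. The helper name `dedup_keep_order` and the sample list are hypothetical.

```python
def dedup_keep_order(values):
    """Return the unique items of `values` in first-seen order.

    Items must be hashable, because they are used as dict keys.
    """
    return list(dict.fromkeys(values))


values = [3, 1, 3, 2, 1]

# Order-preserving dedup via dict keys (relies on Python 3.7+ dict ordering).
unique_ordered = dedup_keep_order(values)

# Going through a set also removes duplicates, but the original order is lost;
# sorting afterwards gives a deterministic (though different) order.
unique_sorted = sorted(set(values))
```

With the sample list, `unique_ordered` keeps the first-seen order `[3, 1, 2]`, while `unique_sorted` is `[1, 2, 3]`.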

Regular Expression in Python

Online Regular Expression Tester

  1. The Python module re automatically compiles string patterns (via re.compile) and caches the compiled objects, so a program that uses only a few patterns gains little from compiling them by hand.

  2. Some regular expression patterns are defined using a single leading backslash, e.g., \s, \b, etc. However, since special characters (e.g., \) need to be escaped in strings in most programming languages, you will need the string "\\s"
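
A short sketch of both tips, using a made-up sample string. It shows that a raw string and a manually escaped string spell the same pattern, and that the module-level functions and a precompiled pattern give the same result.

```python
import re

text = "a  b\tc"

# Raw string: the backslash reaches the regex engine untouched.
raw_pattern = r"\s+"
# Regular string: the backslash itself has to be escaped.
escaped_pattern = "\\s+"

# Both literals produce the same three characters: backslash, s, plus.
same_pattern = raw_pattern == escaped_pattern

# Module-level functions compile the pattern and cache the compiled object,
# so repeated calls with the same pattern string do not recompile it.
parts_module = re.split(raw_pattern, text)

# Precompiling is mostly a readability choice when one pattern is reused often.
whitespace = re.compile(raw_pattern)
parts_compiled = whitespace.split(text)
```

Raw strings are generally the easier habit in Python, since the pattern in the source then reads exactly as the regex engine sees it.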