Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Tips on the find command in Linux

It is suggested that you use Python (the pathlib module), fselect, or osquery (which currently has some bugs) to locate files. The pathlib module is the best fit for relatively complex jobs, while fselect and osquery support SQL-like syntax and are more intuitive than the find command.
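
As a rough illustration of the pathlib approach, the sketch below searches a directory tree recursively for files that match a glob pattern and meet a minimum size. The function name `find_files` and its parameters are hypothetical, chosen only for this example.

```python
from pathlib import Path


def find_files(root, pattern="*", min_size=0):
    """Return regular files under `root` whose names match `pattern`
    and whose size is at least `min_size` bytes, sorted by path.

    `find_files` is a hypothetical helper written for illustration.
    """
    return sorted(
        p
        for p in Path(root).rglob(pattern)  # rglob descends into subdirectories
        if p.is_file() and p.stat().st_size >= min_size
    )
```

For instance, `find_files(".", "*.log", min_size=1024)` plays a role similar to `find . -type f -name "*.log"` combined with a size filter, and the result is an ordinary list of Path objects that can be processed further in Python.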

  1. Find all …

Operate Remote Servers Using SSH

General Tips and Traps

  1. The permissions of the directory ~/.ssh and its contents on both the local machine and the remote server must be set properly for SSH login via public key to work. A good practice is to set the permission of ~/.ssh to 700 (on both …
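
A minimal sketch of the 700 recommendation for the directory itself, written in Python. The helper names `current_mode` and `secure_ssh_dir` are hypothetical, and the sketch deliberately leaves the files inside the directory alone.

```python
import stat
from pathlib import Path


def current_mode(path):
    """Return the permission bits of `path` as an integer, e.g. 0o700."""
    return stat.S_IMODE(Path(path).expanduser().stat().st_mode)


def secure_ssh_dir(ssh_dir="~/.ssh"):
    """Set `ssh_dir` to mode 700 (owner-only access) and return the new mode.

    Hypothetical helper for illustration; it changes only the directory,
    not the files it contains.
    """
    path = Path(ssh_dir).expanduser()
    path.chmod(0o700)
    return current_mode(path)
```

Calling `oct(current_mode("~/.ssh"))` first shows the existing permissions without changing anything, which is a cautious way to check both machines before modifying them.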

IPython Is the Best Shell

IPython is the Best Shell!

  1. Use IPython as much as possible.

    • IPython combines the virtues of a shell and Python.
    • You should avoid using shell scripts for complicated tasks anyway.
  2. If you do want to use a Unix/Linux shell, I'd suggest you stick to Bash unless Linux distributions start …

Process Big Data Using PySpark

  1. PySpark 2.4 and older do not support Python 3.8. You have to use Python 3.7 or older with PySpark 2.4 or older.

  2. It can be extremely helpful to run a PySpark application locally to detect possible issues before submitting it to the Spark cluster.

    #!/usr/bin/env bash …

Regular Expression Equivalent

  1. The order of precedence of operators in POSIX extended regular expressions is as follows (from highest to lowest).

    1. Collation-related bracket symbols [==], [::], [..]
    2. Escaped characters \
    3. Character set (bracket expression) []
    4. Grouping ()
    5. Single-character-ERE duplication *, +, ?, {m,n}
    6. Concatenation
    7. Anchoring ^, $
    8. Alternation |
  2. Some regular expression patterns are defined using a single leading backslash, e.g., \s, \b, etc. However, since special …
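
The precedence list in item 1 can be checked with a few small patterns. The sketch below uses Python's `re` module, which is not a POSIX ERE engine; it is assumed here only because duplication, concatenation, anchoring and alternation rank in the same relative order in it. The helper names `full` and `found` are hypothetical.

```python
import re


def full(pattern, text):
    """True if `pattern` matches the whole of `text`."""
    return re.fullmatch(pattern, text) is not None


def found(pattern, text):
    """True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text) is not None


# Duplication binds tighter than concatenation: ab+ means a(b+), not (ab)+.
dup_over_concat = full(r"ab+", "abbb") and not full(r"ab+", "abab")

# Concatenation binds tighter than alternation: ab|cd means (ab)|(cd).
concat_over_alt = full(r"ab|cd", "cd") and not full(r"ab|cd", "acd")

# Anchoring binds tighter than alternation: ^a|b$ means (^a)|(b$).
anchor_over_alt = found(r"^a|b$", "axx") and not found(r"^(a|b)$", "axx")
```

Grouping sits above all three in the list, which is why adding parentheses, as in `(ab)+` or `a(b|c)d`, is the way to override the default reading.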