Ben Chuanlong Du's Blog

And let it direct your passion with reason.

User-defined Function (UDF) in PySpark

Tips and Traps

  1. The easiest way to define a UDF in PySpark is to use the @udf decorator, and similarly the easiest way to define a pandas UDF in PySpark is to use the @pandas_udf decorator. Pandas UDFs are preferred to UDFs for several reasons. First, pandas UDFs are typically much faster than UDFs. Second, pandas UDFs are more flexible than UDFs in parameter passing. Both UDFs and pandas UDFs can take multiple columns as parameters. In addition, pandas UDFs can take a DataFrame as a parameter (when passed to the apply

Convert PDF to EPS

There are many tools for converting PDF pictures to EPS pictures on Linux. The pdf2ps command is a good one: it produces EPS pictures without losing much resolution. The general-purpose tool convert (from the ImageMagick package) does not produce EPS figures of as good quality.

Terminal Multiplexers

zellij

  1. There are 2 mature, popular terminal multiplexer apps: screen and tmux. Both are very useful if you want to work on multiple tasks over a single SSH connection. Screen is relatively simple to use, while tmux is much more powerful but more complicated to use.

  2. Besides enabling users to …

Schedule Cron Tasks in a Docker Container

Cron tasks work in a Docker container. However, you have to start the cron daemon manually (root or sudo required) using cron or sudo cron if the Docker entrypoint is not configured to start it when the container starts. For tutorials on crontab, please refer to …

Regular Expression Equivalent

  1. The order of precedence of operators in POSIX extended regular expressions, from highest to lowest, is as follows.

    1. Collation-related bracket symbols [==], [::], [..]
    2. Escaped characters \
    3. Character set (bracket expression) []
    4. Grouping ()
    5. Single-character-ERE duplication *, +, ?, {m,n}
    6. Concatenation
    7. Anchoring ^, $
    8. Alternation |
  2. Some regular expression patterns are defined using a single leading backslash, e.g., \s, \b, etc. However, since special …
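
The precedence order listed in item 1 can be checked with a short sketch. Note the assumption: this uses Python's re module, which is not a POSIX ERE engine, but it follows the same relative precedence for grouping, duplication, concatenation, anchoring, and alternation. The variable names are illustrative only.

```python
import re

# Concatenation binds tighter than alternation:
# "ab|cd" is read as (ab)|(cd), not as a(b|c)d.
alt = re.compile(r"ab|cd")
alt_matches_ab = alt.fullmatch("ab") is not None
alt_matches_acd = alt.fullmatch("acd") is not None

# Duplication binds tighter than concatenation:
# "ab*" is read as a(b*), not as (ab)*.
dup = re.compile(r"ab*")
dup_matches_abb = dup.fullmatch("abb") is not None
dup_matches_abab = dup.fullmatch("abab") is not None

# Grouping has higher precedence than duplication,
# so "(ab)*" repeats the whole group.
grp = re.compile(r"(ab)*")
grp_matches_abab = grp.fullmatch("abab") is not None

# Anchoring binds tighter than alternation:
# "^ab|cd$" is read as (^ab)|(cd$), not as ^(ab|cd)$.
anchored = re.compile(r"^ab|cd$")
anchored_finds_prefix = anchored.search("abxx") is not None
anchored_finds_suffix = anchored.search("xxcd") is not None
anchored_finds_middle = anchored.search("xabx") is not None

print(alt_matches_ab, alt_matches_acd)
print(dup_matches_abb, dup_matches_abab, grp_matches_abab)
print(anchored_finds_prefix, anchored_finds_suffix, anchored_finds_middle)
```

When a pattern needs alternation to apply to only part of an expression, wrapping that part in parentheses makes the intent explicit, e.g. ^(ab|cd)$ rather than ^ab|cd$.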