Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Regular Expression in Bash

It is suggested that you use Python script instead of Shell script as much as possible. If you do have to stick with Shell script, you can use =~ for regular expression matching in Bash. This make Bash syntax extremely flexible and powerful. For example, you can match multiple strings using …

Date Functions in Spark

Tips and Traps

  1. HDFS table might contain invalid data (I'm not clear about the reasons at this time) with respct to the column types (e.g., Date and Timestamp). This will cause issues when Spark tries to load the data. For more discussions, please refer to Unrecognized column type:TIMESTAMP_TYP.
  1. datetime.datetime or datetime.date

Best Filesystem Format for Cross-platform Data Exchanging

FAT32

FAT32 is an outdated filesystem. The maximum size for a single file is 4G. You should instead exFAT instead of FAT32 where possible.

exFAT

exFAT is great cross-platform filesystem that is support out-of-box by Windows, Linux and macOS. There is practically no limit (big enough for average users) on …