Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Date Functions in Spark

Tips and Traps

  1. HDFS table might contain invalid data (I'm not clear about the reasons at this time) with respct to the column types (e.g., Date and Timestamp). This will cause issues when Spark tries to load the data. For more discussions, please refer to Unrecognized column type:TIMESTAMP_TYP.
  1. datetime.datetime or datetime.date

Count Number of Fields in Each Line

Sometimes, a structured text file might be malformatted. A simple way to verify it is to count the number of fields in each line.

Using awk

You can count the number of fields in each line using the following awk command. Unfortunately, awk does not take escaped characters into consideration …

Quickly Create a Scala Project Using Gradle in Intellij IDEA

Easy Way

  1. Create a directory (e.g., demo_proj) for your project.

  2. Run gradle init --type scala-library in terminal in the above directory.

  3. Import the directory as a Gradle project in IntelliJ IDEA. Alternatively, you can add apply plugin: 'idea' into build.gradle and then run the command ./gradlew openIdea to …