Tips and Traps
- An HDFS table might contain data that is invalid with respect to its column types (e.g., Date and Timestamp); the reasons are not clear to me at this time. Such data causes issues when Spark tries to load the table. For more discussion, please refer to Unrecognized column type:TIMESTAMP_TYP.
datetime.datetime or datetime.date
Count Number of Fields in Each Line
Sometimes, a structured text file might be malformed. A simple way to check for this is to count the number of fields in each line.
Using awk
You can count the number of fields in each line using the following awk command. Unfortunately, awk does not take escaped characters into consideration …
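Because awk splits on the raw delimiter, a quoted field that contains the delimiter is counted as several fields. As a rough alternative, the sketch below uses Python's standard csv module, which honors quoting. The function names count_fields and field_count_summary are hypothetical, and the comma delimiter and double-quote character are assumptions about the file format.

```python
import csv
from collections import Counter


def count_fields(lines, delimiter=",", quotechar='"'):
    """Return the number of fields in each parsed record.

    `lines` is any iterable of strings (e.g., an open text file).
    Quoted fields containing the delimiter count as one field.
    """
    reader = csv.reader(lines, delimiter=delimiter, quotechar=quotechar)
    return [len(row) for row in reader]


def field_count_summary(lines, delimiter=",", quotechar='"'):
    """Map each distinct field count to the number of records having it."""
    return Counter(count_fields(lines, delimiter=delimiter, quotechar=quotechar))


sample = ["a,b,c\n", '"x,y",z\n', "1,2,3,4\n"]
print(count_fields(sample))
print(field_count_summary(sample))
```

A well-formed file should yield a summary with a single key. For a real file, pass a file object opened with newline="" so that the csv module handles line endings itself.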
Quickly Create a Scala Project Using Gradle in IntelliJ IDEA
Easy Way
- Create a directory (e.g., demo_proj) for your project.
- Run gradle init --type scala-library in a terminal in the above directory.
- Import the directory as a Gradle project in IntelliJ IDEA. Alternatively, you can add apply plugin: 'idea' into build.gradle and then run the command ./gradlew openIdea to …
Visual Studio Code for Python
Extensions
Please refer to Useful Visual Studio Code Extensions.
Set Python Environment for Visual Studio Code Server
- File -> Preferences -> Settings
- Click on Workspace.
- Search for Python Path.
- Change Python Path to the one you want to use.
Debug a Python Project
Visual Studio Live Share
Install Python Packages Behind Firewall
It is recommended that you use pip to install Python packages.
- If you don't already know the proxy in use (in your company), read the post Find out Proxy in Use to figure it out.
- Set proxy environment variables.
set http_proxy=http://user:password@proxy_ip:port
set https_proxy=https://user …
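One detail that is easy to miss is that a user name or password containing characters such as @ or : must be percent-encoded before it is embedded in a proxy URL. The sketch below builds the two variables in Python and passes them to pip through a subprocess. The helper names build_proxy_env and pip_install are hypothetical, and the http/https URL schemes simply mirror the variables shown above; your proxy may expect something different.

```python
import os
import subprocess
import sys
from urllib.parse import quote


def build_proxy_env(user, password, proxy_ip, port):
    """Return proxy environment variables with percent-encoded credentials."""
    # safe="" forces characters such as '@', ':' and '/' to be encoded.
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return {
        "http_proxy": f"http://{auth}@{proxy_ip}:{port}",
        "https_proxy": f"https://{auth}@{proxy_ip}:{port}",
    }


def pip_install(package, proxy_env):
    """Run `python -m pip install <package>` with the proxy variables set."""
    env = {**os.environ, **proxy_env}
    subprocess.run([sys.executable, "-m", "pip", "install", package], env=env, check=True)


print(build_proxy_env("user", "p@ss:word", "10.0.0.1", 8080)["http_proxy"])
```

Calling pip_install("requests", build_proxy_env(...)) would then install a package through the proxy without changing the environment of the current shell.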
Visualize Nvidia GPU Usage
You can use the tool nvtop (Linux only) to visualize the usage of Nvidia GPUs. However, it is not suitable for tracking and visualizing GPU usage over a long period of time. Another simple approach to track and visualize GPU usage is to dump GPU usage statistics into a CSV file using the following command
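One way to produce such a CSV file, assuming nvidia-smi is installed and on the PATH, is its query mode. The sketch below only assembles the command and wraps it in a helper; the function names, the chosen query fields, the 5-second interval, and the output path are all assumptions made for illustration and may differ from the command the post intends.

```python
import shutil
import subprocess


def nvidia_smi_csv_command(out_path, interval_sec=5):
    """Build an nvidia-smi command that appends GPU statistics to a CSV file."""
    return [
        "nvidia-smi",
        # Fields to report; these are a small illustrative selection.
        "--query-gpu=timestamp,utilization.gpu,memory.used,memory.total",
        "--format=csv",
        "-l", str(interval_sec),  # repeat the query every interval_sec seconds
        "-f", out_path,           # write the output to this file
    ]


def log_gpu_usage(out_path, interval_sec=5):
    """Run the logging command until interrupted (e.g., with Ctrl+C)."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi was not found on PATH")
    subprocess.run(nvidia_smi_csv_command(out_path, interval_sec), check=True)


print(" ".join(nvidia_smi_csv_command("gpu_usage.csv")))
```

The resulting file can be loaded later with any CSV reader for plotting.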