Using pandas.read_csv¶
This approach requires a running TensorBoard instance that is serving the data you want to read.
Check the checkbox "Show data download links" (highlighted in the top-left corner of the screenshot below).
Select the experiment whose data you'd like to download (highlighted in the bottom-right corner of the screenshot).
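Once the download links are enabled, each scalar chart exposes a CSV link that `pandas.read_csv` can consume directly (it accepts a URL, a file path, or any file-like object). A minimal sketch, using an inline string in place of a downloaded file; the `Wall time,Step,Value` columns are the usual shape of TensorBoard's scalar CSV export:

```python
import io

import pandas as pd

# Stand-in for a CSV file downloaded via TensorBoard's
# "Show data download links" (columns: Wall time, Step, Value).
csv_text = """Wall time,Step,Value
1624000000.0,0,2.31
1624000060.0,100,1.87
1624000120.0,200,1.42
"""

# pandas.read_csv accepts a path, a URL, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df["Value"].min())
```

In practice you would pass the copied download URL (or the saved CSV path) to `pd.read_csv` instead of the `StringIO` object.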
Parse TOML Files in Python
Build Docker Images on Kubernetes
- BuildKit is a good tool for building Docker images on a Kubernetes cluster where you have root access.
- Kaniko is another option, but it is less intuitive to use than buildkit-cli-for-kubectl, and tricky issues might arise when building Docker images with it.
- buildah is …
Build Docker Images Using BuildKit on Kubernetes
buildkit-cli-for-kubectl is a plugin for kubectl that makes building Docker images on Kubernetes feel much like building them locally with docker build.
buildkit-cli-for-kubectl works well in a personal/development Kubernetes cluster (e.g., minikube running locally);
however,
it doesn't work in an enterprise production environment
due to permission …
Get CentOS Version
You can get the version of CentOS using the following command.
rpm -q centos-release
This trick can be used to get the version of the CentOS distribution on a Spark cluster: run this command on the driver or workers to print the version, then parse the log …
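The parsing step above can be sketched in Python. The package name below is only a sample of what `rpm -q centos-release` might print (here for CentOS 7.9.2009); the `parse_centos_release` helper is a hypothetical name:

```python
import re


def parse_centos_release(package: str) -> str:
    """Extract a dotted CentOS version from the centos-release
    package name, as printed by `rpm -q centos-release`.
    """
    # Package names look like centos-release-<major>-<minor>.<build>...,
    # e.g. "centos-release-7-9.2009.1.el7.centos.x86_64" (sample string).
    match = re.search(r"centos-release-(\d+)-(\d+)\.(\d+)", package)
    if not match:
        raise ValueError(f"unrecognized package name: {package}")
    return ".".join(match.groups())


print(parse_centos_release("centos-release-7-9.2009.1.el7.centos.x86_64"))
# 7.9.2009
```

On a real cluster you would capture the command's output from the driver/worker logs and feed each line through a parser like this.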
Control Number of Partitions of a DataFrame in Spark
Tips and Traps¶
`DataFrame.repartition` repartitions the DataFrame by the hash code of each row. If you pass one or more columns (instead of a number of partitions) to `DataFrame.repartition`, the hash codes of those columns are used for repartitioning. In some situations there are lots of hash collisions even if the total number of rows is small (e.g., a few thousand), which means that the generated partitions might be skewed.