Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Compare Two Directories on Linux

On the Same Machine

If the two directories are on the same machine, you can use either colordiff (preferred over diff) or git diff to find the differences between them.

colordiff -qr dir_1 dir_2
git diff --no-index dir_1 dir_2
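
As a small illustration of what the recursive, brief comparison reports, the sketch below builds two throwaway directories and compares them. It uses plain diff so that it runs where colordiff is not installed; the temporary directory and the file names a.txt, b.txt and c.txt are made up for the example.

```shell
# Build two small throwaway directories that differ in two places.
tmp=$(mktemp -d)
mkdir "$tmp/dir_1" "$tmp/dir_2"
echo "same"  > "$tmp/dir_1/a.txt"
echo "same"  > "$tmp/dir_2/a.txt"
echo "old"   > "$tmp/dir_1/b.txt"
echo "new"   > "$tmp/dir_2/b.txt"
echo "extra" > "$tmp/dir_2/c.txt"

# -q reports only which files differ, -r recurses into subdirectories.
# diff exits with status 1 when differences exist, so record the status
# instead of letting it abort a script that runs with `set -e`.
status=0
report=$(diff -qr "$tmp/dir_1" "$tmp/dir_2") || status=$?
echo "$report"

# Clean up the throwaway directories.
rm -rf "$tmp"
```

The report should name b.txt as differing and c.txt as present in only one directory, while the identical a.txt is not mentioned.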

On Different Machines

It is a little bit tricky when the …

Advanced Use of "ls" in Linux

List Files Sorted by Time

You can list files sorted by time (newest first) using the -t option. Note that the -t option is also supported by hdfs dfs -ls.

ls -lht
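
To see the ordering in action, here is a minimal sketch that creates two files with known modification times and lists them both ways. The file names and timestamps are made up for the example, and -r (reverse order) is an extra ls option not discussed above.

```shell
# Create two files with fixed, known modification times.
tmp=$(mktemp -d)
touch -t 202001010000 "$tmp/old.txt"   # 2020-01-01 00:00
touch -t 202101010000 "$tmp/new.txt"   # 2021-01-01 00:00

# -t sorts by modification time, newest first.
newest_first=$(ls -t "$tmp")
# Adding -r reverses the order, so the oldest file comes first.
oldest_first=$(ls -tr "$tmp")

echo "$newest_first"
echo "$oldest_first"

rm -rf "$tmp"
```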

Ignore Files

  1. You have to either enclose the pattern in quotes or escape the wildcard in patterns.

  2. Equivalent …

Proxy for `sudo`

You can set up a proxy in a terminal by exporting the environment variables http_proxy and https_proxy.

export http_proxy='proxy_server:port'
export https_proxy='proxy_server:port'

However, you might find that the exported environment variables are not visible to sudo. This can be resolved by simply adding the -E (preserve environment) option to sudo.

sudo …

How Long Does It Take to Observe a Sequence?

Statistics has many problems that are interesting and, at the same time, very tricky. One famous question: if we flip a fair coin, what is the expected number of flips needed to observe a given sequence (e.g., THTH or TTHH)?

This problem can be solved using (delayed) renewal theory …
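
For a quick numerical check, the sketch below uses a well-known shortcut for a fair coin rather than the renewal-theory derivation: the expected number of flips equals the sum of 2^k over every length k at which the first k symbols of the pattern equal its last k symbols. The function name expected_flips is made up for this example.

```shell
# Expected number of fair-coin flips until the pattern first appears,
# computed with the overlap formula: add 2^k for every k such that the
# prefix of length k equals the suffix of length k.
expected_flips() {
    local pattern=$1
    local n=${#pattern}
    local total=0
    local k
    for ((k = 1; k <= n; k++)); do
        # Compare the first k characters with the last k characters.
        if [[ "${pattern:0:k}" == "${pattern:n-k}" ]]; then
            total=$((total + (1 << k)))
        fi
    done
    echo "$total"
}

expected_flips THTH   # prints 20 (overlaps at k=2 and k=4: 4 + 16)
expected_flips TTHH   # prints 16 (only the full-length overlap, k=4)
```

Although both patterns have length four, THTH takes longer on average because it overlaps with itself: a failed attempt can still leave a useful "TH" behind, and the formula charges for that self-overlap.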

Select Columns from Structured Text Files

Python pandas

My first choice is pandas in Python. However, below are some tools for quick and dirty solutions.

q

q -t -H 'select c1, c3 from file.txt'

cut

cut -d$'\t' -f1,3 file.txt

awk

awk -F'\t' '{print $1 "\t" $3}' file.tsv 

Note: neither cut …
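
To compare the cut and awk approaches side by side, here is a minimal sketch on a made-up three-column, tab-separated file. The column names and values are invented for the example, and cut is called without -d because tab is its default delimiter.

```shell
# Write a small tab-separated sample file (hypothetical data).
tmp=$(mktemp)
printf 'id\tname\tscore\n1\talice\t90\n2\tbob\t85\n' > "$tmp"

# Select the first and third columns with cut (tab is the default delimiter).
cut_out=$(cut -f1,3 "$tmp")

# Select the same columns with awk, splitting on tabs and re-joining with a tab.
awk_out=$(awk -F'\t' '{print $1 "\t" $3}' "$tmp")

echo "$cut_out"
echo "$awk_out"

rm -f "$tmp"
```

Both commands should produce the same three lines, each holding only the id and score fields.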