Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Advanced Use of "ls" in Linux

List Files Sorted by Time

You can list files sorted by modification time (newest first) using the -t option. Note that the -t option is also supported by hdfs dfs -ls.

ls -lht

Ignore Files

  1. You have to either enclose the pattern in quotes or escape its wildcards, so that the shell does not expand the pattern before ls sees it.

  2. Equivalent …
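The quoting rule in item 1 can be sketched as follows. This is a minimal illustration that assumes GNU ls and its -I (--ignore) option; the scratch directory and file names are made up for the demo.

```shell
# Hypothetical scratch directory with three files, for illustration only.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.log" "$demo_dir/c.log"

# Quoting the pattern stops the shell from expanding *.log before ls sees it.
quoted=$(ls -I '*.log' "$demo_dir")

# Escaping the wildcard with a backslash has the same effect.
escaped=$(ls -I \*.log "$demo_dir")

echo "$quoted"
echo "$escaped"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

Both commands should list only the file that does not match the pattern.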

Proxy for `sudo`

You can set up a proxy in a terminal by exporting the environment variables http_proxy and https_proxy.

export http_proxy='proxy_server:port'
export https_proxy='proxy_server:port'

However, you might find that the exported environment variables are not visible to sudo. This can be resolved by simply adding the -E (preserve environment) option to sudo.

sudo …

How Long Does It Take to Observe a Sequence?

There are many problems in statistics that are both interesting and tricky. One famous question is: if we flip a balanced coin, how many flips does it take, in expectation, to observe a given sequence (e.g. THTH, TTHH)?

This problem can be solved using (delayed) renewal theory …
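For reference, the two example sequences can be worked out with the standard overlap formula for a fair coin. This is a sketch of a well-known result, not necessarily the derivation the post follows; the symbols N, n and K are introduced here only for illustration.

```latex
% N = number of flips until the pattern first appears, n = pattern length,
% K = set of lengths k for which the first k symbols equal the last k symbols.
\[
  \mathbb{E}[N] = \sum_{k \in K} 2^{k}, \qquad
  K = \{\, k \in \{1, \dots, n\} : \text{prefix}_k = \text{suffix}_k \,\}
\]
% THTH: prefix and suffix agree for k = 2 (TH) and k = 4 (THTH).
\[
  \mathbb{E}[N_{\mathrm{THTH}}] = 2^{2} + 2^{4} = 20
\]
% TTHH: prefix and suffix agree only for k = 4 (the whole pattern).
\[
  \mathbb{E}[N_{\mathrm{TTHH}}] = 2^{4} = 16
\]
```

So the self-overlapping sequence THTH takes longer on average than TTHH, even though both have length four.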

Select Columns from Structured Text Files

Python pandas

My first choice is pandas in Python. However, below are some tools for quick and dirty solutions.

q

q -t -H 'select c1, c3 from file.txt'

cut

cut -d$'\t' -f1,3 file.txt

awk

awk -F'\t' '{print $1 "\t" $3}' file.tsv 
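As a quick check, the cut and awk approaches can be compared on a small sample. This is a minimal sketch; the temporary file and its contents are made up for illustration, and it relies on cut using the tab as its default delimiter.

```shell
# Hypothetical three-column, tab-separated sample file, for illustration only.
sample=$(mktemp)
printf 'id\tname\tscore\n1\talice\t90\n2\tbob\t85\n' > "$sample"

# cut splits on tabs by default, so -d can be omitted for tab-separated input.
cut_out=$(cut -f1,3 "$sample")

# awk needs the input separator (-F) and the output separator (OFS) set explicitly.
awk_out=$(awk -F'\t' -v OFS='\t' '{print $1, $3}' "$sample")

echo "$cut_out"
echo "$awk_out"

# Clean up the sample file.
rm -f "$sample"
```

Both commands should print the first and third columns, separated by a tab.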

Note: neither cut …

Sample Lines from a File Using Command Line

NOTE: this article is about sampling "lines" rather than "records". If a record can span multiple lines, e.g., if any field contains a newline (\n), the following tutorial does not work and you have to fall back to more powerful tools such as Python or R.

Let's say …