Ben Chuanlong Du's Blog

And let it direct your passion with reason.

Regular Expression Equivalent

  1. The order of precedence of operators in POSIX extended regular expressions, from highest to lowest, is as follows.

    1. Collation-related bracket symbols [==], [::], [..]
    2. Escaped characters \
    3. Character set (bracket expression) []
    4. Grouping ()
    5. Single-character-ERE duplication *, +, ?, {m,n}
    6. Concatenation
    7. Anchoring ^, $
    8. Alternation |
  2. Some regular expression patterns are defined using a single leading backslash, e.g., \s, \b, etc. However, since special …
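
The precedence rules above can be checked with a few small patterns. The sketch below uses Python's `re` module, which is not a POSIX ERE engine; it is assumed here only because it follows the same relative ordering for duplication, concatenation, anchoring, and alternation. The variable names are illustrative.

```python
import re

# Concatenation binds tighter than alternation:
# "ab|cd" means (ab)|(cd), not a(b|c)d.
alt = re.compile(r"ab|cd")
matches_ab = alt.fullmatch("ab") is not None      # True
matches_acd = alt.fullmatch("acd") is not None    # False

# Duplication binds tighter than concatenation:
# "ab*" means a(b*), not (ab)*.
dup = re.compile(r"ab*")
matches_abbb = dup.fullmatch("abbb") is not None  # True
matches_abab = dup.fullmatch("abab") is not None  # False

# Anchoring binds tighter than alternation:
# "^ab|cd$" means (^ab)|(cd$), so "abxx" matches via the first branch.
anchored = re.search(r"^ab|cd$", "abxx") is not None          # True
anchored_group = re.search(r"^(ab|cd)$", "abxx") is not None  # False

# Grouping overrides the default precedence.
grouped_abab = re.fullmatch(r"(ab)*", "abab") is not None     # True
```

Writing the patterns as raw strings (`r"..."`) keeps backslashes from being interpreted by Python's string parser before the regex engine sees them.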

Count Number of Fields in Each Line

Sometimes, a structured text file might be malformed. A simple way to check for this is to count the number of fields in each line.
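
As a rough illustration of the check, here is a minimal Python sketch that counts fields per record with the standard `csv` module. The function name `field_counts` and the sample data are hypothetical, and the sketch assumes a comma-delimited file that uses double quotes to protect delimiters inside a field.

```python
import csv
import io
from collections import Counter

def field_counts(lines, delimiter=","):
    """Return the number of fields in each record read from `lines`."""
    reader = csv.reader(lines, delimiter=delimiter)
    return [len(row) for row in reader]

# Hypothetical sample: the second record has a quoted comma,
# and the third record has one field too many.
sample = io.StringIO('a,b,c\n"x,y",z\n1,2,3,4\n')
counts = field_counts(sample)   # [3, 2, 4]

# A well-formed file should show a single distinct count.
summary = Counter(counts)
```

If `summary` has more than one key, some records have a different number of fields than the others. Note that `csv.reader` counts records rather than physical lines, so a quoted field that spans several lines is counted once.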

Using awk

You can count the number of fields in each line using the following awk command. Unfortunately, awk does not take escaped characters into consideration …

Parallel Computing Using Multithreading

  1. Not all jobs are suitable for parallel computing. The more communication the threads have to make, the more dependent the jobs are and the less efficient parallel computing is.

  2. Generally speaking, commercial software (Mathematica, MATLAB, Revolution R, etc.) has very good support for parallel computing.
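
To make the first point concrete, the following is a minimal sketch of the easy case: jobs that are fully independent and need no communication between threads. It uses Python's standard `concurrent.futures` module; the function `word_count` and the sample inputs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(text):
    """An independent job: it reads only its own input."""
    return len(text.split())

documents = ["a b c", "d e", "f g h i"]

# Each job runs without talking to the others, so the pool
# can schedule them in any order; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    parallel = list(pool.map(word_count, documents))

# The sequential version gives the same answer.
sequential = [word_count(d) for d in documents]
```

In standard CPython builds, the global interpreter lock means threads mostly help with I/O-bound jobs, so this sketch shows the structure rather than a speedup.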

Python

Please refer …

An IO Bug in R

I encountered an input/output bug in R on a Linux system. The symptom is that input and output are not displayed in the terminal, and the warning message "An unusual circumstance has arisen in the nesting of readline input. Please report using bug.report()" is shown. I found that though …

Estimation of False Discovery Rate using Sequential Permutation P-Values

I wrote a paper on sequential permutation tests with Tim Bancroft and Dan Nettleton. The paper "T. Bancroft, C. Du and D. Nettleton (2012). Estimation of False Discovery Rate Using Sequential Permutation P-Values." has been accepted by Biometrics. To illustrate ideas in the paper and make sequential permutation test …

Stick Breaking Problems

The following is a popular brain teaser problem about probability.

Randomly select two points on a unit stick to break it into 3 pieces. What is the probability that the 3 pieces can form a triangle?

The critical thing here is how the two points are selected. The most popular …
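
Since the answer depends on the selection scheme, here is a minimal Monte Carlo sketch under one explicit assumption: the two break points are drawn independently and uniformly on the stick. Under that assumption the known answer is 1/4. The function names and the trial count are illustrative choices, and other selection schemes would give different probabilities.

```python
import random

def can_form_triangle(a, b, c):
    """Triangle inequality: each side is shorter than the sum of the other two."""
    return a + b > c and a + c > b and b + c > a

def estimate(trials=100_000, seed=0):
    """Estimate the probability, assuming two independent uniform break points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Sort the two break points so the pieces are u, v - u, and 1 - v.
        u, v = sorted((rng.random(), rng.random()))
        if can_form_triangle(u, v - u, 1 - v):
            hits += 1
    return hits / trials

estimate_p = estimate()  # should be close to 0.25
```

For a unit stick the triangle condition is equivalent to every piece being shorter than 1/2, which is a quick way to check the simulation by hand.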