NOTE: the article talks about sampling "lines" rather than "records".
If a record can span multiple lines,
e.g., because a field contains a newline character (\n),
the line-based approach in the following tutorial splits that record into pieces and does not work.
In that case you have to fall back to more powerful tools such as Python or R.
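To illustrate the lines-versus-records problem and the Python fallback, here is a minimal sketch. It assumes the data is CSV, where a field wrapped in double quotes may contain a newline. The function name `sample_records` and the sample data are hypothetical. The sketch uses only Python's standard `csv` and `random` modules: `csv.reader` keeps a quoted multi-line field inside one record, and `random.Random.sample` then draws records rather than physical lines.

```python
import csv
import io
import random


def sample_records(text, k, seed=None):
    """Hypothetical helper: sample k CSV records uniformly without replacement.

    csv.reader keeps a quoted field that contains a newline inside a single
    record, so sampling happens per record rather than per physical line.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)        # keep the header row out of the sample
    records = list(reader)       # one entry per record, not per line
    rng = random.Random(seed)    # seeded generator for reproducible samples
    return header, rng.sample(records, k)


# Hypothetical data: record 2 has a newline inside its quoted "comment" field.
data = 'id,comment\n1,hello\n2,"first line\nsecond line"\n3,bye\n'

# A line-oriented view counts 4 data lines after the header...
print(len(data.splitlines()) - 1)  # 4
# ...but the CSV parser finds only 3 records.
header, sample = sample_records(data, 3, seed=0)
print(len(sample))  # 3
```

This version reads every record into memory before sampling. For files too large for that, reservoir sampling over the `csv.reader` iterator would avoid holding all records at once.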
Let's say …