Ben Chuanlong Du's Blog

It is never too late to learn.

Profile Performance of Python Applications

Tips

  1. cProfile (implemented in C) is preferred over profile (implemented in Python).

  2. The profiler modules (cProfile and profile) and the tools built on them (e.g., the %prun and %%prun magics in notebooks) are designed to provide an execution profile for a given program, not to benchmark it (for that, use time and timeit, which give reasonably accurate results). This particularly applies to benchmarking Python code against C code: the profilers introduce overhead for Python code but not for C-level functions, so the C code would appear faster than any Python code. Overall, profiling might slow down your Python code by up to a factor of 3.

time.time_ns

You can use time.time_ns (available since Python 3.7) to get the current system time in nanoseconds. This can be used to measure the performance of Python code.

In [3]:
import time
import math

begin_time = time.time_ns()
for i in range(10000):
    math.sqrt(i)
end_time = time.time_ns()
print("Time used:", (end_time - begin_time) / 1e9, "seconds")
Time used: 0.0016448 seconds
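Note that time.time_ns reads the system (wall-clock) time, which can jump if the clock is adjusted. For timing code, time.perf_counter_ns (also available since Python 3.7) is generally a better choice because it uses a monotonic, high-resolution clock. A minimal sketch of the same measurement:

```python
import math
import time

# time.perf_counter_ns is monotonic and high-resolution, so it is
# better suited for measuring elapsed time than time.time_ns.
begin = time.perf_counter_ns()
for i in range(10_000):
    math.sqrt(i)
end = time.perf_counter_ns()
elapsed_seconds = (end - begin) / 1e9
print("Time used:", elapsed_seconds, "seconds")
```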

timeit.timeit

Notice that by default timeit.timeit runs a statement 1 million times to benchmark it. This is probably not what you want most of the time, especially when you have a long-running task to benchmark. Do NOT forget to set the number parameter to an appropriate value when you call timeit.timeit.

In [7]:
import timeit
import time
import math


def sqrts():
    for i in range(10_000):
        math.sqrt(i)


timeit.timeit(sqrts, timer=time.perf_counter_ns, number=1) / 1e9
Out[7]:
0.001551997
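A single timing can be noisy. One common practice, sketched below, is to use timeit.repeat to collect several independent measurements and take the minimum, since larger values usually reflect interference from other processes rather than the code itself:

```python
import math
import timeit


def sqrts():
    for i in range(10_000):
        math.sqrt(i)


# repeat=5 produces five independent timings, each running sqrts
# `number` times; the minimum is typically the most stable estimate.
results = timeit.repeat(sqrts, repeat=5, number=10)
best = min(results) / 10  # seconds per single call of sqrts
print(f"Best of 5 runs: {best:.6f} seconds per call")
```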

cProfile.run

This function takes a single argument that can be passed to the exec() function, and an optional file name. In all cases this routine executes

exec(command, __main__.__dict__, __main__.__dict__)

and gathers profiling statistics from the execution. If no file name is present, then this function automatically creates a Stats instance and prints a simple profiling report. If the sort value is specified, it is passed to this Stats instance to control how the results are sorted.

In [1]:
import cProfile
import re

cProfile.run('re.compile("foo|bar")')
         214 function calls (207 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 enum.py:284(__call__)
        2    0.000    0.000    0.000    0.000 enum.py:526(__new__)
        1    0.000    0.000    0.000    0.000 enum.py:836(__and__)
        1    0.000    0.000    0.000    0.000 re.py:232(compile)
        1    0.000    0.000    0.000    0.000 re.py:271(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:249(_compile_charset)
        1    0.000    0.000    0.000    0.000 sre_compile.py:276(_optimize_charset)
        2    0.000    0.000    0.000    0.000 sre_compile.py:453(_get_iscased)
        1    0.000    0.000    0.000    0.000 sre_compile.py:461(_get_literal_prefix)
        1    0.000    0.000    0.000    0.000 sre_compile.py:492(_get_charset_prefix)
        1    0.000    0.000    0.000    0.000 sre_compile.py:536(_compile_info)
        2    0.000    0.000    0.000    0.000 sre_compile.py:595(isstring)
        1    0.000    0.000    0.000    0.000 sre_compile.py:598(_code)
      3/1    0.000    0.000    0.000    0.000 sre_compile.py:71(_compile)
        1    0.000    0.000    0.000    0.000 sre_compile.py:759(compile)
        3    0.000    0.000    0.000    0.000 sre_parse.py:111(__init__)
        7    0.000    0.000    0.000    0.000 sre_parse.py:160(__len__)
       18    0.000    0.000    0.000    0.000 sre_parse.py:164(__getitem__)
        7    0.000    0.000    0.000    0.000 sre_parse.py:172(append)
      3/1    0.000    0.000    0.000    0.000 sre_parse.py:174(getwidth)
        1    0.000    0.000    0.000    0.000 sre_parse.py:224(__init__)
        8    0.000    0.000    0.000    0.000 sre_parse.py:233(__next)
        2    0.000    0.000    0.000    0.000 sre_parse.py:249(match)
        6    0.000    0.000    0.000    0.000 sre_parse.py:254(get)
        1    0.000    0.000    0.000    0.000 sre_parse.py:286(tell)
        1    0.000    0.000    0.000    0.000 sre_parse.py:417(_parse_sub)
        2    0.000    0.000    0.000    0.000 sre_parse.py:475(_parse)
        1    0.000    0.000    0.000    0.000 sre_parse.py:76(__init__)
        2    0.000    0.000    0.000    0.000 sre_parse.py:81(groups)
        1    0.000    0.000    0.000    0.000 sre_parse.py:903(fix_flags)
        1    0.000    0.000    0.000    0.000 sre_parse.py:919(parse)
        1    0.000    0.000    0.000    0.000 {built-in method _sre.compile}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
       25    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
    29/26    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 {built-in method builtins.max}
        9    0.000    0.000    0.000    0.000 {built-in method builtins.min}
        6    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
       48    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        5    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
        1    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}


ncalls: the number of calls.

tottime: the total time spent in the given function (excluding time spent in calls to sub-functions).

percall: the quotient of tottime divided by ncalls.

cumtime: the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions.

percall: the quotient of cumtime divided by primitive calls.

filename:lineno(function): the respective data of each function.

When there are two numbers in the first column (for example 3/1), it means that the function recursed. The second value is the number of primitive calls and the former is the total number of calls. Note that when the function does not recurse, these two values are the same, and only the single figure is printed.

Instead of printing the output at the end of the profile run, you can save the results to a file by specifying a filename to the run() function:

In [2]:
import cProfile
import re

cProfile.run('re.compile("foo|bar")', "restats")
In [3]:
!cat restats
(Binary output omitted: restats is not a text file. Use the pstats module to read it.)

The pstats.Stats class reads profile results from a file and formats them in various ways.

In [5]:
import pstats

pstats.Stats("restats").print_stats()
Fri Mar 13 11:00:31 2020    restats

         6 function calls in 0.000 seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.7/re.py:232(compile)
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.7/re.py:271(_compile)


Out[5]:
<pstats.Stats at 0x7f4dc9f656a0>
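Stats objects support chained post-processing. A small sketch (the workload string and the file name mystats are arbitrary examples) that strips long directory prefixes, sorts by cumulative time, and prints only the top 5 entries:

```python
import cProfile
import io
import pstats

# Profile a small workload and save the stats to a file.
cProfile.run("sum(i * i for i in range(10_000))", "mystats")

# Load the stats, strip directory prefixes from file names, sort by
# cumulative time, and print only the 5 most expensive functions.
stream = io.StringIO()
stats = pstats.Stats("mystats", stream=stream)
stats.strip_dirs().sort_stats("cumtime").print_stats(5)
print(stream.getvalue())
```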

cProfile in Command-line

The cProfile module can also be invoked as a script to profile another script. For example:
In [8]:
!python3 -m cProfile -h
Usage: cProfile.py [-o output_file_path] [-s sort] [-m module | scriptfile] [arg] ...

Options:
  -h, --help            show this help message and exit
  -o OUTFILE, --outfile=OUTFILE
                        Save stats to <outfile>
  -s SORT, --sort=SORT  Sort order when printing to stdout, based on
                        pstats.Stats class
  -m                    Profile a library module

scalene

scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information.

py-spy

py-spy is a sampling profiler for Python programs. It lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way. py-spy is extremely low overhead: it is written in Rust for speed and doesn't run in the same process as the profiled Python program. This means py-spy is safe to use against production Python code.

pyperf

The Python pyperf module is a toolkit to write, run and analyze benchmarks.

SnakeViz

Installation

In [10]:
!pip3 install snakeviz
Requirement already satisfied: snakeviz in /home/dclong/.local/lib/python3.7/site-packages (2.0.1)
Requirement already satisfied: tornado>=2.0 in /usr/local/lib/python3.7/dist-packages (from snakeviz) (6.0.4)
In [13]:
!~/.local/bin/snakeviz -h
usage: snakeviz [-h] [-v] [-H ADDR] [-p PORT] [-b BROWSER_PATH] [-s] filename

Start SnakeViz to view a Python profile.

positional arguments:
  filename              Python profile to view

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -H ADDR, --hostname ADDR
                        hostname to bind to (default: 127.0.0.1)
  -p PORT, --port PORT  port to bind to; if this port is already in use a free
                        port will be selected automatically (default: 8080)
  -b BROWSER_PATH, --browser BROWSER_PATH
                        name of webbrowser to launch as described in the
                        documentation of Python's webbrowser module:
                        https://docs.python.org/3/library/webbrowser.html
  -s, --server          start SnakeViz in server-only mode--no attempt will be
                        made to open a browser

Tuna

Tuna is a tool similar to SnakeViz, but it is less polished and less capable than SnakeViz.

pyperformance

The pyperformance project is intended to be an authoritative source of benchmarks for all Python implementations. The focus is on real-world benchmarks, rather than synthetic benchmarks, using whole applications when possible.

Profiling Import Time

Python 3.7+ can report the time spent importing each module via the -X importtime option (or the PYTHONPROFILEIMPORTTIME environment variable). See https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPROFILEIMPORTTIME

python3 -X importtime your_code.py
