Ben Chuanlong Du's Blog

It is never too late to learn.

Hands on the urllib Module in Python

Comments

  1. Prefer the requests module over urllib unless you need to keep 3rd-party dependencies to a minimum.

  2. You have to explicitly import urllib.request in order to use it in Python 3; import urllib alone is not guaranteed to make urllib.request available. Please refer to https://bugs.python.org/issue36701 for more discussion. Generally speaking, this is how submodules are intended to work in Python 3. Of course, there are a few exceptions such as os.path.
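The import behavior described in the second comment can be sketched as follows. This is a minimal illustration rather than a complete account of the import system; the variable names are made up for the example.

```python
import os
import urllib.request  # the explicit import loads the submodule

# After the explicit import, attribute access on the package works.
opener = urllib.request.build_opener()

# os.path is one of the exceptions: `import os` alone exposes it.
joined = os.path.join("tmp", "file.txt")
```

Note that urllib.request may appear to work after a bare import urllib if some other module has already imported it, which is why relying on that is fragile.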

In [1]:
import urllib.request

urllib.request.urlopen

In [2]:
r = urllib.request.urlopen("https://github.com/dclong/dsutil/releases/latest")
In [3]:
r.url
Out[3]:
'https://github.com/dclong/dsutil/releases/tag/v0.10.0'
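The cell above relies on urlopen following HTTP redirects, so that r.url holds the final URL rather than the requested one. Below is a self-contained sketch of the same behavior. It assumes a throwaway local server in place of GitHub; the handler class, the paths /latest and /tag/v1.0.0, and the helper final_url are all hypothetical.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: /latest redirects to /tag/v1.0.0."""

    def do_GET(self):
        if self.path == "/latest":
            self.send_response(302)
            self.send_header("Location", "/tag/v1.0.0")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"release page"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        # Silence the default per-request logging.
        pass


def final_url(url: str, timeout: float = 10.0) -> str:
    """Return the URL that urlopen ends up at after following redirects."""
    with urllib.request.urlopen(url, timeout=timeout) as r:
        return r.url


# Port 0 lets the OS pick a free port.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"
resolved = final_url(base + "/latest")
server.shutdown()
server.server_close()
```

Using urlopen as a context manager closes the connection once the block exits, which the one-line cell above leaves to garbage collection.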

urllib.request.urlretrieve

urllib.request.urlretrieve downloads a file from a URL to a local path. It returns a tuple of the local file name and the response headers.

In [6]:
file, obj = urllib.request.urlretrieve(
    "http://www.legendu.net/media/download_code_server.py",
    "/tmp/download_code_server.py",
)
In [7]:
file
Out[7]:
'/tmp/download_code_server.py'
In [8]:
obj
Out[8]:
<http.client.HTTPMessage at 0x7fe9efc404a8>
In [9]:
!ls /tmp/download_code_server.py
/tmp/download_code_server.py
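The Python documentation lists urlretrieve among the legacy functions that might become deprecated at some point. A possible alternative is to stream the response from urlopen into a file yourself. The sketch below assumes a hypothetical helper named download, and uses a temporary local file with a file:// URL as a stand-in for a remote file so that it runs without network access.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download(url: str, dest: Path, timeout: float = 30.0) -> Path:
    """Stream the response body of `url` into `dest` (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        # Copies in chunks, so the whole body is never held in memory.
        shutil.copyfileobj(response, out)
    return dest


# Self-contained demo: a temporary local file stands in for the remote file.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_bytes(b"hello urllib\n")
    target = download(source.as_uri(), Path(tmp) / "copy.txt")
    copied = target.read_bytes()
```

With a real http:// or https:// URL the same helper would apply, although error handling (for example urllib.error.HTTPError) is left out here.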
In [10]:
type(obj)
Out[10]:
http.client.HTTPMessage
In [12]:
dir(obj)
Out[12]:
['__bytes__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_charset',
 '_default_type',
 '_get_params_preserve',
 '_headers',
 '_payload',
 '_unixfrom',
 'add_header',
 'as_bytes',
 'as_string',
 'attach',
 'defects',
 'del_param',
 'epilogue',
 'get',
 'get_all',
 'get_boundary',
 'get_charset',
 'get_charsets',
 'get_content_charset',
 'get_content_disposition',
 'get_content_maintype',
 'get_content_subtype',
 'get_content_type',
 'get_default_type',
 'get_filename',
 'get_param',
 'get_params',
 'get_payload',
 'get_unixfrom',
 'getallmatchingheaders',
 'is_multipart',
 'items',
 'keys',
 'policy',
 'preamble',
 'raw_items',
 'replace_header',
 'set_boundary',
 'set_charset',
 'set_default_type',
 'set_param',
 'set_payload',
 'set_raw',
 'set_type',
 'set_unixfrom',
 'values',
 'walk']
In [13]:
obj.as_string()
Out[13]:
'Server: GitHub.com\nContent-Type: application/octet-stream\nLast-Modified: Fri, 24 Jan 2020 20:21:29 GMT\nETag: "5e2b51c9-2de"\nAccess-Control-Allow-Origin: *\nExpires: Fri, 24 Jan 2020 20:34:29 GMT\nCache-Control: max-age=600\nX-Proxy-Cache: MISS\nX-GitHub-Request-Id: 6ACA:869A:42BECA:4B481B:5E2B527D\nContent-Length: 734\nAccept-Ranges: bytes\nDate: Fri, 24 Jan 2020 22:14:08 GMT\nVia: 1.1 varnish\nAge: 13\nConnection: close\nX-Served-By: cache-sea4477-SEA\nX-Cache: HIT\nX-Cache-Hits: 2\nX-Timer: S1579904049.754540,VS0,VE0\nVary: Accept-Encoding\nX-Fastly-Request-ID: c6c2ef45f576ba81de6fa160a79b67dfda5beaac\n\n'
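Rather than parsing the string returned by as_string(), individual headers can be read from the HTTPMessage directly. The sketch below rebuilds a small message from three of the headers shown above so that it runs offline; with a real download, obj from urlretrieve would be used in place of msg.

```python
import email
import http.client

# A few of the headers from the output above, in raw form.
raw = (
    "Server: GitHub.com\n"
    "Content-Type: application/octet-stream\n"
    "Content-Length: 734\n"
    "\n"
)
msg = email.message_from_string(raw, _class=http.client.HTTPMessage)

content_type = msg.get_content_type()    # parsed Content-Type value
length = int(msg["content-length"])      # header lookup is case-insensitive
missing = msg.get("X-Not-There", "n/a")  # default for an absent header
header_names = msg.keys()
```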