Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!
The shell command curl and wget can be called (using os.system or subprocess.run)
to download files from internet.
You can also download files using Python modules directly of course.
url = "http://www.legendu.net/media/download_code_server.py"urllib.request.urlretrieve¶
urllib.request.urlretrieve can be used to download a file from the internet to local.
For more details,
please refer to Hands on the urllib Module in Python.
import urllib.request
file, http_msg = urllib.request.urlretrieve(
"http://www.legendu.net/media/download_code_server.py",
"/tmp/download_code_server.py",
)file'/tmp/download_code_server.py'!ls /tmp/download_code_server.py/tmp/download_code_server.py
http_msg<http.client.HTTPMessage at 0x7fe9efcaa358>http_msg.as_string()'Server: GitHub.com\nContent-Type: application/octet-stream\nLast-Modified: Fri, 24 Jan 2020 20:21:29 GMT\nETag: "5e2b51c9-2de"\nAccess-Control-Allow-Origin: *\nExpires: Fri, 24 Jan 2020 20:34:29 GMT\nCache-Control: max-age=600\nX-Proxy-Cache: MISS\nX-GitHub-Request-Id: 6ACA:869A:42BECA:4B481B:5E2B527D\nContent-Length: 734\nAccept-Ranges: bytes\nDate: Fri, 24 Jan 2020 22:19:35 GMT\nVia: 1.1 varnish\nAge: 339\nConnection: close\nX-Served-By: cache-sea4480-SEA\nX-Cache: HIT\nX-Cache-Hits: 1\nX-Timer: S1579904375.100592,VS0,VE0\nVary: Accept-Encoding\nX-Fastly-Request-ID: 44fa67063caa264fc25f2cc26353c8dfc534ae66\n\n'requests¶
Notice that you must open the file to write into with the mode wb.
import requests
import shutil
resp = requests.get(url, stream=True)
if not resp.ok:
sys.exit("Network issue!")
with open("/tmp/download_code_server_2.py", "wb") as fout:
shutil.copyfileobj(resp.raw, fout)!ls /tmp/download_code_server_2.py/tmp/download_code_server_2.py
!cat /tmp/download_code_server_2.py#!/usr/bin/env python3
import urllib.request
import json
class GitHubRepoRelease:
def __init__(self, repo):
self.repo = repo
url = f"https://api.github.com/repos/{repo}/releases/latest"
self._resp_http = urllib.request.urlopen(url)
self.release = json.load(self._resp_http)
def download_urls(self, func=None):
urls = [asset["browser_download_url"] for asset in self.release["assets"]]
if func:
urls = [url for url in urls if func(url)]
return urls
if __name__ == '__main__':
release = GitHubRepoRelease("cdr/code-server")
url = release.download_urls(lambda url: "linux-x86_64" in url)[0]
urllib.request.urlretrieve(url, "/tmp/code.tar.gz")
wget¶
There is no option to overwrite an existing file currently.
However, this can be achieved by renaming/moving the downloaded file (using shutil).
import wget
wget.download(url, out="/tmp/download_code_server_3.py")'/tmp/download_code_server_3.py'import wget
wget.download(url, out="/tmp/download_code_server_3.py", bar=wget.bar_adaptive)'/tmp/download_code_server_3 (1).py'Configure proxy for the Python module wget.
import socket
import socks
socks.set_default_proxy(socks.SOCKS5, "localhost")
socket.socket = socks.socksocketpycurl¶
import pycurl
with open("/tmp/download_code_server_4.py", "wb") as fout:
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.WRITEDATA, fout)
c.perform()
c.close()---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-29-0c73f10fc26e> in <module>
----> 1 import pycurl
2
3 with open('/tmp/download_code_server_4.py', 'wb') as fout:
4 c = pycurl.Curl()
5 c.setopt(c.URL, url)
ModuleNotFoundError: No module named 'pycurl'