curldl.curldl module
Interface for PycURL functionality
- class curldl.curldl.Curldl(basedir: str | PathLike[str], *, progress: bool = False, verbose: bool = False, user_agent: str = 'curl', retry_attempts: int = 3, retry_wait_sec: int | float = 2, timeout_sec: int | float = 120, max_redirects: int = 5, allowed_protocols_bitmask: int = 0, min_part_bytes: int = 65536, always_keep_part_bytes: int = 67108864, curl_config_callback: Callable[[Curl], None] | None = None)
Bases: object
Interface for downloading functionality of PycURL. Basic usage example:
```python
import curldl, os
dl = curldl.Curldl(basedir='downloads', progress=True)
dl.get('https://kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz',
       'linux-0.01.tar.gz', size=73091,
       digests={'sha1': '566b6fb6365e25f47b972efa1506932b87d3ca7d'})
assert os.path.exists('downloads/linux-0.01.tar.gz')
```
For a more in-depth guide, refer to the package documentation.
Initialize a PycURL-based downloader with a single pycurl.Curl instance that is reused and reconfigured for each download. The resulting downloader object should therefore not be shared among several threads.
- Parameters:
basedir (str | os.PathLike[str]) – base directory path for downloaded files
progress (bool) – show a progress bar on sys.stderr
verbose (bool) – enable verbose logging information from libcurl at DEBUG level
user_agent (str) – User-Agent header for HTTP(S) protocols
retry_attempts (int) – number of download retry attempts in case of a failure listed in DOWNLOAD_RETRY_ERRORS
retry_wait_sec (int | float) – seconds to wait between download retry attempts
timeout_sec (int | float) – timeout in seconds for a libcurl operation
max_redirects (int) – maximum number of redirects allowed in HTTP(S) protocols
allowed_protocols_bitmask (int) – bitmask of allowed protocols, e.g. pycurl.PROTO_HTTP; the default is the bitwise OR of the values in DEFAULT_ALLOWED_PROTOCOLS
min_part_bytes (int) – partial downloads below this size are removed after an unsuccessful download attempt; set to 0 to disable removal of unsuccessful partial downloads
always_keep_part_bytes (int) – do not remove partial downloads of this size or larger when resuming a download, even if no size or digest is provided for verification; set to 0 to never remove existing partial downloads
curl_config_callback (Callable[[pycurl.Curl], None] | None) – callback to further configure the pycurl.Curl object
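Since curl_config_callback receives the pycurl.Curl object itself, any option that PycURL exposes can be applied through setopt(). The sketch below is a hypothetical helper, not part of curldl: make_curl_config_callback builds such a callback from an option mapping, and RecordingCurl is a stand-in for pycurl.Curl so the example runs without PycURL installed. The numeric HTTPHEADER value is libcurl's CURLOPT_HTTPHEADER; real code would use pycurl.HTTPHEADER instead.

```python
from typing import Any, Callable

# Numeric value of libcurl's CURLOPT_HTTPHEADER; with PycURL installed, use pycurl.HTTPHEADER
HTTPHEADER = 10023


def make_curl_config_callback(options: dict[int, Any]) -> Callable[[Any], None]:
    """Hypothetical helper: build a callback that applies extra options via setopt()."""

    def configure(curl: Any) -> None:
        # pycurl.Curl.setopt(option, value) sets one option per call
        for option, value in options.items():
            curl.setopt(option, value)

    return configure


class RecordingCurl:
    """Stand-in for pycurl.Curl that only records setopt() calls."""

    def __init__(self) -> None:
        self.options: dict[int, Any] = {}

    def setopt(self, option: int, value: Any) -> None:
        self.options[option] = value


# Example: request an extra header on every download
configure = make_curl_config_callback({HTTPHEADER: ["Accept: application/octet-stream"]})
fake_curl = RecordingCurl()
configure(fake_curl)
```

With PycURL available, the same factory would be passed as Curldl('downloads', curl_config_callback=make_curl_config_callback({pycurl.HTTPHEADER: [...]})).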
- DOWNLOAD_RETRY_ERRORS = {5, 6, 7, 10, 12, 15, 16, 18, 22, 28, 30, 35, 47, 52, 55, 56, 79}
libcurl error codes accepted by the download retry policy
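The retry policy can be pictured as a loop that retries only on these libcurl codes, sleeping retry_wait_sec between attempts. This is a hypothetical sketch, not curldl's implementation: retry_download and CurlError are illustrative names (CurlError mimics pycurl.error, whose args start with the libcurl error code), and treating retry_attempts as "retries after the first attempt" is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Same codes as Curldl.DOWNLOAD_RETRY_ERRORS, e.g. 6 = CURLE_COULDNT_RESOLVE_HOST,
# 7 = CURLE_COULDNT_CONNECT, 28 = CURLE_OPERATION_TIMEDOUT, 56 = CURLE_RECV_ERROR
DOWNLOAD_RETRY_ERRORS = {5, 6, 7, 10, 12, 15, 16, 18, 22, 28, 30, 35, 47, 52, 55, 56, 79}


class CurlError(Exception):
    """Stand-in for pycurl.error, whose args are (libcurl_error_code, message)."""


def retry_download(download: Callable[[], T], retry_attempts: int = 3,
                   retry_wait_sec: float = 2) -> T:
    """Hypothetical policy: one initial attempt plus up to retry_attempts retries."""
    for attempt in range(retry_attempts + 1):
        try:
            return download()
        except CurlError as exc:
            # Re-raise non-retryable errors and the error from the final attempt
            if exc.args[0] not in DOWNLOAD_RETRY_ERRORS or attempt == retry_attempts:
                raise
            time.sleep(retry_wait_sec)
    raise AssertionError("unreachable")


calls = []


def flaky_download() -> str:
    calls.append(len(calls))
    if len(calls) < 3:
        raise CurlError(28, "Operation timed out")  # retryable
    return "done"


result = retry_download(flaky_download, retry_wait_sec=0)
```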
- DEFAULT_ALLOWED_PROTOCOLS = {1, 2, 4, 8, 32}
URL schemes (protocol bits) allowed by default; can be changed with the allowed_protocols_bitmask constructor parameter
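The default set {1, 2, 4, 8, 32} consists of libcurl CURLPROTO_* bits, so the default mask is their bitwise OR. The sketch below hard-codes those bit values (PycURL exposes them as pycurl.PROTO_HTTP and so on) so that it runs without PycURL; is_allowed is an illustrative helper, not a curldl function.

```python
from functools import reduce
from operator import or_

# libcurl CURLPROTO_* bit values (PycURL exposes them as pycurl.PROTO_HTTP etc.)
PROTO_HTTP = 1 << 0
PROTO_HTTPS = 1 << 1
PROTO_FTP = 1 << 2
PROTO_FTPS = 1 << 3
PROTO_SCP = 1 << 4
PROTO_SFTP = 1 << 5

# The documented default set {1, 2, 4, 8, 32}: HTTP, HTTPS, FTP, FTPS, SFTP (no SCP)
DEFAULT_ALLOWED_PROTOCOLS = {PROTO_HTTP, PROTO_HTTPS, PROTO_FTP, PROTO_FTPS, PROTO_SFTP}

# Default bitmask: bitwise OR of every allowed protocol bit
default_mask = reduce(or_, DEFAULT_ALLOWED_PROTOCOLS, 0)

# A narrower mask, e.g. for allowed_protocols_bitmask=PROTO_HTTP | PROTO_HTTPS
http_only_mask = PROTO_HTTP | PROTO_HTTPS


def is_allowed(mask: int, protocol_bit: int) -> bool:
    """True if the protocol bit is set in the mask."""
    return bool(mask & protocol_bit)
```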
- RESUME_FROM_SCHEMES = {'file', 'ftp', 'ftps', 'http', 'https'}
URL schemes supported by pycurl.RESUME_FROM. SFTP is not included because its implementation is buggy (the total download size is reduced twice by the initial size). The scheme is extracted via urllib from the initial URL; there are no security implications, since it is only used for removing partial downloads.
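Checking the scheme with urllib amounts to a set lookup on the parsed URL. The helper supports_resume below is a hypothetical illustration of that check, not curldl's code.

```python
from urllib.parse import urlsplit

RESUME_FROM_SCHEMES = {'file', 'ftp', 'ftps', 'http', 'https'}


def supports_resume(url: str) -> bool:
    """True if the URL's scheme is one for which pycurl.RESUME_FROM is trusted."""
    # urlsplit() lowercases the scheme, so 'HTTPS://...' matches 'https'
    return urlsplit(url).scheme in RESUME_FROM_SCHEMES
```

For example, supports_resume('sftp://example.com/file.bin') is False, reflecting the exclusion of SFTP described above.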
- VERBOSE_LOGGING = {0: 'TEXT', 1: 'IHDR', 2: 'OHDR'}
Info types logged by the DEBUGFUNCTION() callback during verbose logging
- _get_configured_curl(url: str, path: str, *, timestamp: int | float | None = None) → tuple[Curl, int]
Reconfigure the pycurl.Curl instance for the requested download and return the instance. Methods should not work with self._unconfigured_curl directly, only with the instance returned by this method.
- Parameters:
url (str) – URL to download
path (str) – resolved output path
timestamp (int | float | None) – timestamp of an already downloaded file, or None
- Returns:
pycurl.Curl instance configured for the requested download, and the initial download offset (i.e., the file size to resume from)
- Return type:
tuple[pycurl.Curl, int]
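The initial download offset is the size of the file to resume. Here is a minimal sketch, assuming the partial data lives in a file with a .part extension and that a missing file simply means starting from offset 0. initial_offset is a hypothetical helper and not curldl's implementation.

```python
import os
import tempfile


def initial_offset(part_path: str) -> int:
    """Hypothetical helper: bytes already on disk, i.e. where to resume from."""
    try:
        return os.path.getsize(part_path)
    except FileNotFoundError:
        return 0  # nothing to resume, start from the beginning


# The result would then be applied with curl.setopt(pycurl.RESUME_FROM, offset)
with tempfile.TemporaryDirectory() as tmp:
    part_path = os.path.join(tmp, 'linux-0.01.tar.gz.part')
    offset_before = initial_offset(part_path)
    with open(part_path, 'wb') as part:
        part.write(b'\0' * 1024)
    offset_after = initial_offset(part_path)
```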
- _perform_curl_download(curl: pycurl.Curl, write_stream: BinaryIO, progress_bar: tqdm[NoReturn]) → None
Complete the pycurl.Curl configuration and start downloading.
- Parameters:
curl (pycurl.Curl) – configured pycurl.Curl instance
write_stream (BinaryIO) – output stream to write to (a file opened in binary write mode)
progress_bar (tqdm[NoReturn]) – progress bar to use; XFERINFOFUNCTION() is configured if the bar is enabled
- static _get_curl_progress_callback(progress_bar: tqdm[NoReturn]) → Callable[[int, int, int, int], None]
Constructs a progress bar-updating callback for XFERINFOFUNCTION().
- Parameters:
progress_bar (tqdm[NoReturn]) – progress bar to use; must be enabled
- Returns:
XFERINFOFUNCTION() callback
- Return type:
Callable[[int, int, int, int], None]
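PycURL calls an XFERINFOFUNCTION() callback with (download_total, downloaded, upload_total, uploaded), while tqdm's update() takes an increment. So a callback has to convert absolute counts into deltas. This sketch uses a FakeBar stand-in with tqdm-like total, n and update() so that it runs without tqdm. get_progress_callback is hypothetical, and the initial_offset shift for resumed downloads is an assumption about how resumed bytes might be shown, not documented curldl behavior.

```python
from __future__ import annotations

from typing import Callable


class FakeBar:
    """Stand-in for an enabled tqdm bar: update(n) advances the position n."""

    def __init__(self) -> None:
        self.total: int | None = None
        self.n = 0

    def update(self, n: int) -> None:
        self.n += n


def get_progress_callback(bar: FakeBar,
                          initial_offset: int = 0) -> Callable[[int, int, int, int], None]:
    """Hypothetical XFERINFOFUNCTION-style callback factory."""

    def callback(download_total: int, downloaded: int,
                 upload_total: int, uploaded: int) -> None:
        # 0 means the size is still unknown; counts are assumed to exclude resumed bytes
        if download_total:
            bar.total = initial_offset + download_total
        # update() takes an increment, so pass the delta since the previous call
        bar.update(initial_offset + downloaded - bar.n)

    return callback


bar = FakeBar()
progress = get_progress_callback(bar, initial_offset=100)
progress(0, 0, 0, 0)      # size not known yet
progress(400, 150, 0, 0)
progress(400, 400, 0, 0)
```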
- classmethod _curl_debug_cb(debug_type: int, debug_msg: bytes) → None
Callback for DEBUGFUNCTION() that logs libcurl messages at DEBUG level.
- Parameters:
debug_type (int) – pycurl.Curl-supplied info type, e.g. pycurl.INFOTYPE_HEADER_IN
debug_msg (bytes) – pycurl.Curl-supplied debug message
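A DEBUGFUNCTION() callback receives an info type and raw bytes. One way to combine it with the VERBOSE_LOGGING mapping is to log only the text and header types and skip payload data. In this sketch, format_debug_message and curl_debug_cb are hypothetical, and the ASCII decoding with replacement is an assumption. The info type numbers match pycurl.INFOTYPE_TEXT, INFOTYPE_HEADER_IN and INFOTYPE_HEADER_OUT.

```python
from __future__ import annotations

import logging

# Same mapping as Curldl.VERBOSE_LOGGING: pycurl.INFOTYPE_TEXT, INFOTYPE_HEADER_IN, INFOTYPE_HEADER_OUT
VERBOSE_LOGGING = {0: 'TEXT', 1: 'IHDR', 2: 'OHDR'}

log = logging.getLogger('curldl_sketch')


def format_debug_message(debug_type: int, debug_msg: bytes) -> str | None:
    """Return a log line for info types in VERBOSE_LOGGING, or None to skip the message."""
    if debug_type not in VERBOSE_LOGGING:
        return None  # e.g. payload data (INFOTYPE_DATA_IN) is not logged
    text = debug_msg.decode('ascii', errors='replace').rstrip()
    return f'{VERBOSE_LOGGING[debug_type]}: {text}'


def curl_debug_cb(debug_type: int, debug_msg: bytes) -> None:
    """DEBUGFUNCTION-style callback: log selected libcurl messages at DEBUG level."""
    line = format_debug_message(debug_type, debug_msg)
    if line is not None:
        log.debug(line)


header_line = format_debug_message(1, b'HTTP/1.1 200 OK\r\n')
data_line = format_debug_message(3, b'\x1f\x8b binary payload')
curl_debug_cb(2, b'GET /pub/ HTTP/1.1\r\n')
```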
- get(url: str, rel_path: str, *, size: int | None = None, digests: dict[str, str] | None = None) → None
Download a URL to a basedir-relative path and verify its expected size and digests. Resume a partial download with a .part extension if one exists and the protocol supports it, and retry failures according to the retry policy. The downloaded file is removed in case of a size or digest mismatch, and ValueError is raised.
- Parameters:
url (str) – URL to download
rel_path (str) – basedir-relative output file path
size (int | None) – expected file size in bytes, or None to ignore
digests (dict[str, str] | None) – mapping of digest algorithms to expected hexadecimal digest strings, or None to ignore (see curldl.util.fs.FileSystem.verify_size_and_digests())
- Raises:
ValueError – the relative path escapes the base directory or is otherwise unsafe (see curldl.util.fs.FileSystem.verify_rel_path_is_safe()), the file size does not match, or one of the digests fails verification
pycurl.error – PycURL error when downloading after retries are exhausted
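The size and digest check can be sketched with hashlib, where each key of digests is an algorithm name such as 'sha1' and each value is the expected hex digest. This verify_size_and_digests is a hypothetical stand-in written for illustration, not the code of curldl.util.fs.FileSystem.verify_size_and_digests(). Unlike get(), it does not remove the file on a mismatch.

```python
from __future__ import annotations

import hashlib
import os
import tempfile


def verify_size_and_digests(path: str, size: int | None = None,
                            digests: dict[str, str] | None = None) -> None:
    """Hypothetical stand-in: raise ValueError on a size or digest mismatch."""
    if size is not None and os.path.getsize(path) != size:
        raise ValueError(f'{path}: expected {size} bytes')
    for algo, expected in (digests or {}).items():
        hasher = hashlib.new(algo)  # algorithm names as accepted by hashlib, e.g. 'sha1'
        with open(path, 'rb') as f:
            # Hash in chunks so large downloads are not read into memory at once
            for chunk in iter(lambda: f.read(65536), b''):
                hasher.update(chunk)
        if hasher.hexdigest() != expected.lower():
            raise ValueError(f'{path}: {algo} digest mismatch')


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'hello.txt')
    with open(path, 'wb') as f:
        f.write(b'hello')
    # Passes silently: correct size and SHA-1
    verify_size_and_digests(path, size=5, digests={'sha1': hashlib.sha1(b'hello').hexdigest()})
    try:
        verify_size_and_digests(path, digests={'sha1': '0' * 40})
        mismatch_detected = False
    except ValueError:
        mismatch_detected = True
```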
- _download_partial(url: str, path: str, *, timestamp: int | float | None = None, description: str | None = None) → None
Start or resume a partial download of a URL to a resolved path. If the timestamp of an already downloaded file is provided, remove the partial file if the URL content is not more recent than the timestamp. This method should be invoked with a retry policy.
- Parameters:
url (str) – URL to download
path (str) – resolved output path
timestamp (int | float | None) – timestamp of an already downloaded file, or None
description (str | None)
- Raises:
pycurl.error – PycURL error when downloading, may result in a retry according to policy
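How a partial download interacts with min_part_bytes can be sketched as follows: append to the .part file, and if the attempt fails and the file is still smaller than min_part_bytes, delete it before re-raising so that the retry policy decides what happens next. This is a hypothetical outline, not curldl's code. download_partial and fetch are illustrative names (fetch(offset) stands in for the PycURL transfer), and the timestamp handling described above is left out.

```python
from __future__ import annotations

import os
import tempfile
from typing import Callable, Iterable, Iterator


def download_partial(path: str, fetch: Callable[[int], Iterable[bytes]],
                     min_part_bytes: int = 65536) -> None:
    """Hypothetical sketch: resume into path + '.part', pruning tiny leftovers on failure.

    fetch(offset) stands in for the PycURL transfer, yielding chunks from offset.
    """
    part_path = path + '.part'
    offset = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    try:
        # 'ab' appends after the bytes kept from a previous attempt
        with open(part_path, 'ab') as part:
            for chunk in fetch(offset):
                part.write(chunk)
    except Exception:
        # A part file below min_part_bytes (e.g. an error page) is not worth resuming;
        # with min_part_bytes=0 this condition is never true
        if os.path.exists(part_path) and os.path.getsize(part_path) < min_part_bytes:
            os.remove(part_path)
        raise  # let the caller's retry policy decide what happens next


def good_fetch(offset: int) -> Iterator[bytes]:
    yield b'x' * 10


def failing_fetch(offset: int) -> Iterator[bytes]:
    yield b'<html>503</html>'
    raise OSError('connection reset')  # stand-in for pycurl.error


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'file.bin')
    download_partial(target, good_fetch)
    download_partial(target, good_fetch)  # resumes and appends 10 more bytes
    part_size = os.path.getsize(target + '.part')
    bad_target = os.path.join(tmp, 'bad.bin')
    try:
        download_partial(bad_target, failing_fetch)
    except OSError:
        pass
    bad_part_kept = os.path.exists(bad_target + '.part')
```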
- _prepare_full_path(rel_path: str) → str
Verify that the basedir-relative path is safe and create the required directories.
- Parameters:
rel_path (str) – basedir-relative path
- Returns:
resulting complete path
- Raises:
ValueError – relative path escapes base directory or is otherwise unsafe (see curldl.util.fs.FileSystem.verify_rel_path_is_safe())
- Return type:
str
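A basic version of the "escapes base directory" check normalizes the joined path and confirms that it still lies under basedir before creating parent directories. The sketch below is hypothetical. It covers only absolute paths and '..' traversal (abspath() does not resolve symlinks), whereas curldl.util.fs.FileSystem.verify_rel_path_is_safe() may reject other unsafe paths too.

```python
import os
import tempfile


def prepare_full_path(basedir: str, rel_path: str) -> str:
    """Hypothetical sketch: reject paths outside basedir, create parent directories."""
    if os.path.isabs(rel_path):
        raise ValueError(f'absolute path not allowed: {rel_path!r}')
    base = os.path.abspath(basedir)
    # abspath() normalizes '..' lexically; it does not resolve symlinks
    full_path = os.path.abspath(os.path.join(base, rel_path))
    if full_path == base or os.path.commonpath([base, full_path]) != base:
        raise ValueError(f'path escapes base directory: {rel_path!r}')
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    return full_path


with tempfile.TemporaryDirectory() as tmp:
    full = prepare_full_path(tmp, 'a/b/file.txt')
    dirs_created = os.path.isdir(os.path.join(tmp, 'a', 'b'))
    try:
        prepare_full_path(tmp, '../outside.txt')
        escape_rejected = False
    except ValueError:
        escape_rejected = True
```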
- classmethod _get_response_status(curl: Curl, url: str, error: error | None) → str
Format response code and description from cURL with a possible error.
- Parameters:
curl (Curl) – pycurl.Curl instance to extract the response code from
url (str) – URL to extract the scheme (protocol) from if pycurl.EFFECTIVE_URL is unavailable
error (error | None) – PycURL exception instance
- Returns:
formatted string that includes a response code and its meaning, if available
- Return type:
str
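For HTTP(S), a response code's meaning can be looked up in the standard library's http.HTTPStatus. The sketch below takes the code and URL as plain arguments. The real method reads them from the Curl instance (for example via curl.getinfo(pycurl.RESPONSE_CODE) and pycurl.EFFECTIVE_URL). format_response_status and its output format are hypothetical, and non-HTTP codes such as FTP replies are left undescribed here.

```python
from __future__ import annotations

from http import HTTPStatus
from urllib.parse import urlsplit


def format_response_status(code: int, url: str, error: Exception | None = None) -> str:
    """Hypothetical sketch: describe an HTTP(S) response code, plus an optional error."""
    status = str(code)
    if urlsplit(url).scheme in ('http', 'https'):
        try:
            status += f' {HTTPStatus(code).phrase}'
        except ValueError:
            pass  # code is not a standard HTTP status
    if error is not None:
        status += f' / {error}'
    return status
```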