curldl.curldl module#
Interface for PycURL functionality
- class curldl.curldl.Curldl(basedir: str | PathLike[str], *, progress: bool = False, verbose: bool = False, user_agent: str = 'curl', retry_attempts: int = 3, retry_wait_sec: int | float = 2, timeout_sec: int | float = 120, max_redirects: int = 5, allowed_protocols_bitmask: int = 0, min_part_bytes: int = 65536, always_keep_part_bytes: int = 67108864, curl_config_callback: Callable[[Curl], None] | None = None)[source]#
Bases:
object
Interface for downloading functionality of PycURL. Basic usage example:

import curldl, os

dl = curldl.Curldl(basedir='downloads', progress=True)
dl.get('https://kernel.org/pub/linux/kernel/Historic/linux-0.01.tar.gz',
       'linux-0.01.tar.gz', size=73091,
       digests={'sha1': '566b6fb6365e25f47b972efa1506932b87d3ca7d'})
assert os.path.exists('downloads/linux-0.01.tar.gz')
For a more in-depth guide, refer to package documentation.
Initialize a PycURL-based downloader with a single pycurl.Curl instance that is reused and reconfigured for each download. The resulting downloader object should therefore not be shared among several threads.
- Parameters:
basedir (str | os.PathLike[str]) – base directory path for downloaded files
progress (bool) – show progress bar on sys.stderr
verbose (bool) – enable verbose logging information from libcurl at DEBUG level
user_agent (str) – User-Agent header for HTTP(S) protocols
retry_attempts (int) – number of download retry attempts in case of failure in DOWNLOAD_RETRY_ERRORS
retry_wait_sec (int | float) – seconds to wait between download retry attempts
timeout_sec (int | float) – timeout in seconds for a libcurl operation
max_redirects (int) – maximum number of redirects allowed in HTTP(S) protocols
allowed_protocols_bitmask (int) – bitmask of allowed protocols, e.g. pycurl.PROTO_HTTP; the default is the bitwise OR of the values in DEFAULT_ALLOWED_PROTOCOLS
min_part_bytes (int) – partial downloads below this size are removed after an unsuccessful download attempt; set to 0 to disable removal of unsuccessful partial downloads
always_keep_part_bytes (int) – do not remove partial downloads of this size or larger when resuming a download, even if no size or digest is provided for verification; set to 0 to never remove existing partial downloads
curl_config_callback (Callable[[pycurl.Curl], None] | None) – a callback to further configure the pycurl.Curl object
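As an illustration of how such a bitmask is formed, the sketch below uses plain integers assumed to equal libcurl's CURLPROTO_* bit values (pycurl exposes these as pycurl.PROTO_HTTP and friends; verify the constants against your installed pycurl before relying on them):

```python
from functools import reduce
from operator import or_

# Assumed libcurl CURLPROTO_* bit values (pycurl.PROTO_*); not imported from pycurl here
PROTO_HTTP, PROTO_HTTPS = 1, 2   # 1 << 0, 1 << 1
PROTO_FTP, PROTO_FTPS = 4, 8     # 1 << 2, 1 << 3
PROTO_SFTP = 32                  # 1 << 5

# Mirrors the documented DEFAULT_ALLOWED_PROTOCOLS = {1, 2, 4, 8, 32}
DEFAULT_ALLOWED_PROTOCOLS = {PROTO_HTTP, PROTO_HTTPS, PROTO_FTP, PROTO_FTPS, PROTO_SFTP}

# The effective default bitmask is the bitwise OR of all allowed protocol bits
default_bitmask = reduce(or_, DEFAULT_ALLOWED_PROTOCOLS)

# Restricting a downloader to HTTPS only would pass a narrower mask instead
https_only_mask = PROTO_HTTPS
```

Passing such a mask as allowed_protocols_bitmask narrows which URL schemes a downloader will follow, including across redirects.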
- DOWNLOAD_RETRY_ERRORS = {5, 6, 7, 10, 12, 15, 16, 18, 22, 28, 30, 35, 47, 52, 55, 56, 79}#
libcurl errors accepted by the download retry policy
- DEFAULT_ALLOWED_PROTOCOLS = {1, 2, 4, 8, 32}#
URL schemes allowed by default; can be changed with the allowed_protocols_bitmask constructor parameter
- RESUME_FROM_SCHEMES = {'file', 'ftp', 'ftps', 'http', 'https'}#
URL schemes supported by pycurl.RESUME_FROM. SFTP is not included because its implementation is buggy (the total download size is reduced twice by the initial size). The scheme is extracted via urllib from the initial URL, but there are no security implications since it is only used for removing partial downloads.
- VERBOSE_LOGGING = {0: 'TEXT', 1: 'IHDR', 2: 'OHDR'}#
Info types logged by the DEBUGFUNCTION() callback during verbose logging
- _get_configured_curl(url: str, path: str, *, timestamp: int | float | None = None) tuple[Curl, int] [source]#
Reconfigure the pycurl.Curl instance for the requested download and return it together with the initial download offset. Methods should not work with self._unconfigured_curl directly, only with the instance returned by this method.
- Parameters:
- Returns:
pycurl.Curl instance configured for the requested download, and the initial download offset (i.e., the file size to resume from)
- Return type:
tuple[pycurl.Curl, int]
- _perform_curl_download(curl: pycurl.Curl, write_stream: BinaryIO, progress_bar: tqdm[NoReturn]) None [source]#
Complete pycurl.Curl configuration and start downloading.
- Parameters:
curl (pycurl.Curl) – configured pycurl.Curl instance
write_stream (BinaryIO) – output stream to write to (a file opened in binary write mode)
progress_bar (tqdm[NoReturn]) – progress bar to use; XFERINFOFUNCTION() is configured if enabled
- static _get_curl_progress_callback(progress_bar: tqdm[NoReturn]) Callable[[int, int, int, int], None] [source]#
Construct a progress bar-updating callback for XFERINFOFUNCTION().
- Parameters:
progress_bar (tqdm[NoReturn]) – progress bar to use, must be enabled
- Returns:
XFERINFOFUNCTION() callback
- Return type:
Callable[[int, int, int, int], None]
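The shape of such a callback can be sketched without tqdm or pycurl (names are illustrative; libcurl invokes an XFERINFOFUNCTION with download total, downloaded bytes, upload total, and uploaded bytes):

```python
from typing import Callable

def make_progress_callback(
        report: Callable[[int, int], None]) -> Callable[[int, int, int, int], None]:
    """Build an XFERINFOFUNCTION-compatible callback (hypothetical sketch).

    libcurl calls it as cb(download_total, downloaded, upload_total, uploaded);
    this sketch forwards only the download figures to `report`.
    """
    def callback(download_total: int, downloaded: int,
                 upload_total: int, uploaded: int) -> None:
        if download_total:  # total may be 0 before the server reports a size
            report(downloaded, download_total)
    return callback
```

In the library itself, the equivalent callback updates the tqdm progress bar instead of a user-supplied `report` function.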
- classmethod _curl_debug_cb(debug_type: int, debug_msg: bytes) None [source]#
Callback for DEBUGFUNCTION() that logs libcurl messages at DEBUG level.
- Parameters:
debug_type (int) – pycurl.Curl-supplied info type, e.g. pycurl.INFOTYPE_HEADER_IN
debug_msg (bytes) – pycurl.Curl-supplied debug message
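A simplified stand-in for this callback, using only the standard logging module and the info-type codes listed in VERBOSE_LOGGING above (an illustration, not the library's actual implementation):

```python
import logging

# Info-type codes from VERBOSE_LOGGING: 0=TEXT, 1=IHDR (inbound header), 2=OHDR (outbound header)
VERBOSE_INFO_TYPES = {0: "TEXT", 1: "IHDR", 2: "OHDR"}

logger = logging.getLogger("curldl.sketch")

def curl_debug_callback(debug_type: int, debug_msg: bytes) -> None:
    """Hypothetical DEBUGFUNCTION-style callback: log selected info types at DEBUG level."""
    if debug_type in VERBOSE_INFO_TYPES:
        text = debug_msg.decode("ascii", errors="replace").rstrip()
        logger.debug("%s: %s", VERBOSE_INFO_TYPES[debug_type], text)
```

Info types outside the mapping (e.g. raw data blocks) are simply skipped, which keeps binary payloads out of the log.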
- get(url: str, rel_path: str, *, size: int | None = None, digests: dict[str, str] | None = None) None [source]#
Download a URL to a basedir-relative path and verify its expected size and digests. Resume a partial download with a .part extension if one exists and the protocol supports it, and retry failures according to the retry policy. The downloaded file is removed in case of a size or digest mismatch, and ValueError is raised.
- Parameters:
url (str) – URL to download
rel_path (str) – basedir-relative output file path
size (int | None) – expected file size in bytes, or None to ignore
digests (dict[str, str] | None) – mapping of digest algorithms to expected hexadecimal digest strings, or None to ignore (see curldl.util.fs.FileSystem.verify_size_and_digests())
- Raises:
ValueError – relative path escapes base directory or is otherwise unsafe (see curldl.util.fs.FileSystem.verify_rel_path_is_safe()), or file size mismatch, or one of the digests fails verification
pycurl.error – PycURL error when downloading, after retries are exhausted
- _download_partial(url: str, path: str, *, timestamp: int | float | None = None, description: str | None = None) None [source]#
Start or resume a partial download of a URL to the resolved path. If the timestamp of an already-downloaded file is provided, the partial file is removed if the URL content is not more recent than that timestamp. This method should be invoked with a retry policy.
- Parameters:
- Raises:
pycurl.error – PycURL error when downloading, may result in a retry according to policy
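The .part bookkeeping described here follows a common download pattern that can be sketched without any networking (function names are illustrative, not the library's internals):

```python
import os

def start_or_resume(final_path: str) -> tuple[str, int]:
    """Return the .part path to write to and the byte offset to resume from."""
    part_path = final_path + ".part"
    offset = os.path.getsize(part_path) if os.path.exists(part_path) else 0
    return part_path, offset

def finalize(final_path: str) -> None:
    """Atomically promote the completed .part file to its final name."""
    os.replace(final_path + ".part", final_path)
```

The offset would be handed to libcurl (via pycurl.RESUME_FROM, for schemes that support it) so the server sends only the remaining bytes; the rename happens only after the download completes and verifies.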
- _prepare_full_path(rel_path: str) str [source]#
Verify that the basedir-relative path is safe and create the required directories.
- Parameters:
rel_path (str) – basedir-relative path
-relative path- Returns:
resulting complete path
- Raises:
ValueError – relative path escapes base directory or is otherwise unsafe (see curldl.util.fs.FileSystem.verify_rel_path_is_safe())
- Return type:
str
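The safety check can be illustrated with the standard library (a simplified sketch of the kind of check verify_rel_path_is_safe() performs, not its actual implementation):

```python
import os

def verify_rel_path_is_safe(basedir: str, rel_path: str) -> str:
    """Raise ValueError if rel_path would escape basedir; return the full path (sketch)."""
    base = os.path.abspath(basedir)
    full = os.path.abspath(os.path.join(base, rel_path))
    # Both '..' components and absolute rel_path values resolve outside of base
    if os.path.commonpath([base, full]) != base:
        raise ValueError(f"unsafe relative path: {rel_path!r}")
    return full
```

Resolving both paths to absolute form before comparing defeats traversal via `..` segments as well as rel_path values that are themselves absolute.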
- classmethod _get_response_status(curl: Curl, url: str, error: error | None) str [source]#
Format response code and description from cURL with a possible error.
- Parameters:
curl (Curl) – pycurl.Curl instance to extract the response code from
url (str) – URL to extract the scheme protocol from if pycurl.EFFECTIVE_URL is unavailable
error (error | None) – PycURL exception instance, if any
- Returns:
formatted string that includes a response code and its meaning, if available
- Return type:
str