b2sdk._internal.transfer.inbound.downloader.parallel – ParallelTransferer

class b2sdk._internal.transfer.inbound.downloader.parallel.ParallelDownloader(min_part_size, max_streams=None, **kwargs)[source]

Bases: AbstractDownloader

Downloader using threads to download and write multiple parts of an object in parallel.

Each part is downloaded by its own thread, while all writes are done by an additional dedicated thread. This can increase performance even for a small file, as fetching and writing can be done in parallel.

FINISH_HASHING_BUFFER_SIZE = 1048576
SUPPORTS_DECODE_CONTENT = False
__init__(min_part_size, max_streams=None, **kwargs)[source]
Parameters:
  • max_streams (Optional[int]) – maximum number of simultaneous streams

  • min_part_size (int) – minimum amount of data a single stream will retrieve, in bytes

download(file, response, download_version, session, encryption=None)[source]

Download a file from the given URL using parallel download sessions and store it in the given download_destination.

class b2sdk._internal.transfer.inbound.downloader.parallel.WriterThread(file, max_queue_depth)[source]

Bases: Thread

A thread responsible for keeping a queue of data chunks to write to a file-like object and for actually writing them. Since a single thread is responsible for synchronizing the writes, we avoid many issues between userspace and kernelspace that would normally require flushing buffers each time the writer switches. Flushing would kill performance, and not synchronizing would cause data corruption (we would probably end up with a file with unexpected blocks of zeros preceding the range of the writer that comes second and writes further into the file).
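The single-writer idea can be illustrated with a minimal sketch. This is not the b2sdk implementation: the class name SketchWriter, the (offset, data) item shape and the None sentinel are all assumptions made for illustration, and an in-memory io.BytesIO stands in for the destination file.

```python
import io
import queue
import threading


class SketchWriter(threading.Thread):
    """Hypothetical single writer: drains (offset, data) items and writes them to `file`."""

    def __init__(self, file, max_queue_depth):
        super().__init__()
        self.file = file
        # Bounded queue: producers block in put() once it is full.
        self.queue = queue.Queue(maxsize=max_queue_depth)

    def run(self):
        while True:
            item = self.queue.get()
            if item is None:  # assumed sentinel: no more chunks will arrive
                break
            offset, data = item
            # Only this thread ever seeks and writes, so no locking is needed.
            self.file.seek(offset)
            self.file.write(data)


def write_out_of_order():
    buffer = io.BytesIO()
    writer = SketchWriter(buffer, max_queue_depth=4)
    writer.start()
    # Chunks may arrive out of order, as they might from independent download threads.
    writer.queue.put((6, b"world"))
    writer.queue.put((0, b"hello "))
    writer.queue.put(None)
    writer.join()
    return buffer.getvalue()
```

Because every seek and write happens on one thread, the chunks end up at the right offsets regardless of the order in which producers submit them.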

The object of this class is also responsible for backpressure: if items are added to the queue faster than they can be written (see GCP VMs with standard PD storage with faster CPU and network than local storage, https://github.com/Backblaze/B2_Command_Line_Tool/issues/595), then obj.queue.put(item) will block, slowing down the producer.

The recommended minimum value of max_queue_depth is equal to the number of producer threads, so that if all producers submit a part at the exact same time (right after a network issue, for example, or just after starting the read), they can continue their work without blocking. The writer should be able to store at least one data chunk before a new one is retrieved, but it is not guaranteed.

Therefore, the recommended value of max_queue_depth is higher: twice the number of producers, so that spikes on either end (many producers submit at the same time / consumer has a latency spike) can be accommodated without sacrificing performance.
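The backpressure behaviour and the sizing recommendation can be sketched with the standard library's bounded queue. The helper names recommended_queue_depth and fill_until_full are hypothetical and exist only for this illustration; with no consumer running, the queue accepts exactly max_queue_depth items before a blocking put() would start to wait.

```python
import queue


def recommended_queue_depth(producer_count):
    # Twice the number of producers, following the recommendation above.
    return 2 * producer_count


def fill_until_full(producer_count):
    """Count how many chunks a bounded queue accepts when nothing consumes them."""
    depth = recommended_queue_depth(producer_count)
    q = queue.Queue(maxsize=depth)
    accepted = 0
    while True:
        try:
            # put_nowait raises queue.Full at the point where put() would block.
            q.put_nowait(b"chunk")
        except queue.Full:
            return accepted
        accepted += 1
```

With 8 producers, each producer can have two chunks queued before any of them has to wait for the writer.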

Please note that the chunk size and the queue depth impact the memory footprint. With the default settings at the time of writing, that might be 10 downloads, 8 producers, 1 MB buffers, 2 buffers each = 8*2*10 = 160 MB (plus Python buffers, the operating system, etc.).
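The estimate above is a plain product, which the following sketch reproduces. The function name and its parameters are hypothetical; the figures are the ones quoted in the paragraph and are not read from the library.

```python
def estimated_buffer_memory_mb(downloads, producers, buffers_per_producer, buffer_mb):
    """Rough upper bound on queued chunk memory, in MB, ignoring interpreter and OS overhead."""
    return downloads * producers * buffers_per_producer * buffer_mb


estimate = estimated_buffer_memory_mb(
    downloads=10, producers=8, buffers_per_producer=2, buffer_mb=1
)
```

Doubling either the chunk size or the queue depth doubles this bound.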

__init__(file, max_queue_depth)[source]

This constructor should always be called with keyword arguments. Arguments are:

group should be None; reserved for future extension when a ThreadGroup class is implemented.

target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.

name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a small decimal number.

args is a list or tuple of arguments for the target invocation. Defaults to ().

kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.

If a subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.

run()[source]

Method representing the thread’s activity.

You may override this method in a subclass. The standard run() method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.

b2sdk._internal.transfer.inbound.downloader.parallel.download_first_part(response, hasher, session, writer, first_part, chunk_size, encryption=None)[source]
Parameters:
  • response (Response) – response of the original GET call

  • hasher – hasher object to feed as the stream is written

  • session (B2Session) – B2 API session

  • writer (WriterThread) – thread responsible for writing downloaded data

  • first_part (PartToDownload) – definition of the part to be downloaded

  • chunk_size (int) – size (in bytes) of read data chunks

  • encryption (Optional[EncryptionSetting]) – encryption mode, algorithm and key

Return type:

None

b2sdk._internal.transfer.inbound.downloader.parallel.download_non_first_part(url, session, writer, part_to_download, chunk_size, encryption=None)[source]
Parameters:
  • url (str) – download URL

  • session (B2Session) – B2 API session

  • writer (WriterThread) – thread responsible for writing downloaded data

  • part_to_download (PartToDownload) – definition of the part to be downloaded

  • chunk_size (int) – size (in bytes) of read data chunks

  • encryption (Optional[EncryptionSetting]) – encryption mode, algorithm and key

Return type:

None

class b2sdk._internal.transfer.inbound.downloader.parallel.PartToDownload(cloud_range, local_range)[source]

Bases: object

Hold the range of a file to download, and the range of the local file where it should be stored.

__init__(cloud_range, local_range)[source]
b2sdk._internal.transfer.inbound.downloader.parallel.gen_parts(cloud_range, local_range, part_count)[source]

Generate a sequence of PartToDownload to download a large file as a collection of parts.
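One way to divide a byte range into a fixed number of contiguous parts is sketched below. This is an illustration only and not necessarily how gen_parts works: the function name split_range and the use of inclusive (start, end) tuples in place of range objects are assumptions. In a real download, each part would pair a cloud range with a local range of the same size.

```python
def split_range(start, end, part_count):
    """Split the inclusive byte range [start, end] into part_count contiguous sub-ranges."""
    size = end - start + 1
    if not 0 < part_count <= size:
        raise ValueError("part_count must be between 1 and the range size")
    parts = []
    offset = start
    remaining = size
    for i in range(part_count):
        # Integer division rounds down, so any smaller parts come first.
        part_size = remaining // (part_count - i)
        parts.append((offset, offset + part_size - 1))
        offset += part_size
        remaining -= part_size
    return parts
```

The parts cover the whole range without gaps or overlap, and their sizes differ by at most one byte.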