Synchronizer

Synchronizer is a utility that provides the functionality of a basic backup application. It can copy entire folders to the cloud, back to a local drive, or even between two cloud buckets, and it supports retention policies and many other options.

Sync owes its high performance to the parallelization of:

  • listing local directory contents

  • listing bucket contents

  • uploads

  • downloads

Synchronizer spawns threads to perform the operations listed above in parallel to shorten the backup window to a minimum.
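
The idea of listing several locations in parallel can be sketched with the standard library alone. This is a minimal illustration, not the b2sdk implementation: the helper names list_directory and list_directories_in_parallel are hypothetical, and only local directories are listed.

```python
from concurrent.futures import ThreadPoolExecutor
import os
import tempfile


def list_directory(path):
    """Return the sorted names of the regular files directly inside path."""
    return sorted(entry.name for entry in os.scandir(path) if entry.is_file())


def list_directories_in_parallel(paths, max_workers=4):
    """List several directories concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(list_directory, paths))


# Build two small example directories and list them with a thread pool.
with tempfile.TemporaryDirectory() as root:
    dirs = []
    for name, files in (('a', ['f1.txt', 'f2.txt']), ('b', ['hello.txt'])):
        directory = os.path.join(root, name)
        os.mkdir(directory)
        for file_name in files:
            with open(os.path.join(directory, file_name), 'w') as f:
                f.write('example')
        dirs.append(directory)
    listings = list_directories_in_parallel(dirs)

print(listings)
```

The pool size plays a role similar to the max_workers argument of Synchronizer: more workers can shorten the wall-clock time of I/O-bound work such as listing and transferring files.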

Sync Options

The following are the most important optional arguments that can be passed when initializing the Synchronizer class.

  • compare_version_mode: Specifies how source and destination files are compared when deciding whether a file should be replaced. For possible values see b2sdk.v1.CompareVersionMode. The default value is b2sdk.v1.CompareVersionMode.MODTIME.

  • compare_threshold: The minimum difference in size (in bytes) or modification time (in seconds), depending on compare_version_mode, between the source and destination files before the source file is treated as new and replaces the destination file.

  • newer_file_mode: Determines whether to skip or replace a file when the source is older than the destination. For possible values see b2sdk.v1.NewerFileSyncMode. If you don’t specify this, sync will raise b2sdk.v1.exception.DestFileNewer if any source file is older than its destination counterpart.

  • keep_days_or_delete: Specifies the policy for keeping or deleting older files. For possible values see b2sdk.v1.KeepOrDeleteMode. The default is NO_DELETE.

  • keep_days: If keep_days_or_delete is b2sdk.v1.KeepOrDeleteMode.KEEP_BEFORE_DELETE, this specifies for how many days the older files should be kept.
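
The interplay of compare_version_mode and compare_threshold can be sketched as follows. This is an illustration under stated assumptions, not the b2sdk code: Mode is a hypothetical stand-in for b2sdk.v1.CompareVersionMode, files are modeled as plain dicts, and the difference is assumed to have to be strictly greater than the threshold.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical stand-in for b2sdk.v1.CompareVersionMode.
    MODTIME = 'modtime'
    SIZE = 'size'


def differs(mode, source, dest, threshold=0):
    """Decide whether two file descriptions differ enough to act on.

    source and dest are dicts with 'size' (bytes) and 'mod_time' (seconds).
    Assumption: the difference must exceed the threshold, not merely equal it.
    """
    key = 'size' if mode is Mode.SIZE else 'mod_time'
    return abs(source[key] - dest[key]) > threshold


dest = {'size': 100, 'mod_time': 1000}
# One extra byte stays below a threshold of 10 bytes.
print(differs(Mode.SIZE, {'size': 101, 'mod_time': 2000}, dest, threshold=10))
# Fifteen extra bytes exceed it.
print(differs(Mode.SIZE, {'size': 115, 'mod_time': 2000}, dest, threshold=10))
```

With SIZE the threshold is read as bytes, with MODTIME as seconds, which is why the same number means different things in the two modes.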

>>> from b2sdk.v1 import ScanPoliciesManager
>>> from b2sdk.v1 import parse_sync_folder
>>> from b2sdk.v1 import Synchronizer
>>> from b2sdk.v1 import SyncReport
>>> from b2sdk.v1 import KeepOrDeleteMode, CompareVersionMode, NewerFileSyncMode
>>> import time
>>> import sys

>>> source = '/home/user1/b2_example'
>>> destination = 'b2://example-mybucket-b2'

>>> # b2_api is assumed to be an already authorized b2sdk.v1.B2Api instance
>>> source = parse_sync_folder(source, b2_api)
>>> destination = parse_sync_folder(destination, b2_api)

>>> policies_manager = ScanPoliciesManager(exclude_all_symlinks=True)

>>> synchronizer = Synchronizer(
        max_workers=10,
        policies_manager=policies_manager,
        dry_run=False,
        allow_empty_source=True,
        compare_version_mode=CompareVersionMode.SIZE,
        compare_threshold=10,
        newer_file_mode=NewerFileSyncMode.REPLACE,
        keep_days_or_delete=KeepOrDeleteMode.KEEP_BEFORE_DELETE,
        keep_days=10,
    )

We have a file (hello.txt) that is present in the destination but not in the source (the local folder), so it will be deleted. Because our mode is KEEP_BEFORE_DELETE, it will be hidden in the bucket for 10 days.

>>> no_progress = False
>>> with SyncReport(sys.stdout, no_progress) as reporter:
        synchronizer.sync_folders(
            source_folder=source,
            dest_folder=destination,
            now_millis=int(round(time.time() * 1000)),
            reporter=reporter,
        )
upload f1.txt
delete hello.txt (old version)
hide   hello.txt
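
The 10-day retention window comes down to millisecond arithmetic on now_millis. The sketch below shows only that arithmetic; is_past_retention is a hypothetical helper, and the actual rules b2sdk applies to hidden and old versions may be more involved.

```python
import time

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def is_past_retention(hidden_at_millis, now_millis, keep_days):
    """Return True once a hidden version is older than keep_days."""
    cutoff_millis = now_millis - keep_days * MILLIS_PER_DAY
    return hidden_at_millis < cutoff_millis


now_millis = int(round(time.time() * 1000))
nine_days_ago = now_millis - 9 * MILLIS_PER_DAY
eleven_days_ago = now_millis - 11 * MILLIS_PER_DAY

# Still inside the 10-day window, so it would be kept.
print(is_past_retention(nine_days_ago, now_millis, keep_days=10))
# Outside the window, so it would be eligible for deletion.
print(is_past_retention(eleven_days_ago, now_millis, keep_days=10))
```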

Next, we changed f1.txt by adding 1 byte. Since our compare_threshold is 10, sync will not do anything.

>>> with SyncReport(sys.stdout, no_progress) as reporter:
        synchronizer.sync_folders(
            source_folder=source,
            dest_folder=destination,
            now_millis=int(round(time.time() * 1000)),
            reporter=reporter,
        )

We changed f1.txt again and added more than 10 bytes. Since our compare_threshold is 10, sync will replace the file in the destination folder.

>>> with SyncReport(sys.stdout, no_progress) as reporter:
        synchronizer.sync_folders(
            source_folder=source,
            dest_folder=destination,
            now_millis=int(round(time.time() * 1000)),
            reporter=reporter,
        )
upload f1.txt

Let’s just delete the file and not keep it: keep_days_or_delete = DELETE. You can omit the keep_days argument in this case because it would be ignored anyway.

>>> synchronizer = Synchronizer(
        max_workers=10,
        policies_manager=policies_manager,
        dry_run=False,
        allow_empty_source=True,
        compare_version_mode=CompareVersionMode.SIZE,
        compare_threshold=10,  # in bytes
        newer_file_mode=NewerFileSyncMode.REPLACE,
        keep_days_or_delete=KeepOrDeleteMode.DELETE,
    )

>>> with SyncReport(sys.stdout, no_progress) as reporter:
    synchronizer.sync_folders(
        source_folder=source,
        dest_folder=destination,
        now_millis=int(round(time.time() * 1000)),
        reporter=reporter,
    )
delete f1.txt
delete f1.txt (old version)
delete hello.txt (old version)
upload f2.txt
delete hello.txt (hide marker)

As you can see, sync deleted f1.txt and its older versions (no hiding this time), and it also deleted hello.txt because we no longer want the file. We also added another file, f2.txt, which gets uploaded.

Now we changed newer_file_mode to SKIP and compare_version_mode to MODTIME. We also uploaded a new version of f2.txt to the bucket using the B2 web interface.

>>> synchronizer = Synchronizer(
        max_workers=10,
        policies_manager=policies_manager,
        dry_run=False,
        allow_empty_source=True,
        compare_version_mode=CompareVersionMode.MODTIME,
        compare_threshold=10,  # in seconds
        newer_file_mode=NewerFileSyncMode.SKIP,
        keep_days_or_delete=KeepOrDeleteMode.DELETE,
    )
>>> with SyncReport(sys.stdout, no_progress) as reporter:
    synchronizer.sync_folders(
        source_folder=source,
        dest_folder=destination,
        now_millis=int(round(time.time() * 1000)),
        reporter=reporter,
    )

As expected, nothing happened. Sync found a file that was older at the source but did not do anything because the mode is SKIP.
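
The three ways of handling a source file that is older than its destination can be summarized in a small decision function. This is a sketch, not the b2sdk implementation: NewerMode and DestNewerError are hypothetical stand-ins for b2sdk.v1.NewerFileSyncMode and b2sdk.v1.exception.DestFileNewer, and the returned strings only mimic the report lines shown in this walkthrough.

```python
from enum import Enum


class NewerMode(Enum):
    # Hypothetical stand-in for b2sdk.v1.NewerFileSyncMode.
    SKIP = 'skip'
    REPLACE = 'replace'
    RAISE_ERROR = 'raise_error'


class DestNewerError(Exception):
    """Hypothetical stand-in for b2sdk.v1.exception.DestFileNewer."""


def action_for_older_source(mode, file_name):
    """Return the action to take when the destination copy is newer."""
    if mode is NewerMode.SKIP:
        return None  # leave the destination untouched
    if mode is NewerMode.REPLACE:
        return 'upload ' + file_name  # overwrite with the older source
    raise DestNewerError(file_name)  # RAISE_ERROR: refuse to continue


print(action_for_older_source(NewerMode.SKIP, 'f2.txt'))
print(action_for_older_source(NewerMode.REPLACE, 'f2.txt'))
```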

Now we changed newer_file_mode back to REPLACE and again uploaded a new version of f2.txt to the bucket using the B2 web interface.

>>> synchronizer = Synchronizer(
        max_workers=10,
        policies_manager=policies_manager,
        dry_run=False,
        allow_empty_source=True,
        compare_version_mode=CompareVersionMode.MODTIME,
        compare_threshold=10,
        newer_file_mode=NewerFileSyncMode.REPLACE,
        keep_days_or_delete=KeepOrDeleteMode.DELETE,
    )
>>> with SyncReport(sys.stdout, no_progress) as reporter:
    synchronizer.sync_folders(
        source_folder=source,
        dest_folder=destination,
        now_millis=int(round(time.time() * 1000)),
        reporter=reporter,
    )
delete f2.txt (old version)
upload f2.txt

Handling encryption

The Synchronizer object may need EncryptionSetting instances to perform downloads and copies. For this reason, the sync_folders method accepts an EncryptionSettingsProvider. See Server-Side Encryption for further explanation and Sync Encryption Settings Providers for the public API.

Public API classes

class b2sdk.v1.ScanPoliciesManager[source]

Policy object used when scanning folders for syncing, used to decide which files to include in the list of files to be synced.

Code that scans through files should at least use should_exclude_file() to decide whether each file should be included; it will check include/exclude patterns for file names, as well as patterns for excluding directories.

Code that scans may optionally use should_exclude_directory() to test whether it can skip a directory completely and not bother listing the files and sub-directories in it.
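
The benefit of a directory-level check is that an excluded directory is never listed at all. The sketch below shows this pruning pattern with os.walk. The scan function and the two callables passed to it are hypothetical and merely play the roles of should_exclude_directory() and should_exclude_file(); this is not how b2sdk scans folders internally.

```python
import os
import tempfile


def scan(root, should_exclude_directory, should_exclude_file):
    """Yield relative file paths under root, pruning excluded directories."""
    for current, dir_names, file_names in os.walk(root):
        rel_dir = os.path.relpath(current, root)
        rel_dir = '' if rel_dir == '.' else rel_dir.replace(os.sep, '/')
        # Prune in place so os.walk never descends into excluded directories.
        dir_names[:] = sorted(
            d for d in dir_names
            if not should_exclude_directory((rel_dir + '/' + d).lstrip('/'))
        )
        for name in sorted(file_names):
            rel_path = (rel_dir + '/' + name).lstrip('/')
            if not should_exclude_file(rel_path):
                yield rel_path


with tempfile.TemporaryDirectory() as root:
    for rel_path in ('keep/a.txt', 'keep/b.tmp', 'cache/c.txt'):
        full_path = os.path.join(root, *rel_path.split('/'))
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, 'w') as f:
            f.write('example')
    found = list(
        scan(
            root,
            should_exclude_directory=lambda d: d == 'cache',
            should_exclude_file=lambda p: p.endswith('.tmp'),
        )
    )

print(found)
```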

__init__(exclude_dir_regexes: Iterable[Union[str, re.Pattern]] = (), exclude_file_regexes: Iterable[Union[str, re.Pattern]] = (), include_file_regexes: Iterable[Union[str, re.Pattern]] = (), exclude_all_symlinks: bool = False, exclude_modified_before: Optional[int] = None, exclude_modified_after: Optional[int] = None, exclude_uploaded_before: Optional[int] = None, exclude_uploaded_after: Optional[int] = None)[source]
Parameters
  • exclude_dir_regexes – regexes to exclude directories

  • exclude_file_regexes – regexes to exclude files

  • include_file_regexes – regexes to include files

  • exclude_all_symlinks – if True, exclude all symlinks

  • exclude_modified_before – optionally exclude file versions (both local and b2) modified before (in millis)

  • exclude_modified_after – optionally exclude file versions (both local and b2) modified after (in millis)

  • exclude_uploaded_before – optionally exclude b2 file versions uploaded before (in millis)

  • exclude_uploaded_after – optionally exclude b2 file versions uploaded after (in millis)
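
The four time-based filters take integer milliseconds. A small conversion helper, sketched here with the standard library, can produce such a value from a calendar date; to_millis is a hypothetical name and is not part of b2sdk.

```python
from datetime import datetime, timezone


def to_millis(moment):
    """Convert a timezone-aware datetime to integer milliseconds since the epoch."""
    return int(moment.timestamp() * 1000)


# A cutoff at midnight UTC on 1 January 2021.
cutoff = to_millis(datetime(2021, 1, 1, tzinfo=timezone.utc))
print(cutoff)
```

A value like cutoff could then be passed as exclude_modified_before or exclude_uploaded_before.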

The regex matching priority for a given path is: 1) the path is always excluded if its directory matches exclude_dir_regexes; if not, then 2) the path is always included if it matches include_file_regexes; if not, then 3) the path is excluded if it matches exclude_file_regexes; if not, then 4) the path is included.
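
The four-step priority can be written down as a short function. This is a sketch of the rule as stated, not the ScanPoliciesManager code: is_excluded is a hypothetical helper, and it assumes that patterns are applied with re.match and that every ancestor directory of the path is tested against exclude_dir_regexes.

```python
import re


def is_excluded(path, exclude_dir_regexes=(), exclude_file_regexes=(),
                include_file_regexes=()):
    """Apply the documented priority to a '/'-separated relative path."""
    parts = path.split('/')
    # 1) Excluded if any ancestor directory matches an exclude-directory regex.
    for depth in range(1, len(parts)):
        directory = '/'.join(parts[:depth])
        if any(re.match(rx, directory) for rx in exclude_dir_regexes):
            return True
    # 2) Otherwise included if the path matches an include-file regex.
    if any(re.match(rx, path) for rx in include_file_regexes):
        return False
    # 3) Otherwise excluded if the path matches an exclude-file regex.
    if any(re.match(rx, path) for rx in exclude_file_regexes):
        return True
    # 4) Otherwise included.
    return False


# The directory rule wins even though the file matches an include pattern.
print(is_excluded('cache/a.txt', exclude_dir_regexes=[r'cache$'],
                  include_file_regexes=[r'.*\.txt$']))
# The include rule wins over the exclude-file rule.
print(is_excluded('docs/a.txt', exclude_file_regexes=[r'.*\.txt$'],
                  include_file_regexes=[r'docs/.*']))
```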

should_exclude_file(file_path)[source]

Given the full path of a file, decide if it should be excluded from the scan.

Parameters

file_path – the path of the file, relative to the root directory being scanned.

Type

str

Returns

True if excluded.

Return type

bool

should_exclude_file_version(file_version)[source]

Given the modification time of a file version, decide if it should be excluded from the scan.

Parameters

file_version – the file version object

Type

b2sdk.v1.FileVersion

Returns

True if excluded.

Return type

bool

should_exclude_directory(dir_path)[source]

Given the full path of a directory, decide if all of the files in it should be excluded from the scan.

Parameters

dir_path (str) – the path of the directory, relative to the root directory being scanned. The path will never end in ‘/’.

Returns

True if excluded.

should_exclude_b2_directory(dir_path: str)[source]

Given the path of a directory, relative to the sync point, decide if all of the files in it should be excluded from the scan.

should_exclude_b2_file_version(file_version: b2sdk.file_version.FileVersion, relative_path: str)[source]

Whether a b2 file version should be excluded from the Sync or not.

This method assumes that the directory holding the path_ has already been checked for exclusion.

should_exclude_local_directory(dir_path: str)[source]

Given the path of a directory, relative to the sync point, decide if all of the files in it should be excluded from the scan.

should_exclude_local_path(local_path: b2sdk.sync.path.LocalSyncPath)[source]

Whether a local path should be excluded from the Sync or not.

This method assumes that the directory holding the path_ has already been checked for exclusion.

class b2sdk.v1.Synchronizer[source]
__init__(max_workers, policies_manager=<b2sdk.v1.sync.scan_policies.ScanPoliciesManager object>, dry_run=False, allow_empty_source=False, newer_file_mode=<NewerFileSyncMode.RAISE_ERROR: 103>, keep_days_or_delete=<KeepOrDeleteMode.NO_DELETE: 303>, compare_version_mode=<CompareVersionMode.MODTIME: 201>, compare_threshold=None, keep_days=None)[source]

Initialize synchronizer class and validate arguments

Parameters
  • max_workers (int) – max number of workers

  • policies_manager – policies manager object

  • dry_run (bool) – test mode, does not actually transfer/delete when enabled

  • allow_empty_source (bool) – if True, do not check whether source folder is empty

  • newer_file_mode (b2sdk.v1.NewerFileSyncMode) – setting which determines handling for destination files newer than on the source

  • keep_days_or_delete (b2sdk.v1.KeepOrDeleteMode) – setting which determines if we should delete or not delete or keep for keep_days

  • compare_version_mode (b2sdk.v1.CompareVersionMode) – how to compare the source and destination files to find new ones

  • compare_threshold (int) – should be greater than 0, default is 0

  • keep_days (int) – if keep_days_or_delete is b2sdk.v1.KeepOrDeleteMode.KEEP_BEFORE_DELETE, then this should be greater than 0

make_folder_sync_actions(source_folder, dest_folder, now_millis, reporter, policies_manager=<b2sdk.v1.sync.scan_policies.ScanPoliciesManager object>, encryption_settings_provider=<b2sdk.sync.encryption_provider.ServerDefaultSyncEncryptionSettingsProvider object>)[source]
make_file_sync_actions(sync_type, source_file, dest_file, source_folder, dest_folder, now_millis, encryption_settings_provider: b2sdk.sync.encryption_provider.AbstractSyncEncryptionSettingsProvider = <b2sdk.sync.encryption_provider.ServerDefaultSyncEncryptionSettingsProvider object>)[source]

Yields the sequence of actions needed to sync the two files

Parameters
  • sync_type (str) – synchronization type

  • source_file (b2sdk.v1.File) – source file object

  • dest_file (b2sdk.v1.File) – destination file object

  • source_folder (b2sdk.v1.AbstractFolder) – a source folder object

  • dest_folder (b2sdk.v1.AbstractFolder) – a destination folder object

  • now_millis (int) – current time in milliseconds

  • encryption_settings_provider (b2sdk.v1.AbstractSyncEncryptionSettingsProvider) – encryption setting provider

sync_folders(source_folder, dest_folder, now_millis, reporter, encryption_settings_provider: b2sdk.sync.encryption_provider.AbstractSyncEncryptionSettingsProvider = <b2sdk.sync.encryption_provider.ServerDefaultSyncEncryptionSettingsProvider object>)[source]

Syncs two folders. Always ensures that every file in the source is also in the destination. Deletes any file versions in the destination older than history_days.

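Parameters
  • source_folder (b2sdk.v1.AbstractFolder) – a source folder object

  • dest_folder (b2sdk.v1.AbstractFolder) – a destination folder object

  • now_millis (int) – current time in milliseconds

  • reporter – progress reporter, for example a b2sdk.v1.SyncReport instance

  • encryption_settings_provider (b2sdk.v1.AbstractSyncEncryptionSettingsProvider) – encryption setting provider
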
class b2sdk.v1.SyncReport[source]

Handle reporting progress for syncing.

Prints out each file as it is processed, and puts up a sequence of progress bars.

The progress bars are:
  • Step 1/1: count local files

  • Step 2/2: compare file lists

  • Step 3/3: transfer files

This class is THREAD SAFE, so it can be used from parallel sync threads.
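
Thread safety for a reporter of this kind usually means that counter updates are guarded by a lock. The sketch below illustrates that pattern only: TransferCounter is a hypothetical, much simplified stand-in and does not reproduce how SyncReport is implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class TransferCounter:
    """Hypothetical, simplified stand-in for a thread-safe progress reporter."""

    def __init__(self):
        self._lock = threading.Lock()
        self.files = 0
        self.bytes = 0

    def update_transfer(self, file_delta, byte_delta):
        # The lock makes the two increments atomic with respect to other threads.
        with self._lock:
            self.files += file_delta
            self.bytes += byte_delta


counter = TransferCounter()
# Leaving the with block waits for all submitted updates to finish.
with ThreadPoolExecutor(max_workers=8) as executor:
    for _ in range(1000):
        executor.submit(counter.update_transfer, 1, 10)

print(counter.files, counter.bytes)
```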

UPDATE_INTERVAL = 0.1
__init__(stdout, no_progress)[source]
Parameters
  • stdout – standard output file object

  • no_progress (bool) – if True, do not show progress

close()[source]

Perform a clean-up.

error(message)[source]

Print an error, gracefully interleaving it with a progress bar.

Parameters

message (str) – an error message

print_completion(message)[source]

Removes the progress bar, prints a message, and puts the progress bar back.

Parameters

message (str) – a message to print

update_total(delta)[source]

Report that more files have been found for comparison.

Parameters

delta (int) – number of files found since the last check

end_total()[source]

Total files count is done. Can proceed to step 2.

update_compare(delta)[source]

Report that more files have been compared.

Parameters

delta (int) – number of files compared

end_compare(total_transfer_files, total_transfer_bytes)[source]

Report that the comparison has been finished.

Parameters
  • total_transfer_files (int) – total number of transferred files

  • total_transfer_bytes (int) – total number of transferred bytes

update_transfer(file_delta, byte_delta)[source]

Update transfer info.

Parameters
  • file_delta (int) – number of files transferred

  • byte_delta (int) – number of bytes transferred

local_access_error(path)[source]

Add a file access error message to the list of warnings.

Parameters

path (str) – file path

local_permission_error(path)[source]

Add a permission error message to the list of warnings.

Parameters

path (str) – file path

property local_file_count
property local_done
update_local(delta)

Report that more files have been found for comparison.

Parameters

delta (int) – number of files found since the last check

end_local()

Total files count is done. Can proceed to step 2.

Sync Encryption Settings Providers

class b2sdk.v1.AbstractSyncEncryptionSettingsProvider[source]
abstract get_setting_for_upload(bucket: b2sdk.v1.bucket.Bucket, b2_file_name: str, file_info: Optional[dict], length: int) → Optional[b2sdk.encryption.setting.EncryptionSetting][source]

Return an EncryptionSetting for uploading an object, or None if the server should decide.

abstract get_source_setting_for_copy(bucket: b2sdk.v1.bucket.Bucket, source_file_version_info: b2sdk.v1.file_version.FileVersionInfo) → Optional[b2sdk.encryption.setting.EncryptionSetting][source]

Return an EncryptionSetting for the source of a copy operation, or None if not required.

abstract get_destination_setting_for_copy(bucket: b2sdk.v1.bucket.Bucket, dest_b2_file_name: str, source_file_version_info: b2sdk.v1.file_version.FileVersionInfo, target_file_info: Optional[dict] = None) → Optional[b2sdk.encryption.setting.EncryptionSetting][source]

Return an EncryptionSetting for the destination of a copy operation, or None if the server should decide.

abstract get_setting_for_download(bucket: b2sdk.v1.bucket.Bucket, file_version_info: b2sdk.v1.file_version.FileVersionInfo) → Optional[b2sdk.encryption.setting.EncryptionSetting][source]

Return an EncryptionSetting for downloading an object, or None if not required.

class b2sdk.v1.ServerDefaultSyncEncryptionSettingsProvider[source]

Encryption settings provider which assumes setting-less reads and a bucket default for writes.

class b2sdk.v1.BasicSyncEncryptionSettingsProvider[source]

Basic encryption setting provider that supports exactly one encryption setting per bucket for reading and one encryption setting per bucket for writing

__init__(read_bucket_settings: Dict[str, Optional[b2sdk.encryption.setting.EncryptionSetting]], write_bucket_settings: Dict[str, Optional[b2sdk.encryption.setting.EncryptionSetting]])[source]

Initialize self. See help(type(self)) for accurate signature.