Synchronizer
Synchronizer is a utility that provides the functionality of a basic backup application. It can copy entire folders to the cloud and back to a local drive, and it supports retention policies and many other options.
Sync achieves its high performance by parallelizing:
listing local directory contents
listing bucket contents
uploads
downloads
Synchronizer spawns threads to perform the operations listed above in parallel to shorten the backup window to a minimum.
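The idea of running the listings in parallel can be illustrated with a minimal sketch. This is not the Synchronizer implementation: `list_local`, `list_remote`, `scan_both` and the in-memory "bucket" are hypothetical names invented for this example, and the remote listing is only simulated.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def list_local(root):
    """Return the sorted relative paths of all files under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, root).replace(os.sep, "/"))
    return sorted(found)


def list_remote(fake_bucket):
    """Stand-in for a bucket listing; a real one would call the storage API."""
    return sorted(fake_bucket)


def scan_both(root, fake_bucket, max_workers=2):
    """Run both listings concurrently and return (local, remote)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        local_future = pool.submit(list_local, root)
        remote_future = pool.submit(list_remote, fake_bucket)
        return local_future.result(), remote_future.result()


def demo():
    """Create two local files and scan them next to a simulated bucket."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("f1.txt", "f2.txt"):
            with open(os.path.join(root, name), "w") as handle:
                handle.write("data")
        return scan_both(root, {"hello.txt", "f1.txt"})
```

Because both listings are mostly waiting on I/O, threads are a reasonable fit here; the same pattern extends to submitting one task per upload or download.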
Sync Options
The following are the important optional arguments that can be provided when initializing the Synchronizer class.
compare_version_mode
: When comparing the source and destination files to decide whether to replace them, compare_version_mode specifies the mode of comparison. For possible values see b2sdk.v1.CompareVersionMode. The default value is b2sdk.v1.CompareVersionMode.MODTIME.
compare_threshold
: The minimum difference in size (in bytes) or modification time (in seconds) between the source and destination files before the source is considered new and replaces the destination.
newer_file_mode
: Determines whether to skip or replace when the source file is older than the destination. For possible values see b2sdk.v1.NewerFileSyncMode. If you don’t specify this, the sync will raise b2sdk.v1.exception.DestFileNewer if any source file is older than its destination.
keep_days_or_delete
: Specifies the policy to keep or delete older files. For possible values see b2sdk.v1.KeepOrDeleteMode. The default is b2sdk.v1.KeepOrDeleteMode.NO_DELETE.
keep_days
: If keep_days_or_delete is b2sdk.v1.KeepOrDeleteMode.KEEP_BEFORE_DELETE, this specifies for how many days older files are kept.
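To make the interaction between compare_version_mode and compare_threshold concrete, here is a minimal sketch of a threshold comparison. It is an illustration only: `CompareMode` and `differs` are hypothetical names, not part of b2sdk, and the strict "greater than" comparison is an assumption chosen to match the walkthrough, where a 1 byte change is ignored and a change of more than 10 bytes triggers a replacement.

```python
from enum import Enum


class CompareMode(Enum):
    """Hypothetical stand-in for the comparison modes; not b2sdk's enum."""
    MODTIME = "modtime"
    SIZE = "size"


def differs(source, dest, mode, threshold=0):
    """Return True when source and dest differ by more than threshold.

    source and dest are dicts with 'size' (bytes) and 'mod_time' (seconds).
    """
    if mode is CompareMode.SIZE:
        return abs(source["size"] - dest["size"]) > threshold
    if mode is CompareMode.MODTIME:
        return abs(source["mod_time"] - dest["mod_time"]) > threshold
    raise ValueError("unsupported mode: %r" % (mode,))
```

With a threshold of 10 and the SIZE mode, a file that grew from 100 to 101 bytes is treated as unchanged, while one that grew to 115 bytes is treated as new.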
>>> from b2sdk.v1 import ScanPoliciesManager
>>> from b2sdk.v1 import parse_sync_folder
>>> from b2sdk.v1 import Synchronizer
>>> from b2sdk.v1 import SyncReport
>>> from b2sdk.v1 import KeepOrDeleteMode, CompareVersionMode, NewerFileSyncMode
>>> import time
>>> import sys
>>> # b2_api is assumed to be an already authorized b2sdk.v1.B2Api instance
>>> source = '/home/user1/b2_example'
>>> destination = 'b2://example-mybucket-b2'
>>> source = parse_sync_folder(source, b2_api)
>>> destination = parse_sync_folder(destination, b2_api)
>>> policies_manager = ScanPoliciesManager(exclude_all_symlinks=True)
>>> synchronizer = Synchronizer(
...     max_workers=10,
...     policies_manager=policies_manager,
...     dry_run=False,
...     allow_empty_source=True,
...     compare_version_mode=CompareVersionMode.SIZE,
...     compare_threshold=10,  # in bytes
...     newer_file_mode=NewerFileSyncMode.REPLACE,
...     keep_days_or_delete=KeepOrDeleteMode.KEEP_BEFORE_DELETE,
...     keep_days=10,
... )
We have a file (hello.txt) that is present in the destination but not in the source (the local folder), so it will be deleted. Since our mode is to keep deleted files, it will be hidden in the bucket for 10 days.
>>> no_progress = False
>>> with SyncReport(sys.stdout, no_progress) as reporter:
...     synchronizer.sync_folders(
...         source_folder=source,
...         dest_folder=destination,
...         now_millis=int(round(time.time() * 1000)),
...         reporter=reporter,
...     )
upload f1.txt
delete hello.txt (old version)
hide hello.txt
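The retention behaviour above depends on comparing a version's age with keep_days, which is why sync_folders takes now_millis. A minimal sketch of that age check follows; `is_expired` and `ONE_DAY_MILLIS` are hypothetical names for this example, and the exact rule b2sdk applies is not shown in this document, so treat the strict comparison as an assumption.

```python
ONE_DAY_MILLIS = 24 * 60 * 60 * 1000


def is_expired(version_mod_millis, now_millis, keep_days):
    """Return True when a version is older than keep_days at time now_millis.

    Both timestamps are in milliseconds since the epoch.
    """
    age_days = (now_millis - version_mod_millis) / ONE_DAY_MILLIS
    return age_days > keep_days
```

Passing the current time in explicitly keeps the decision deterministic for a whole sync run, and makes it easy to test.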
Next, we changed f1.txt by adding 1 byte. Since our compare_threshold is 10, the sync will not do anything.
>>> with SyncReport(sys.stdout, no_progress) as reporter:
...     synchronizer.sync_folders(
...         source_folder=source,
...         dest_folder=destination,
...         now_millis=int(round(time.time() * 1000)),
...         reporter=reporter,
...     )
Then we changed f1.txt again by adding more than 10 bytes. Since our compare_threshold is 10, the sync will replace the file in the destination folder.
>>> with SyncReport(sys.stdout, no_progress) as reporter:
...     synchronizer.sync_folders(
...         source_folder=source,
...         dest_folder=destination,
...         now_millis=int(round(time.time() * 1000)),
...         reporter=reporter,
...     )
upload f1.txt
Now let’s just delete the file instead of keeping it, by setting keep_days_or_delete to DELETE. You can omit the keep_days argument in this case because it will be ignored anyway.
>>> synchronizer = Synchronizer(
...     max_workers=10,
...     policies_manager=policies_manager,
...     dry_run=False,
...     allow_empty_source=True,
...     compare_version_mode=CompareVersionMode.SIZE,
...     compare_threshold=10,  # in bytes
...     newer_file_mode=NewerFileSyncMode.REPLACE,
...     keep_days_or_delete=KeepOrDeleteMode.DELETE,
... )
>>> with SyncReport(sys.stdout, no_progress) as reporter:
...     synchronizer.sync_folders(
...         source_folder=source,
...         dest_folder=destination,
...         now_millis=int(round(time.time() * 1000)),
...         reporter=reporter,
...     )
delete f1.txt
delete f1.txt (old version)
delete hello.txt (old version)
upload f2.txt
delete hello.txt (hide marker)
As you can see, it deleted f1.txt and its older versions (no hide this time), and it also deleted hello.txt because we no longer want to keep the file. We also added another file, f2.txt, which gets uploaded.
Now we change newer_file_mode to SKIP and compare_version_mode to MODTIME. We also uploaded a new version of f2.txt to the bucket using the B2 web interface.
>>> synchronizer = Synchronizer(
...     max_workers=10,
...     policies_manager=policies_manager,
...     dry_run=False,
...     allow_empty_source=True,
...     compare_version_mode=CompareVersionMode.MODTIME,
...     compare_threshold=10,  # in seconds
...     newer_file_mode=NewerFileSyncMode.SKIP,
...     keep_days_or_delete=KeepOrDeleteMode.DELETE,
... )
>>> with SyncReport(sys.stdout, no_progress) as reporter:
...     synchronizer.sync_folders(
...         source_folder=source,
...         dest_folder=destination,
...         now_millis=int(round(time.time() * 1000)),
...         reporter=reporter,
...     )
As expected, nothing happened: the sync found a file that was older at the source but did not do anything because newer_file_mode is SKIP.
Now we change newer_file_mode back to REPLACE, and again upload a new version of f2.txt to the bucket using the B2 web interface.
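The three ways of handling a destination file that is newer than the source can be summarized in a small decision function. This is a sketch under stated assumptions, not b2sdk code: `NewerMode`, `DestNewerError` and `decide` are hypothetical stand-ins for NewerFileSyncMode, DestFileNewer and the library's internal logic.

```python
from enum import Enum


class NewerMode(Enum):
    """Hypothetical stand-in for the newer-file handling modes."""
    SKIP = "skip"
    REPLACE = "replace"
    RAISE_ERROR = "raise_error"


class DestNewerError(Exception):
    """Hypothetical error raised when the destination is newer than the source."""


def decide(source_mod_time, dest_mod_time, mode):
    """Return 'upload' or 'skip' for one file, or raise DestNewerError."""
    if source_mod_time > dest_mod_time:
        return "upload"  # the source is newer: the normal case
    if source_mod_time == dest_mod_time:
        return "skip"  # nothing changed
    # From here on the destination is newer than the source.
    if mode is NewerMode.SKIP:
        return "skip"
    if mode is NewerMode.REPLACE:
        return "upload"
    raise DestNewerError("destination file is newer than the source")
```

With SKIP the older source file is left alone, with REPLACE it overwrites the newer destination, and the remaining mode surfaces the conflict as an error.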
>>> synchronizer = Synchronizer(
...     max_workers=10,
...     policies_manager=policies_manager,
...     dry_run=False,
...     allow_empty_source=True,
...     compare_version_mode=CompareVersionMode.MODTIME,
...     compare_threshold=10,  # in seconds
...     newer_file_mode=NewerFileSyncMode.REPLACE,
...     keep_days_or_delete=KeepOrDeleteMode.DELETE,
... )
>>> with SyncReport(sys.stdout, no_progress) as reporter:
...     synchronizer.sync_folders(
...         source_folder=source,
...         dest_folder=destination,
...         now_millis=int(round(time.time() * 1000)),
...         reporter=reporter,
...     )
delete f2.txt (old version)
upload f2.txt
class b2sdk.v1.ScanPoliciesManager
Policy object used when scanning folders for syncing, used to decide which files to include in the list of files to be synced.
Code that scans through files should at least use should_exclude_file() to decide whether each file should be included; it will check include/exclude patterns for file names, as well as patterns for excluding directories.
Code that scans may optionally use should_exclude_directory() to test whether it can skip a directory completely and not bother listing the files and sub-directories in it.
__init__(exclude_dir_regexes=(), exclude_file_regexes=(), include_file_regexes=(), exclude_all_symlinks=False, exclude_modified_before=None, exclude_modified_after=None)
Parameters
exclude_dir_regexes (tuple) – a tuple of regexes to exclude directories
exclude_file_regexes (tuple) – a tuple of regexes to exclude files
include_file_regexes (tuple) – a tuple of regexes to include files
exclude_all_symlinks (bool) – if True, exclude all symlinks
exclude_modified_before (int, optional) – optionally exclude file versions modified before (in millis)
exclude_modified_after (int, optional) – optionally exclude file versions modified after (in millis)
should_exclude_file(file_path)
Given the full path of a file, decide if it should be excluded from the scan.
should_exclude_file_version(file_version)
Given the modification time of a file version, decide if it should be excluded from the scan.
Parameters
file_version (b2sdk.v1.FileVersion) – the file version object
Returns
True if excluded.
Return type
bool
should_exclude_directory(dir_path)
Given the full path of a directory, decide if all of the files in it should be excluded from the scan.
Parameters
dir_path (str) – the path of the directory, relative to the root directory being scanned. The path will never end in ‘/’.
Returns
True if excluded.
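The exclusion checks described above for ScanPoliciesManager can be pictured with a small regex-based policy. This sketch is not the library's implementation: `SimpleScanPolicy` is a hypothetical class, and two behaviours are assumptions made for the example, namely that an include pattern overrides an exclude pattern and that a file inside an excluded directory is always excluded.

```python
import re


class SimpleScanPolicy:
    """Hypothetical, simplified policy; not b2sdk.v1.ScanPoliciesManager."""

    def __init__(self, exclude_dir_regexes=(), exclude_file_regexes=(),
                 include_file_regexes=()):
        self._exclude_dirs = [re.compile(r) for r in exclude_dir_regexes]
        self._exclude_files = [re.compile(r) for r in exclude_file_regexes]
        self._include_files = [re.compile(r) for r in include_file_regexes]

    def should_exclude_directory(self, dir_path):
        """Return True if the whole directory can be skipped."""
        return any(r.match(dir_path) for r in self._exclude_dirs)

    def should_exclude_file(self, file_path):
        """Return True if the file should be left out of the scan."""
        # Check every parent directory of the file first.
        parts = file_path.split("/")[:-1]
        for depth in range(1, len(parts) + 1):
            if self.should_exclude_directory("/".join(parts[:depth])):
                return True
        # Assumption: an include pattern wins over an exclude pattern.
        if any(r.match(file_path) for r in self._include_files):
            return False
        return any(r.match(file_path) for r in self._exclude_files)
```

A scanner would call should_exclude_directory before descending into a directory, so that excluded subtrees are never listed at all, and should_exclude_file for each remaining file.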
class b2sdk.v1.Synchronizer
__init__(max_workers, policies_manager=<b2sdk.sync.scan_policies.ScanPoliciesManager object>, dry_run=False, allow_empty_source=False, newer_file_mode=<NewerFileSyncMode.RAISE_ERROR: 103>, keep_days_or_delete=<KeepOrDeleteMode.NO_DELETE: 303>, compare_version_mode=<CompareVersionMode.MODTIME: 201>, compare_threshold=None, keep_days=None)
Initialize synchronizer class and validate arguments
Parameters
max_workers (int) – max number of workers
policies_manager – policies manager object
dry_run (bool) – test mode, does not actually transfer/delete when enabled
allow_empty_source (bool) – if True, do not check whether source folder is empty
newer_file_mode (b2sdk.v1.NewerFileSyncMode) – setting which determines handling for destination files newer than on the source
keep_days_or_delete (b2sdk.v1.KeepOrDeleteMode) – setting which determines if we should delete or not delete or keep for keep_days
compare_version_mode (b2sdk.v1.CompareVersionMode) – how to compare the source and destination files to find new ones
compare_threshold (int) – should be greater than 0, default is 0
keep_days (int) – if keep_days_or_delete is b2sdk.v1.KeepOrDeleteMode.KEEP_BEFORE_DELETE, then this should be greater than 0
sync_folders(source_folder, dest_folder, now_millis, reporter)
Syncs two folders. Always ensures that every file in the source is also in the destination. Deletes any file versions in the destination older than history_days.
Parameters
source_folder (b2sdk.sync.folder.AbstractFolder) – source folder object
dest_folder (b2sdk.sync.folder.AbstractFolder) – destination folder object
now_millis (int) – current time in milliseconds
reporter (b2sdk.sync.report.SyncReport,None) – progress reporter
make_folder_sync_actions(source_folder, dest_folder, now_millis, reporter, policies_manager=<b2sdk.sync.scan_policies.ScanPoliciesManager object>)
Yield a sequence of actions that will sync the destination folder to the source folder.
Parameters
source_folder (b2sdk.v1.AbstractFolder) – source folder object
dest_folder (b2sdk.v1.AbstractFolder) – destination folder object
now_millis (int) – current time in milliseconds
reporter (b2sdk.v1.SyncReport) – reporter object
policies_manager – policies manager object
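One common way to produce such a sequence of actions is to walk two sorted listings in step, in the style of a merge. The sketch below shows that technique only; `make_actions` is a hypothetical function, it works on plain file names rather than folder objects, and it is not claimed to be how make_folder_sync_actions is implemented.

```python
def make_actions(source_names, dest_names):
    """Yield (action, name) pairs that make the destination match the source.

    Both inputs must be sorted lists of file names.
    """
    src = iter(source_names)
    dst = iter(dest_names)
    s = next(src, None)
    d = next(dst, None)
    while s is not None or d is not None:
        if d is None or (s is not None and s < d):
            # Only in the source: it has to be uploaded.
            yield ("upload", s)
            s = next(src, None)
        elif s is None or d < s:
            # Only in the destination: it has to be deleted (or hidden).
            yield ("delete", d)
            d = next(dst, None)
        else:
            # Present on both sides; a real sync would compare versions here.
            s = next(src, None)
            d = next(dst, None)
```

Because it is a generator, actions can be handed to worker threads as soon as they are produced, without building the full list first.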
make_file_sync_actions(sync_type, source_file, dest_file, source_folder, dest_folder, now_millis)
Yields the sequence of actions needed to sync the two files
Parameters
sync_type (str) – synchronization type
source_file (b2sdk.v1.File) – source file object
dest_file (b2sdk.v1.File) – destination file object
source_folder (b2sdk.v1.AbstractFolder) – a source folder object
dest_folder (b2sdk.v1.AbstractFolder) – a destination folder object
now_millis (int) – current time in milliseconds
class b2sdk.v1.SyncReport
Handle reporting progress for syncing.
Prints out each file as it is processed, and puts up a sequence of progress bars.
The progress bars are:
Step 1/1: count local files
Step 2/2: compare file lists
Step 3/3: transfer files
This class is THREAD SAFE, so it can be used from parallel sync threads.
UPDATE_INTERVAL = 0.1
__init__(stdout, no_progress)
Parameters
stdout – standard output file object
no_progress (bool) – if True, do not show progress
error(message)
Print an error, gracefully interleaving it with a progress bar.
Parameters
message (str) – an error message
print_completion(message)
Removes the progress bar, prints a message, and puts the progress bar back.
Parameters
message (str) – an error message
update_local(delta)
Report that more local files have been found.
Parameters
delta (int) – number of files found since the last check
update_compare(delta)
Report that more files have been compared.
Parameters
delta (int) – number of files compared
end_compare(total_transfer_files, total_transfer_bytes)
Report that the comparison has been finished.
local_access_error(path)
Add a file access error message to the list of warnings.
Parameters
path (str) – file path
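SyncReport is described above as thread safe, which matters because its update methods are called from parallel sync threads. A minimal sketch of how a reporter can protect its counters with a lock is shown below. `CountingReporter` is a hypothetical class written for this example; it mirrors a few SyncReport method names but does not reproduce its behaviour or its progress bars, and the warning text is invented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CountingReporter:
    """Hypothetical thread-safe reporter; not b2sdk.v1.SyncReport."""

    def __init__(self):
        self._lock = threading.Lock()
        self.local_file_count = 0
        self.compare_count = 0
        self.warnings = []

    def update_local(self, delta):
        """Record that delta more local files have been found."""
        with self._lock:
            self.local_file_count += delta

    def update_compare(self, delta):
        """Record that delta more files have been compared."""
        with self._lock:
            self.compare_count += delta

    def local_access_error(self, path):
        """Remember a warning about a file that could not be read."""
        with self._lock:
            self.warnings.append("WARNING: %s could not be accessed" % path)


def demo():
    """Update one reporter from many worker threads and return the total."""
    reporter = CountingReporter()
    with ThreadPoolExecutor(max_workers=8) as pool:
        for _ in range(1000):
            pool.submit(reporter.update_local, 1)
    return reporter.local_file_count
```

Holding the lock around each read-modify-write keeps the totals exact even when many workers report at once.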