1 Introduction
This document provides a simple example of using the Dropbox Python SDK to upload files to a Dropbox folder. We'll show two versions: one that uploads files synchronously, one after another, and one that follows an asynchronous, coroutine-based model using gevent.
You can find information on gevent at http://www.gevent.org/. Documentation on the Dropbox Python SDK is here: http://dropbox-sdk-python.readthedocs.io/en/master/.
Both of our example programs read a file that contains specifications of the files to be uploaded. Each line in this file contains the name of the file to be uploaded and the "target", i.e. the location for the file in the Dropbox folder (a path/name). Here is an example that I used in my testing:
Data/tmp01.txt /Target/tmp01.txt
Data/tmp02.txt /Target/tmp02.txt
Data/tmp03.txt /Target/tmp03.txt
Data/tmp04.txt /Target/tmp04.txt
Data/tmp05.txt /Target/tmp05.txt
Data/tmp06.txt /Target/tmp06.txt
Data/tmp07.txt /Target/tmp07.txt
Data/tmp08.txt /Target/tmp08.txt
Data/tmp09.txt /Target/tmp09.txt
Data/tmp10.txt /Target/tmp10.txt
Data/tmp11.txt /Target/tmp11.txt
Data/tmp12.txt /Target/tmp12.txt
Notes:
- Data/tmp01.txt, Data/tmp02.txt, etc. are the files to be uploaded. They exist and are small text files on my local file system.
- /Target/tmp01.txt, /Target/tmp02.txt, etc. are the locations in my Dropbox folder to which the corresponding files will be uploaded.
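Parsing this spec file takes only a few lines of plain Python. Here is a minimal sketch (the function name is my own, not part of the SDK); like the programs below, it skips comment lines that start with '#':

```python
def read_specs(infilename):
    """Return a list of (source, dest) pairs read from a spec file.

    Each non-comment, non-blank line holds two whitespace-separated
    fields: the local file name and the Dropbox target path.
    """
    specs = []
    with open(infilename) as infile:
        for line in infile:
            line = line.strip()
            if line and not line.startswith('#'):
                source, dest = line.split()
                specs.append((source, dest))
    return specs
```

Both example programs below contain essentially this function.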
I tested these examples under both Python 2 and Python 3.
2 The sequential model
For comparison, this first example does not use gevent; it uses a simple for loop to repeatedly call a function that uploads one file. Here is the source:
#!/usr/bin/env python
"""
synopsis:
    Upload files to a Dropbox folder, one at a time (sequentially).
    Read the names of the files to be uploaded and the path at which
    each is to be stored from spec_file.  spec_file contains one line
    per file with two fields per line: the name and the path.
usage:
    python upload_batch01.py <spec_file>
"""
from __future__ import print_function
import sys
import os
import dropbox
import time
import datetime

Auth_key = "<my-dropbox-authentication-key>"


def read_files_and_paths(infilename):
    with open(infilename, 'r') as infile:
        specs = []
        for line in infile:
            line = line.strip()
            if not line.startswith('#'):
                source, dest = line.split()
                specs.append((source, dest))
    return specs


def upload_one_file(dbx, source, dest):
    overwrite = False
    with open(source, 'r') as infile:
        data = infile.read()
    if sys.version_info.major == 3:
        bytesdata = bytes(data, 'utf-8')
    else:
        bytesdata = data
    mode = (
        dropbox.files.WriteMode.overwrite
        if overwrite
        else dropbox.files.WriteMode.add)
    mtime = os.path.getmtime(source)
    client_modified = datetime.datetime(*time.gmtime(mtime)[:6])
    print('bytesdata: {}  dest: {}  client_modified: {}'.format(
        bytesdata, dest, client_modified))
    res = dbx.files_upload(
        bytesdata, dest, mode,
        client_modified=client_modified,
        mute=True)
    print('res.name: {}  source: {}  dest: {}'.format(
        res.name, source, dest))


def upload_files_seq(dbx, files_and_paths):
    for source, dest in files_and_paths:
        upload_one_file(dbx, source, dest)


def main():
    args = sys.argv[1:]
    if len(args) != 1:
        sys.exit(__doc__)
    infilename = args[0]
    files_and_paths = read_files_and_paths(infilename)
    print('files_and_paths: {}'.format(files_and_paths))
    dbx = dropbox.Dropbox(Auth_key)
    upload_files_seq(dbx, files_and_paths)


if __name__ == '__main__':
    #import pdb; pdb.set_trace()
    main()
3 The async model
Here is the version that uses gevent to issue the upload requests to Dropbox concurrently:
#!/usr/bin/env python
"""
synopsis:
    Upload files to a Dropbox folder.  Attempt to upload files in
    parallel.
    Read the names of the files to be uploaded and the path at which
    each is to be stored from spec_file.  spec_file contains one line
    per file with two fields per line: the name and the path.
usage:
    python upload_batch02.py <spec_file>
"""
from __future__ import print_function
import gevent.monkey
gevent.monkey.patch_all()
import sys
import os
import gevent
import dropbox
import time
import datetime

Auth_key = "<my-dropbox-authentication-key>"


def read_files_and_paths(infilename):
    with open(infilename, 'r') as infile:
        specs = []
        for line in infile:
            line = line.strip()
            if not line.startswith('#'):
                source, dest = line.split()
                specs.append((source, dest))
    return specs


def upload_one_file(dbx, source, dest):
    overwrite = False
    with open(source, 'r') as infile:
        data = infile.read()
    if sys.version_info.major == 3:
        bytesdata = bytes(data, 'utf-8')
    else:
        bytesdata = data
    mode = (
        dropbox.files.WriteMode.overwrite
        if overwrite
        else dropbox.files.WriteMode.add)
    mtime = os.path.getmtime(source)
    client_modified = datetime.datetime(*time.gmtime(mtime)[:6])
    print('bytesdata: {}  dest: {}  client_modified: {}'.format(
        bytesdata, dest, client_modified))
    res = dbx.files_upload(
        bytesdata, dest, mode,
        client_modified=client_modified,
        mute=True)
    print('res.name: {}  source: {}  dest: {}'.format(
        res.name, source, dest))


def upload_files(dbx, files_and_paths):
    threads = []
    for source, dest in files_and_paths:
        threads.append(gevent.spawn(upload_one_file, dbx, source, dest))
    gevent.joinall(threads)


def main():
    args = sys.argv[1:]
    if len(args) != 1:
        sys.exit(__doc__)
    infilename = args[0]
    files_and_paths = read_files_and_paths(infilename)
    print('files_and_paths: {}'.format(files_and_paths))
    dbx = dropbox.Dropbox(Auth_key)
    upload_files(dbx, files_and_paths)


if __name__ == '__main__':
    #import pdb; pdb.set_trace()
    main()
Notes:
- Rather than using a for loop to call the upload function directly, this version's for loop spawns and collects a list of Greenlet pseudo-threads, each of which encapsulates a call to the function that uploads one file.
- Then we wait for these tasks to complete using gevent.joinall(threads).
- Note the call to gevent.monkey.patch_all(). That is what changes some of the blocking calls down inside the Dropbox SDK (socket operations, in particular) into non-blocking calls, enabling one pseudo-thread to give up control while it waits on an I/O or network request so that another can run. That is what enables gevent to schedule the running of Greenlets "cooperatively".
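The spawn/joinall pattern itself is not unique to gevent. If you just want to see the shape of it, here is a rough analogue using real OS threads from the standard library (the worker function is a stand-in for the Dropbox upload, not the real thing):

```python
import threading

def upload_one_file(results, source, dest):
    # Stand-in for the real upload: just record what would be sent where.
    results.append((source, dest))

def upload_files(files_and_paths):
    results = []
    threads = []
    for source, dest in files_and_paths:
        # Analogous to gevent.spawn(upload_one_file, dbx, source, dest).
        t = threading.Thread(target=upload_one_file,
                             args=(results, source, dest))
        t.start()
        threads.append(t)
    # Analogous to gevent.joinall(threads).
    for t in threads:
        t.join()
    return results

results = upload_files([('a.txt', '/T/a.txt'), ('b.txt', '/T/b.txt')])
print(sorted(results))
```

The fan-out/join structure is the same; the difference is that these are preemptive OS threads, whereas gevent's Greenlets are cooperative and switch only at the I/O points exposed by monkey-patching.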
4 The async model with limited concurrency
The above example works fine with a limited number of files. But what if we attempted to upload a large number of files? In that case we might want to put some limit on the number of tasks that can be active concurrently. gevent makes it rather easy to do that, too.
The gevent.pool.Pool class enables us to specify a maximum number of Greenlets to be active at any time.
Using a pool can be as simple as making a couple of modifications to the above gevent example. Here is a diff between the previous gevent example and one that uses gevent.pool.Pool instead:
--- upload_batch02.py	2017-01-10 09:59:05.434850755 -0800
+++ upload_batch03.py	2017-01-10 12:09:06.509149609 -0800
@@ -19,6 +19,7 @@
 import sys
 import os
 import gevent
+import gevent.pool
 import dropbox
 import time
 import datetime
@@ -63,9 +64,10 @@
 
 
 def upload_files(dbx, files_and_paths):
+    pool = gevent.pool.Pool(4)
     threads = []
     for source, dest in files_and_paths:
-        threads.append(gevent.spawn(upload_one_file, dbx, source, dest))
+        threads.append(pool.spawn(upload_one_file, dbx, source, dest))
     gevent.joinall(threads)
Notes:
- We added an import for the gevent.pool module.
- We create an instance of the gevent.pool.Pool class, specifying that the maximum size of the pool is 4.
- We spawn our threads (Greenlets) using pool.spawn instead of gevent.spawn.
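As with spawn/joinall, the pool idea has a standard-library analogue. Here is a sketch using Python 3's concurrent.futures that caps the number of simultaneously active workers at 4, with a counter added purely to show that the cap is respected (the worker is again a stand-in for the real upload):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
active = 0   # workers running right now
peak = 0     # highest value 'active' ever reached

def upload_one_file(source, dest):
    # Stand-in for the real upload; tracks how many workers run at once.
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)          # simulate network time
    with lock:
        active -= 1
    return (source, dest)

pairs = [('f%02d.txt' % i, '/Target/f%02d.txt' % i) for i in range(12)]

# Analogous to gevent.pool.Pool(4): at most 4 workers at a time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda p: upload_one_file(*p), pairs))

print(peak)  # never exceeds 4
```

gevent's Pool gives you the same "at most N at a time" guarantee for Greenlets, and in addition pool.spawn will itself block when the pool is full, so the loop that submits work is throttled as well.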