10.2.5. Data mirroring
When mirroring is enabled a local storage folder is created for every spawned task. The task is then
started in this folder. The location where this folder is created is determined by the
PARALLEL_LOCAL_STORAGE
environmental variable. If the variable is not set a default location is
used (i.e. <temporary_folder>/pyopus).
The temporary folder is populated with files that are specified VM object construction using the
mirrorMap parameter. The copying is performed via shared folders (mounted filesystems). The
PARALLEL_MIRRORED_STORAGE
environmental variable specifies a colon separated list of paths to shared
folders in the same order on all hosts in the cluster. If the variable is not specified the user’s
home folder is assumed to be shared across all hosts.
After the spawned process is finished the temporary folder is deleted. If the persistentStorage
parameter is set to True
at VM object construction the storage is not deleted when the task is
finished and the next time a new task is spawned the same folder is reused. Because temporary folders
are named after the process ID of the client that executed spawn requests these folders must be
deleted when the machine is started so that a temporary folder created before the last booting of
the machine is not reused along with its outdated contents.
The process ID is an integer and can theoretically wrap around if the system is up for a sufficient amount of time. In practice, however, this happens rarely because process IDs are 32-bit integers or even longer.
File 04-mirror.py in folder demo/parallel/vm/
# Demonstrates file mirroring
import sys
from pyopus.parallel.mpi import MPI as VM
import funclib
import os
if __name__=='__main__':
# Startup dir must contain funclib so we can import it on a worker
# (funclib is not in PYTHONPATH).
# Mirror current dir on spawner to workers local storage.
# Startupdir is by default the created local storage dir.
vm=VM(mirrorMap={'*':'.'}, debug=2)
# Spawn 1 task anywhere, send vm as argument with name 'vm'.
# The spawned function must be defined in an importable module outside main .py file.
# Print some status information and local storage layout.
print("\nSpawning task.")
taskIDs=vm.spawnFunction(funclib.helloLs, kwargs={'vm': vm}, count=1)
print(taskIDs)
print("Spawned: "+str(taskIDs[0]))
print("Collecting stdout ...")
# Wait for a message, e.g. TaskExit
vm.receiveMessage()
vm.finalize()
The helloLs()
function is defined in the funclib
module (file funclib.py
in folder demo/parallel/vm/).
This function prints the contents of the currect folder.
def helloLs(vm=None):
print(hello(vm))
# Print current directory contents.
contents=os.listdir('.')
dirs=[]
files=[]
for entry in contents:
if os.path.isdir(entry):
dirs.append(entry)
else:
files.append(entry)
print("Dirs : "+str(dirs))
print("Files : "+str(files))
sys.stdout.flush()