Base classes for virtual machines (PyOPUS subsystem name: VM)
A spawner task is a task that can spawn new tasks on hosts in the virtual machine. All other tasks are worker tasks.
Mirroring and local storage
Often tasks in the virtual machine require the use of additional storage (like a harddisk) for storing the parallel algorithm’s input files which are identical for all hosts. These files must therefore be distributed to all hosts in a virtual machine before the computation can begin.
One way to make all virtual machines see the same directory structure is to use network filesystems like NFS or SMBFS. In this approach the storage physically resides one a computer in the network (file server). This storage is exported and mounted by all hosts in the virtual machine.
Take for instance that folder /shareme/foo on the file server is exported. Hosts in the virtual machine mount this folder under /home. This way all hosts see the /shareme/foo folder located on the physical storage of the server as their own folder /home.
This approach is very simple and ensures that all hosts see the same input files in the same place. Unfortunately as the number of hosts grows and the amount of data read operatins to the shared folder grows, the network quickly becomes saturated and the computational performance of the virtual machine dramatically decreases because tasks must wait on slow data read operations.
A more scalable solution is to use the local storage of every host to store the algoruthm’s input files. This has the downside that before the computation begins these files must be distributed to all hosts.
On the other hand parallel algorithms often require some local storage for storing various intermediate files. If this local storage is in the form of a shared folder an additional problem occurs when multiple hosts try to write a file with the same name to a physically same place, but with different content.
A solution to the intermediate file problem is to use every host’s phisically local storage (e.g. its local harddisk) for storing the intermediate files.
The solution to these problems in PyOPUS is to use mirroring. Mirroring is configured through two environmental variables. PARALLEL_LOCAL_STORAGE specifies the path to the folder where the folders for the local storage of input and intermediate files will be created. The PARALLEL_MIRRORED_STORAGE environmental variable specifies a colon (:) separated list of paths to the directories that are mounted from a common file server. The first such path in PARALLEL_MIRRORED_STORAGE on all hosts corresponds to the same directory on the file server. The same goes for the second, the third, etc. paths listed in PARALLEL_MIRRORED_STORAGE.
The PARALLEL_LOCAL_STORAGE and the PARALLEL_MIRRORED_STORAGE environmental variables must be set on all hosts in a virtual machine. It is usually set to /localhome/USERNAME which must physically reside on the local machine. PARALLEL_MIRRORED_STORAGE should be writeable by the user that is running the spawned processes.
Both PARALLEL_LOCAL_STORAGE and PARALLEL_MIRRORED_STORAGE can use UNIX style user home directory expansion (e.g. ~user expands to /home/user).
Path translation in mirroring operations
Suppose the PARALLEL_MIRRORED_STORAGE environmental variable is set to /foo:/bar on host1 and /d1:/d2 on host2. This means that the following directories are equivalent (mounted from the same exported folder on a file server)
host1 host2 /foo /d1 /bar /d2
So /foo on host1 represents the same physical storage as /d1 on host2. Similarly /bar on host1 represents the same physical storage as /d2 on host2. Usually only /home is common to all hosts in a virtual machine so PARALLEL_MIRRORED_STORAGE is set to /home on all hosts.
Path translation converts a path that on host1 into a path to the physically same filesystem object (mounted from the same exported directory on the file server) on host2. The following is an example of path translation.
Path on host1 Path on host2 /foo/a /d1/a /bar/b /d2/b
Basic task identifier class that is used for identifying a task in a virtual machine. TaskID objects must support comparison (__cmp__()), hashing (__hash__()), and conversion to a string (__str__()).
Basic host identifier class that is used for identifying a host in a virtual machine. HostID objects must support comparison (__cmp__()), hashing (__hash__()), and conversion to a string (__str__()).
This is message that signals a task has exited.
The TaskID object corresponding to the task is stored in the taskID member.
This is message that signals a host has exited from the virtual machine.
The HostID object corresponding to the host is stored in the hostID member.
This is message that signals new hosts were added to the virtual machine.
The list of HostID objects corresponding to the added hosts is stored in the hostIDs member.
This is message that is sent to the process that spawned a Python function in a virtual machine. The message holds a boolean flag that tells if the function succeeded to run and the return value of the function.
The boolean flag and the return value can be found in success and returnValue members.
The base class for accessing hosts working in parallel.
debug specifies the debug level. If it is greater than 0 debug messages are printed on the standard output. If importUser is set to True the user module is imported on a remote host before a remote task is spawned on it.
This function spawns the localStorageCleaner() function on all slots in the virtual machine. The spawned instances remove the local storage that was created for the slot.
timeout is applied where needed. Negative values stand for infinite timeout.
This function should never be called if there are tasks running in the virtual machine because it will remove their local storage.
This function should be called only by the spawner.
Creates a local storage directory subtree given by subpath under PARALLEL_LOCAL_STORAGE.
If a local storage directory with the same name already exists it is suffixed by an underscore and a hexadecimal number.
Returns the path to the created local storage directory.
Prepares the working environment (working directory and local storage) for a spawned function. This method is called by a spawned task. The mirroring information is received from the spawner (spawner’s mirrorList) at function spawn time. Spawned task’s virtual machine object is namely the spawner’s virtual machine object sent to the spawned task.
If mirroring is not configured with the setSpawn() method, the working directory is the one specified as startupDir at the last call to setSpawn(). If it is None the working directory is determined by the undelying virtual machine library (e.g. PVM).
If mirroring is configured with setSpawn() (mirrorList is not None), a local storage directory is created by calling createLocalStorage() with subpath set to the PID of the calling process in hexadecimal notation.
Next mirroring is performed by traversing all members of the processed mirrorList dictionary received from the spawner which is a list of tuples of the form (index, suffix, target) where index and suffix specify the source filesystem object to mirror (see the translateToAbstractPath() method for the explanation of index and suffix) while target is the destination where the object will be copied.
The source can be specified with globbing characters (anything the glob.glob() function can handle is OK).
target is the path relative to the local storage directory where the source will be copied. Renaming of the source is not possible. target always specifies the destination directory.
If source is a directory, symbolic links within it are copied as symbolic links which means that they should be relative and point to mirrored filesystem objects in order to remain valid after mirroring.
If startupDir was configured with setSpawn() and it is not None the working directory is set to a subpath in the local storage directory specified by startupDir.
If startupDir is None the working directory is set to the local storage directory.
Returns the path to the local storage dirextory.
Receives a message (a Python object) and returns a tuple (senderTaskId, message)
The sender of the message can be identified through the senderTaskId object of class TaskID.
If timeout is negative the function waits (blocks) until some message arrives. If timeout>0 seconds pass without receiving a message, an empty tuple is returned. Zero timeout performs a nonblocking receive which returns an empty tuple if no message is received.
In case of an error the return value is None.
Configures task spawning on the spawning process (spawner). startupDir specifies the working directory where the spawned functions will wake up.
mirrorMap is a dictionary specifying filesystem objects (files and directories) on the spawner which are to be mirrored (copied) to local storage on the host where the task is spawned. The keys represent paths on the spawner while the values represent the corresponding paths (relative to the local storage dorectory) on the host where the task will be spawned. Keys can une UNIX style globbing (anything the glob.glob() function can handle is OK). If mirrorMap is set to None, no mirroring is performed.
For the mirroring to work the filesystem objects on the spawner must be in the folders specified in the PARALLEL_MIRRORED_STORAGE environmental variable. This is because mirroring is performed by local copy operations which require the source to be on a mounted network filesystem.
The settings set by setSpawn are valid for all spawn operations until the next call to setSpawn(). The initial values of startupDir and mirrorMap are None.
To find out more about setting the working directory and mirroring see the prepareEnvironment() method.
Spawns a count instances of a Python function on remote hosts and passes args and kwargs to the function. Spawning a function actually means to start a Python interpreter, import the function, and call it with args and kwargs.
If count is None the number of tasks is select in such way that all available slots are filled.
function, args, and kwargs must be pickleable.
targetList specifies a list of hosts on which the function instances will be spawned. If it is None all hosts in the virtual machine are candidates for the spawned instances of the function.
If sendBack is True the spawned tasks return the status and the return value of the function back to the spawner after the function exits. The return value must be pickleable.
Returns a list of TaskID objects representing the spawned tasks.
Works only if called by a spawner task.
In some systems tasks in virtual machine are spawned by an external utility before Python starts to run scripts (e.g. MPI v1). After a call to this function one of the pre-spawned tasks becomes the spawner (rank 0) while all others start executing a loop in which they handle requests for spawning a Python functions.
timeout is used where applicable. Negative values stand for infinite timeout.
Translates a path on the local machine to a tuple (index, suffix) where index denotes the index of the path entry in the PARALLEL_MIRRORED_STORAGE environmental variable and suffix is the path relative to that entry.
Note that suffix is a relative path even if it starts with /.
Translates a index and relPath to an actual path on the local machine.
This is the inverse of the translateToAbstractPath() method.
Updates the internal configuration information used by a spawner task.
Uses timeout where applicable. Negative values stand for infinite timeout.
Waits for the virtual machine to come up (become alive) for timeout seconds. Negative values stand for infinite timeout. If polling is used for checking the status of the virtual machine beatsPerTimeout specifies how many times in timeout seconds the status of the virtual machine is checked. If timeout is negative, beatsPerTimeout is the number of times polling is performed in 60 seconds.
Returns True if the virtual machine is alive after the function finishes.