Task([ws, parfile]) | Base class from which all tasks should be derived. |
TaskInit(name, bases, dct) | Metaclass for tasks. |
WorkSpace(**args) | This class holds all parameters, scratch arrays and vectors used by the various tasks. |
WorkSpaceType | |
printindent_string(txt, indentlen[, width, ...]) | Usage: |
set_globals(var, val) |
Tasks are Python classes that perform computations of a higher level of complexity. They are intended as major building blocks of an automatic pipeline or as complex tasks to be called interactively by a user.
The main parts of a task are a ‘run’ function that performs the main computations, an optional initialization part that is called before the first execution of the task, and a workspace which holds all the input and output parameters of the task. The workspace can also hold large scratch arrays and derived parameters, which are only calculated when needed.
If Task is an instance of a task and par is a parameter of that task, then the parameter can be accessed through Task.par and set through Task.par=value. Alternatively one can use Task["par"] and Task["par"]=value. (These are actually getter and setter functions. The actual value is stored in Task._par).
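The two access styles can be illustrated with a minimal, self-contained sketch. It is not the pycrtools implementation: the class `MiniTask` and its helper `add_parameter` are hypothetical names, and only the convention of keeping the value in `_par` is taken from the description above.

```python
class MiniTask(object):
    """Hypothetical stand-in for a task; not the pycrtools Task class."""

    def __getitem__(self, name):
        # Task["par"] reads the value stored under the private name _par
        return getattr(self, "_" + name)

    def __setitem__(self, name, value):
        # Task["par"] = value writes to _par
        setattr(self, "_" + name, value)

    @classmethod
    def add_parameter(cls, name):
        # Install a property so that Task.par goes through the same getter/setter
        setattr(cls, name, property(lambda self: self[name],
                                    lambda self, value: self.__setitem__(name, value)))


MiniTask.add_parameter("nchunks")
task = MiniTask()
task.nchunks = 2        # attribute-style set
task["nchunks"] += 1    # item-style access reaches the same storage
```

After these two assignments `task.nchunks`, `task["nchunks"]` and `task._nchunks` all refer to the same stored value.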
For interactive use there are a number of easy-to-remember command line (i.e., Python prompt) functions that allow convenient loading of tasks (tload), listing of available tasks (tlist), inspecting of parameters (tpars), modifying them (tpar par=value), storing and retrieving of parameters (tput/tget), and execution of tasks (go, trun). The task instance itself is retrieved with the function task() or simply with the variable Task.
The tasks are automatically loaded when importing pycrtools, e.g.:
import pycrtools as cr
The list of available tasks can be viewed with ‘tlist’ and a specific task is loaded via tload tasknumber or tload taskname. The task stored last will be reloaded the next time the interactive session is restarted.
In general tasks can then be called in the conventional way simply by calling:
Task(par1=value1,par2=value2,....).
This actually executes a wrapper function which then puts those parameters into the workspace and executes the Task.run() function.
You can also execute them with trun "taskname", par1, par2=xyz, par3=.... This will run the task with name “taskname” and assign it to the global variable Task.
A task can return a return value (if the run function does return a value), e.g.:
value=Task()
The run function of the task can be called repeatedly without re-running the initialization: Task.run() executes the run function and Task.init() executes the init function.
To run the task one can also simply type go (also with extra function parameters, i.e. go positional_parameter1,par1=value1,par2=value2,....). This runs the task one has loaded with tload and will also call tput to store the current input parameters to a system database on disk (in ~/.pycrtools/task).
Here is an overview of interactive functions one can use at the command prompt in ipython:
Task | the currently loaded task instance |
Task.ws | print the workspace and its parameters (like tpars) |
Task.parname (=value) | to access or set a parameter (without updating) - parname is the name of the parameter |
Task(par1,par2,...) | run the task with parameters par1,par2,... |
tlist | to view the available tasks |
tlog | list the log of recently run tasks, including execution times. |
tload 2 | to load the task #2 (can also provide a name) |
tload "averagespectrum" | to load the task by name; this is safer in code since the task number can change with time |
tpars | to list all parameters |
tpar nchunks=2 | to set a parameter |
go | to run the loaded task |
trun "taskname", pars... | load and run the task with name “taskname” -> Task |
tpar parfile="avgspec_2011-02-15_23:52:15.par" | to read back a parameter file |
treset | to reset parameters to default values |
tget (name) | to read back the parameters from the latest run - will also be done at tload - or get the one stored under ‘name’ |
tput (name) | store input parameters in database (under ‘name’) |
tinit | run the initialization routine again (without resetting the parameters to default values) |
thelp | print documentation of task module |
Using go, the input parameters in the workspace will be stored whenever a task is run and hence tasks can be continued across sessions without having to retype the parameters. The workspace can be accessed via myWorkSpace=Task.ws and a workspace can be provided as input to a task, e.g. f(ws=myWorkSpace).
The task will also write all input and output parameters to a parameter file (taskname-TIME.par) at execution time (and at the end). This file is in python style and easy to read and to edit with a normal text editor. Tasks can be run with this file as input to set the parameters accordingly, using:
Task(parfile=filename,par1=value1,...)
The location where the .par files are stored can be changed by setting the global variable tasks.task_outputdir='dir...' in the task module. To inhibit writing .par files one can set tasks.task_write_parfiles = False. This will affect all tasks. A list of all task files created within the session will be stored in tasks.task_parfiles.
The task workspace is a relatively powerful construct. In the simplest case it just holds the input parameters. Of course, default values can be provided so that not all parameters have to be specified. The parameters can also have various properties determining what they are used for (e.g. input/output/workarrays) and a documentation string which is printed at output and included in the __doc__ string of the task.
The default value, however, can also be a function which is executed the first time the variable is accessed and the result is then stored as the parameter value. This is one example of an ‘output’ parameter, that can, however, be turned into an input parameter if the user assigns it a new value.
Note that since derived parameters are only calculated when they are used, some parameters may never get set if they are not needed. So, do not panic if the workspace shows a number of ‘undefined’ parameters. They simply might not have been called yet and will be filled during execution of the task.
This ‘on-first-call evaluation’ can save memory, since not all arrays need always be created. Also, the order in which derived parameters are defined does not matter as long as they only depend on other defined parameters of the workspace. This functionality actually makes the init function relatively superfluous.
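The on-first-call evaluation can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `LazyWorkSpace` is a hypothetical class, not the pycrtools WorkSpace, and it ignores documentation strings, units and all other parameter properties.

```python
class LazyWorkSpace(object):
    """Hypothetical workspace: defaults are plain values or functions of the workspace."""

    def __init__(self, **defaults):
        self._defaults = defaults
        self._values = {}

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. for parameters
        if name.startswith("_") or name not in self._defaults:
            raise AttributeError(name)
        if name not in self._values:
            default = self._defaults[name]
            # A function default is evaluated on first access and then cached
            self._values[name] = default(self) if callable(default) else default
        return self._values[name]

    def __setattr__(self, name, value):
        if name.startswith("_"):
            object.__setattr__(self, name, value)
        else:
            self._values[name] = value   # an explicit value overrides the default


# The derived parameter is listed first: definition order does not matter
ws = LazyWorkSpace(xy=lambda ws: ws.x * ws.y, x=5, y=2)
before = dict(ws._values)   # empty, nothing has been evaluated yet
product = ws.xy             # evaluates x and y, then xy
```

Until `ws.xy` is read, no value exists in the workspace at all, which is why a freshly created workspace can legitimately show undefined parameters.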
All derived parameters know on which other parameters they depend. If a parameter in the workspace is modified this information is preserved. When executing ‘Task.update()’, then all the derived parameters which depend directly or indirectly on a modified parameter will be recalculated. Update will be called automatically if one uses tpar par=value (or tpar(par=value) if one does not use ipython).
The value will also be updated if the parameter is deleted with tdel par. (Note: when simply using del Task.par, the recalculation happens upon the next access to this parameter. If ws.update was not called explicitly after using del (instead of tdel), then parameters which depend on par might still have their old value!)
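How dependency tracking and an explicit update step can interact is sketched below. Everything here is an assumption made for illustration: `TrackingWorkSpace` and its `get`/`set` methods are hypothetical, and the real workspace may record dependencies quite differently. The sketch records which parameters a default function reads and, on `update()`, discards every derived value that depends directly or indirectly on a modified one.

```python
class TrackingWorkSpace(object):
    """Hypothetical workspace that records what each derived parameter reads."""

    def __init__(self, **defaults):
        self.defaults = defaults
        self.values = {}
        self.depends_on = {}   # derived parameter -> set of parameters it read
        self.modified = set()
        self._reading = []     # stack of derived parameters being evaluated

    def get(self, name):
        if self._reading:
            self.depends_on[self._reading[-1]].add(name)   # record the dependency
        if name not in self.values:
            default = self.defaults[name]
            if callable(default):
                self.depends_on[name] = set()
                self._reading.append(name)
                try:
                    self.values[name] = default(self)
                finally:
                    self._reading.pop()
            else:
                self.values[name] = default
        return self.values[name]

    def set(self, name, value):
        self.values[name] = value
        self.modified.add(name)

    def update(self):
        # Collect everything that depends, directly or indirectly, on a change
        stale = set(self.modified)
        changed = True
        while changed:
            changed = False
            for par, deps in self.depends_on.items():
                if par not in stale and deps & stale:
                    stale.add(par)
                    changed = True
        for par in stale - self.modified:
            self.values.pop(par, None)   # recalculated on next access
        self.modified.clear()


ws = TrackingWorkSpace(x=5, y=2,
                       xy=lambda ws: ws.get("x") * ws.get("y"),
                       xy2=lambda ws: ws.get("xy") ** 2)
first = ws.get("xy2")         # (5 * 2) ** 2
ws.set("y", 3)
stale_value = ws.get("xy2")   # unchanged, update() has not been called yet
ws.update()
fresh = ws.get("xy2")         # (5 * 3) ** 2
```

The stale read in the middle mirrors the warning above: without an update step, derived values keep their old contents.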
ATTENTION: If you use a dict, array or vector as input parameter (or anything non-atomic) and you modify the elements in the array, then the task has no chance of knowing that and the next time you start the task, the input has changed. So, always COPY from an input array to a work vector if you need to make such modifications, don’t just assign them the input array (you will just create a reference in this case)!
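The difference between assigning and copying is shown below with plain Python lists, which stand in for the arrays and vectors of the library (an assumption made to keep the example self-contained).

```python
import copy

input_array = [1.0, 2.0, 3.0]   # stands in for an input parameter

alias = input_array             # plain assignment: a second name for the same list
alias[0] = -1.0                 # silently changes the input as well

work = copy.copy(input_array)   # an independent work vector
work[1] = -2.0                  # the input is left untouched
```

For nested containers such as a dict of lists, `copy.deepcopy` would be needed to make the inner objects independent as well.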
Logging:
Some basic logging and performance evaluation is built in. The variable tasks.task_logger will contain a list of dicts with recently run tasks, their names, start and execution times.
To reset the logger simply set tasks.task_logger=[].
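A log of this shape, a list of dicts, is easy to query with ordinary Python. The sketch below is not the pycrtools logger: `run_logged` is a hypothetical helper and the dict keys `name`, `start` and `duration` are assumed for illustration only.

```python
import time

task_logger = []   # hypothetical stand-in for tasks.task_logger


def run_logged(name, func, *args, **kwargs):
    """Run func and append a log entry; the dict keys are illustrative assumptions."""
    start = time.time()
    result = func(*args, **kwargs)
    task_logger.append({"name": name, "start": start,
                        "duration": time.time() - start})
    return result


total = run_logged("sum_task", sum, range(1000))
slowest = max(task_logger, key=lambda entry: entry["duration"])
```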
The modules to import in a task module are:
from pycrtools import *
import pycrtools.tasks as tasks
If one wants to add a new task then it should either be defined in a separate new file in the directory modules/tasks or it should be added to one of the files in modules/tasks.
The four ingredients of a task are the parameters definition (a dict stored in Task.parameters), an init function, a call function, and a run function. Of these only the run function is really required, but either the parameters dict or the call function should be there to define the input parameters.
A simple example is given below:
class test(tasks.Task):
    """
    Documentation of task - parameters will be added automatically
    """
    parameters = dict(
        x=dict(default=None, doc='x-value - a positional parameter', positional=1),
        y=dict(default=2, doc='y-value - a normal keyword parameter'),
        xy=dict(default=lambda ws: ws.y * ws.x, doc='Example of a derived parameter.'))

    def init(self):
        print "Calling optional initialization routine - Nothing to do here."

    def run(self):
        print "Calling Run Function."
        print "self.x=",self.x,"self.y=",self.y,"self.xz=",self.xy
First the parameters are defined as a dict. The parameter dict consists of key value pairs, where the key is the variable name and the value is again a dict with the various properties of that particular variable. The following properties are defined
Yet another, and perhaps more recognizable, form of defining input parameters is to provide a dummy call function with all input parameters in the definition. E.g.:
class test(tasks.Task):
    """
    Documentation of task - parameters will be added automatically
    """
    def call(self,x,y=2,xy=lambda ws:ws.x*ws.y):
        pass

    def init(self):
        print "Calling optional initialization routine - Nothing to do here."

    def run(self):
        print "Calling Run Function."
        print "self.x=",self.x,"self.y=",self.y,"self.xz=",self.xy
This has the same effect, but has the disadvantage of not providing documentation strings or other options. Both methods, however, can be combined, where a parameters dict contains the missing properties or additional parameters.
Note, that the call function is actually never called. You may, however, provide it with any code and use it for testing purposes.
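One way a never-executed call function can still define parameters is by reading its signature. The sketch below uses `inspect.signature` from the Python 3 standard library; `parameters_from_signature` is a hypothetical helper and is not claimed to be how the task metaclass does it. The property names `default` and `positional` follow the parameters dict shown earlier.

```python
import inspect


def call(self, x, y=2, xy=lambda ws: ws.x * ws.y):
    """Dummy call function: never executed, only its signature is read."""
    pass


def parameters_from_signature(func):
    """Build a parameters dict from a signature (hypothetical helper)."""
    parameters = {}
    position = 0
    for name, par in inspect.signature(func).parameters.items():
        if name == "self":
            continue
        if par.default is inspect.Parameter.empty:
            # No default given: treat it as a positional parameter
            position += 1
            parameters[name] = dict(default=None, positional=position)
        else:
            parameters[name] = dict(default=par.default)
    return parameters


parameters = parameters_from_signature(call)
```

A signature carries no documentation strings or units, which is the limitation mentioned above and the reason for combining it with a parameters dict.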
The run function does the actual calculations. It will have no parameters (other than self, of course). When it is called, run can assume that all the parameters are available in the form self.par. Filling those values with the input parameters and calculating the derived parameters is done ‘behind the scenes’.
Once the task is imported, e.g. here with:
import tasks.averagespectrum
then an instance can be created with:
t=tasks.averagespectrum.test1()
which can then be called, e.g.:
>>> t(5)
Calling Run Function.
self.x= 5 self.y= 2 self.xz= 10
The parameters are accessed through t.par, i.e. here:
>>> t.x
5
Parameters can already be set at instantiation and provided as keyword arguments:
>>> t=tasks.averagespectrum.test1(y=3)
>>> t(5)
Calling optional initialization routine - Nothing to do here.
Calling Run Function.
self.x= 5 self.y= 3 self.xz= 15
If one types ls -rtl *.par one will find the latest parameter files generated during execution time, e.g:
-rw-r--r-- 1 falcke staff 696 Feb 17 00:22 test1_2011-02-17_00:22:13.par
We can inspect this with cat test1_2011-02-17_00:22:13.par or edit it:
# Task: averagespectrum saved on 2011-02-17 00:22:13
# File: test1_2011-02-17_00:22:13.par
#-----------------------------------------------------------------------
# WorkSpace of test1
#-----------------------------------------------------------------------
x = 5 # x-value - a positional parameter
y = 3 # y-value - a normal keyword parameter
#------------------------Output Parameters------------------------------
# xy = 15 - Example of a derived parameter.
#-----------------------------------------------------------------------
The task can be executed with the parameter file as input:
t(parfile='test1_2011-02-17_00:22:13.par')
The parameters in the file can, however, be explicitly overwritten using keyword arguments, i.e:
>>> t(5,parfile='test1_2011-02-17_00:22:13.par',y=2)
Calling Run Function.
self.x= 5 self.y= 2 self.xz= 10
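Because the parameter file is plain Python, reading it back amounts to executing the assignments and collecting the resulting names. The following sketch shows the idea on a string copy of the file above; `read_parameters` is a hypothetical helper, not the reader used by the task module, and executing a file this way is only acceptable for files one trusts.

```python
PARFILE_TEXT = """
# WorkSpace of test1
x = 5 # x-value - a positional parameter
y = 3 # y-value - a normal keyword parameter
# xy = 15 - Example of a derived parameter.
"""


def read_parameters(text):
    """Execute Python-style parameter text and return the assigned names."""
    namespace = {}
    exec(text, {}, namespace)   # assignments end up in namespace
    return namespace


pars = read_parameters(PARFILE_TEXT)
pars.update(y=2)   # keyword arguments override values from the file
```

The output parameter `xy` is commented out in the file, so it is not read back and would be derived again from `x` and `y`.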
To simplify running a task one can use the ‘t’-shortcuts.
Here is an example of using it:
>>> taskload
>>> tlist
Available Tasks: [(0, 'test1'), (1, 'Imager'), (2, 'test2'), (3, 'averagespectrum')]
>>> tload 0
------> tload(0)
Parameters of task test1
#-----------------------------------------------------------------------
# WorkSpace of test1
#-----------------------------------------------------------------------
x = None # x-value - a positional parameter
y = 2 # y-value a normal keyword parameter
#-----------------------------------------------------------------------
>>> Task(5) # call task directly
Calling Run Function.
self.x= 5 self.y= 2 self.xz= 10
>>> Task(5,y=10) # call it with keyword arguments
Calling Run Function.
self.x= 5 self.y= 10 self.xz= 50
>>> go # start task with go, which can't handle positional parameters well yet
Starting task test1
Number of positional arguments provided ( 0 ) is less than required number ( 1 ). Keeping previous values.
Calling Run Function.
self.x= 5 self.y= 10 self.xz= 50
Task test1 run.
Base class from which all tasks should be derived.
ws - provide a workspace to initialize parameters
parfile - provide a (python) parameter file defining variables, to initialize parameters
kwargs - any parameter=value pair to set the value of the respective parameter in the workspace
To create a task instance say:
>>> Task = tasks.taskname.TaskName()
To run it call Task(), or directly instantiate and run it by:
>>> Task = tasks.taskname.TaskName()(pars ....)
To (re)run the task with currently set parameters, simply use:
>>> Task()
If called for the first time the initialization routine will be called.
To rerun the task with the positional parameters xN and the parameters parN set to the values provided and keeping the previous parameters the same, use:
>>> Task(x1,x2,...,par1=val1,par2=val2)
Task.ws returns the workspace (and prints all the parameters).
Parameters for calling the Task:
init = False - force the initialisation to run again
parfile = filename - read parameters from file
pardict - provide a dict with parameter value pairs or a taskname and a parameter dict. Parameters from the top level and in a dict with a taskname will be assigned. The dicts can be nested.
ws - replace workspace with a different Workspace and then update parameters therein as provided in the file and the keywords.
ws parameters will be overwritten by file parameter and they will be overwritten by keyword parameters (which thus have the highest priority).
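The priority order can be written down as successive dict updates. This is a sketch of the stated rule only; `merge_parameters` is a hypothetical helper and the three dicts stand in for the workspace, the parameter file and the keyword arguments.

```python
def merge_parameters(ws_pars, parfile_pars, keyword_pars):
    """Combine three parameter sources; later sources win (hypothetical helper)."""
    merged = dict(ws_pars)        # lowest priority: values from the workspace
    merged.update(parfile_pars)   # overwritten by the parameter file
    merged.update(keyword_pars)   # keyword arguments have the highest priority
    return merged


merged = merge_parameters({"x": 1, "y": 2, "z": 3},
                          {"y": 20, "z": 30},
                          {"z": 300})
```

Here `x` keeps its workspace value, `y` comes from the file and `z` from the keyword argument.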
If the run function returns a value, this value will be returned otherwise the task instance itself will be returned. Hence you can access all parameters through the returned task object.
Add a python “property” to the class which contains getter and setter functions for methods.
Example:
self.addProperty(name,lambda self:self[name],lambda self,x:self.__setitem__(name,x),lambda self:self.delx(name),"This is parameter "+name)
Calls the initialization routine if it wasn’t run yet (or force it to run nonetheless)
Gets the input parameters in the workspace from the parameter database (see also ‘tget’). This can be stored there with task.put() (or ‘tput’ from the command line).
task.put(name) will store the parameters under the keyword name and they can be retrieved with get under this name.
If the name is not known a list of all known names is given.
Function to be called after a plotting command. If self.plot_pause = True it will pause and ask for user input whether and how to continue calculation and plotting. May modify self.plot_pause and self.doplot.
Stores the input parameters in the workspace to the parameter database (see also ‘tput’). This can be restored with Task.get() (or ‘tget’ from the command line).
Task.get(name) will retrieve the parameters stored under ‘name’
delete = False - If True the database entry will be deleted.
Usage:
task.reset() -> reset all parameters to the default state and rerun the initialization.
task.reset(par1=val1,par2=val2,...) -> reset all parameters to default and initialize with the parameters provided.
task.reset(restorecallparameters=True,par1=val1,par2=val2,...) -> reset all parameters to their state at initialization, but keep the parameters provided during the last initialisation.
init = True - If False, don’t force a re-run of the init routine.
Save the parameters to a file that can be read back later with the option parfile=filename (e.g., tpar parfile=filename)
Recalculates all existing derived parameters and assigns them their default values if they depend on a value that was modified. Note that parameters which had a default function at initialization but were set explicitly will not be recalculated. Use ws.reset() or del ws.par first.
Usage: Task.updateHeader(harray,parameters=['parname1','parname2',...],newparname1=oldparname1,newparname2=oldparname2,....)
Will set parameters in the header dict of the hArray ‘harray’.
First of all, there will be a new dict named according to the current task, containing all exportable parameters.
Secondly, one can set additional parameters of the Task as (top-level) header parameters, by providing their name in the list ‘parameters’ or as keyword arguments (the latter allows one to give them a different name in the header).
parameters = [] - a list of task parameter names to be saved in the header
Usage:
Task.writehtml(self,results=None,parfiles=None,plotfiles=None,text=[],logfile=None,output_dir=None,htmlfilename="index.html",filename=None, tduration=-1):
Description:
Will write an informative html page containing plots and output parameters from the task.
The following parameters are available. Typically none of them needs to be specified; they will be automatically deduced from the task parameters.
Parameters:
text - a list of strings that will be included in the html file as such
filename - data filename this html page belongs to
parfiles - a list of parfile names that were produced by the task
logfiles - a list of logfiles produced by the task
tduration - execution time of task
Metaclass for tasks.
Should never be used directly. All tasks should derive from Task instead.
Adds a task to the library and adds its parameters to the documentation.
Return a pretty string describing a parameter based on its properties. Typically added to the __doc__ string of a class.
This class holds all parameters, scratch arrays and vectors used by the various tasks. Hence this is the basic workspace in the memory.
If ‘ws’ is the workspace then you can access parameters ‘parname’ in the workspace as ws.parname and set them with ws.parname=value.
These ‘ws.parname’ are actually getter and setter functions. The actual value is stored in ws._parname and should not be accessed directly.
Every workspace actually creates its own new class, which contains these getter and setter functions. The workspace is defined, e.g. by
pardict - provide a dict with parameter value pairs or a taskname and a parameter dict. Parameters from the top level and in a dict with a taskname will be assigned. The dicts can be nested.
parfile - provide a filename from which to read parameters in the form par1=val1, par2=val2,...
Add a new parameter to the workspace, providing additional information, such as documentation and default values. The named parameters describe properties of the parameters. If no named parameters are given default values are used and added.
Example: ws.add(par,default=0,doc="Parameter documentation",unit="km/h")
The default values can also be a function, which takes as argument the workspace itself, e.g.
ws.add(par,default=lambda ws:ws["other_parameter"]+1,doc="This parameter is the value of another parameter plus one",unit="km/h")
If another parameter is referenced it will be retrieved automatically, and set to a default value if necessary. This way one can recursively go through multiple parameters upon retrieval.
Add the definition dict of one parameter to the overall dict containing parameter definitions.
This provides an easy interface to add a number of parameters, either as a list or as a dict with properties.
>>> ws.addParameters(["par1","par2",...])
will simply add the parameters parN without documentation and default values
>>> ws.addParameters([("par1",val1, doc1, unit1),("par2",...),...])
will add the parameters parN with the respective properties. The properties are assigned based on their position in the tuple:
pos 0 parameter name
pos 1 default value
pos 2 doc string
pos 3 unit of values
>>> ws.addParameters({"par1":{"default":val1,"doc":doc1,"unit":unit1},"par2":{...},...})
will add the parameters parN with the respective properties.
Add a python “property” to the class which contains getter and setter functions for methods.
Example:
self.addProperty(name,lambda ws:ws[name],lambda ws,x:ws.__setitem__(name,x),lambda ws:ws.delx(name),"This is parameter "+name)
Remove parameter from modification list.
Set all parameters to be unmodified.
Delete a parameter from the workspace. If the parameter was hardcoded before initialization (i.e., provided through ws.parameters) then the value will be reset but the parameter remains and will be filled with its default value at the next retrieval. Otherwise the parameter is completely removed.
Evaluates all parameters and assigns them their default values if they are as yet undefined.
Evaluates all input parameters and assigns them their default values if they are as yet undefined.
Evaluates all input and output parameters and assigns them their default values if they are as yet undefined.
Return a python set which contains the parameters that are derived from input parameters through a default function at initialization. These are the parameters which were defined before initialization in ws.parameters and which do have a function as default value. Note, that the value is not recalculated again even if the input parameters changed! One can enforce recalculation by calling ws.recalc().
workarrays = True - Include workarrays in the list
nonexport = True - Include parameters which were not meant for export in the list
Return a python set which contains the parameters that are considered input parameters. These are the parameters which were defined before initialization in ws.parameters and which do not have a function as default value.
Returns the input parameters as a dict that can be provided at startup to the function to restore the parameters.
Return a list that contains all method names that simply contain a value, but were not assigned through self.add(), i.e. which do not have a getter and setter function or any description. These are typically internal variables that are not well documented.
Return all parameters that are considered output parameters, i.e., those which are ‘derived’ parameters and those explicitly labelled as output.
If parameter was defined in parameter_properties return the “doc” keyword, otherwise a default string.
Returns a python list containing all the parameter names
internals = False - If True all stored parameters are returned, including those not added by ws.add and which are typically only used for internal purposes.
excludeworkarrays = True - whether or not to exclude the data array
excludenonexports = True - whether or not to exclude parameters that are marked to not be printed
all = False - really return all parameters (internals, workarrays, excludes)
Returns a python dictionary containing all the parameters and their values as key/value pairs.
internals = False - If True all stored parameters are returned, including those not added by ws.add and which are typically only used for internal purposes.
excludeworkarrays = True - whether or not to exclude the data array
excludenonexports = True - whether or not to exclude parameters that are marked to not be printed
all = False - really return all parameters (internals, workarrays, excludes)
Return all parameters that are used as positional parameters, i.e. those which don’t have a default value or a keyword.
Returns true or false depending on whether a parameter was modified since the last update or recalc. The function will also add the parameter to the modified_parameters list if it was modified.
Return a string that contains all methods that simply contain a value, but were not assigned through self.add(), i.e. which do not have a getter and setter function or any description. These are typically internal variables that are not well documented.
Converts a tuple of parameter description values into a properly formatted dict. If the tuple is shorter, default values are used.
Example: partuple_to_pardict(self,(value,"Parameter description","MHz")) -> {"default":value,"doc":"Parameter description","unit":"MHz"}
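The positional mapping can be sketched as follows. This is an illustration, not the method itself: `tuple_to_pardict` is a hypothetical free function, and the fallback values used for a short tuple are assumptions, since the defaults of the real method are not stated here.

```python
FIELDS = ("default", "doc", "unit")
FALLBACKS = {"default": None, "doc": "", "unit": ""}   # assumed fallback values


def tuple_to_pardict(partuple):
    """Map a (default, doc, unit) tuple onto a properties dict (illustrative only)."""
    pardict = dict(FALLBACKS)
    pardict.update(zip(FIELDS, partuple))   # zip stops at the shorter sequence
    return pardict


full = tuple_to_pardict((150, "Parameter description", "MHz"))
short = tuple_to_pardict((150,))
```

With the one-element tuple only `default` is taken from the input and the remaining properties keep their fallback values.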
Print all parameters stored in the workspace including internal parameters.
ws.reset(restorecallparameters=True) -> reset all parameters to their state at initialization, but keep the parameters provided during initialisation.
Set parameters in the workspace from parameters in a dict. If a taskname is provided and a key in the dict matches the taskname and the value is a dict again, then also apply all values in the dict. If follow_tree=True, then go recursively through all dicts to see if there is one key with an associated dict value that matches taskname.
The function will not complain if a parameter in the dict is not known to the workspace. It will simply ignore it.
root = True - This is the toplevel dict of global parameters. Set all toplevel key,value pairs as parameters in the workspace. If False, only do so for a dict that is a value of a key matching taskname.
taskname - Name of the task to search for in the recursive search. Set parameters in workspace if a key has that name and has a dict as value.
follow_tree = False - If True, recursively go through all dicts on toplevel (and lower) which are not a parameter in the workspace.
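The search through nested dicts can be sketched as a small recursive function. The names `collect_parameters` and `known` are hypothetical and the sketch returns a dict instead of modifying a workspace; it follows the description of root, taskname and follow_tree given above but is not claimed to match the actual implementation in every corner case.

```python
def collect_parameters(pardict, known, taskname=None, root=True, follow_tree=False):
    """Return the values from pardict that apply to a task (illustrative sketch).

    known is the set of parameter names defined in the workspace.
    """
    found = {}
    for key, value in pardict.items():
        if key == taskname and isinstance(value, dict):
            # A dict stored under the task name: apply its entries
            found.update(collect_parameters(value, known, taskname, True, follow_tree))
        elif root and key in known:
            found[key] = value   # toplevel global parameter
        elif follow_tree and isinstance(value, dict) and key not in known:
            # Search deeper, but only for a dict keyed by the task name
            found.update(collect_parameters(value, known, taskname, False, follow_tree))
        # Unknown keys are ignored without complaint
    return found


pars = {"nchunks": 2,
        "test1": {"x": 5, "unknown": 1},
        "group": {"test1": {"y": 7}}}
known = {"x", "y", "nchunks"}

flat = collect_parameters(pars, known, taskname="test1")
deep = collect_parameters(pars, known, taskname="test1", follow_tree=True)
```

Without follow_tree the dict under `group` is skipped; with it, the nested `test1` entry is found and `y` is picked up as well.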
Recalculates all existing derived parameters and assigns them their default values if they depend on a value that was modified. Note that parameters which had a default function at initialization but were set explicitly will not be recalculated. Use ws.reset() or del ws.par first.
irrespective of whether they depend on modified parameters or not.
Recalculates the named parameter and assign it the default values if it depends on a value that was modified. Note that parameters which had a default function at initialization but were set explicitly will not be recalculated. Use ws.reset() or del ws.par first.
Usage:
printindent_string(txt,indentlen,width=80,prefix='# ')
Description:
Return a string to print a text as a block with indentation and maximum width.
Example:
tasks.printindent_string('Hallo lieber Leser, dies ist ein Text, der umgebrochen werden soll!',10,width=20)
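A comparable block layout can be produced with the standard library module `textwrap`. The function `indent_block` below is a hypothetical stand-in, not the pycrtools function; in particular, placing the prefix before the indentation and counting both towards the width are assumptions.

```python
import textwrap


def indent_block(txt, indentlen, width=80, prefix="# "):
    """Wrap txt to width and indent every line (illustrative, not the pycrtools code)."""
    indent = prefix + " " * indentlen
    return textwrap.fill(txt, width=width, initial_indent=indent,
                         subsequent_indent=indent)


text = "Hallo lieber Leser, dies ist ein Text, der umgebrochen werden soll!"
block = indent_block(text, 4, width=30)
```

Every line of `block` starts with the prefix followed by four spaces and is at most 30 characters long.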