FORCE Tutorial Documentation


This repository contains documentation and tutorials for the Formulations and Computational Engineering (FORCE) project, developed within Horizon 2020 (NMBP-23-2016/721027).

The contents can be viewed at https://force-tutorial.readthedocs.io/en/latest/

Topics

Introduction

The business decision support system (BDSS) is software for the optimization of industrial processes in terms of economic and material constraints. It has been developed under the aegis of the Formulations and Computational Engineering (FORCE) project of the European Materials Modelling Council (EMMC). The EMMC is a consortium of academic and industrial partners spread across Europe, with the mission of “the integration of materials modelling and digitalization critical for more agile and sustainable product development.”

More specifically, the BDSS is software for single- and multi-criterion optimization of a directed graph of computational operations (functions). The functions might include molecular-dynamics or computational-fluid-dynamics simulations, interpolation of experimental data and economic cost-price calculations. Nevertheless, the BDSS is general optimization software: the functions can be anything.

The BDSS can be run either as a stand-alone command-line program or within a graphic user interface (GUI) software called the Workflow Manager. The BDSS is also extensible: “plugins” can be developed by the user that add functionality to the software, typically in terms of functions and optimization algorithms.

Useful links:

European Materials Modelling Council

FORCE EU

FORCE Cordis page


Optimization

Optimization is the task of finding the set of parameters of a process that are “best” (optimal) in terms of a set of criteria, objectives or key performance indicators (KPIs). For instance, a soap factory might be looking for the optimal ratio of different fats and hydrolysis temperature (the inputs) that produce a soap that has the best smell, the most suds and is cheap (the criteria/objectives/KPIs).

Key Performance Indicator (KPI)

A performance indicator that has a target value and is declared to be of importance to the company. A KPI can be a combination of one or more performance indicators, e.g. in objective functions. A performance indicator can be declared to be a key performance indicator if there is a target value related to it.

Traditionally manufacturers have optimized their process at the R&D lab bench, before scaling-up to the plant or factory. Optimization is guided, like any good experiment, by a mixture of theoretical knowledge, expert intuition and brute-force: just trying as many parameter combinations as possible in the time allowed. This is of course all very costly in terms of time and money, so for a long time manufacturers have employed computational models of their process.

Computational models are quick and cheap (at least relative to the bench), can themselves use past experimental data and can simulate the at-scale process rather than a miniature version. They will never supplant the bench, but they can reduce the leg-work. In industries such as aerospace, computer models have been a vital element of R&D for decades. In the materials and chemical industry, modelling is being driven by the increasing need for diverse, exotic and tailored materials.

How can one optimize a computer model of such a material’s manufacture? Formally one can describe the process to be optimized as a mathematical function. The function has inputs (parameters) and outputs (objectives or criteria). The function itself is sometimes called the objective function. The value of each input is an axis in parameter space. Each point in parameter space is associated with a set of output/objective values. With two parameters (x and y, say), a two-dimensional parameter space, the output (z) can be visualised as a surface on the xy-plane, with z as the height of the surface. For objective functions with more than two inputs, the output surface is a hypersurface in a multi-dimensional parameter space.

The definition of what is “optimal” depends on whether the objective function (process) has a single output or multiple outputs (criteria).

Single-criterion optimization

A function with a single output is a scalar function. The minima and maxima are the valley-bottoms and peaks of the output’s surface, respectively. The global minimum/maximum is the lowest valley-bottom/highest peak, respectively. Other valley-bottoms/peaks are local minima/maxima. The task of optimization is then to find, at best, the global minimum/maximum or, failing that, a local minimum/maximum.
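As a minimal illustration of single-criterion optimization (a sketch with a made-up objective function, not BDSS code), a brute-force grid search can locate the global minimum by sampling parameter space and keeping the point with the lowest output:

```python
import itertools

# A hypothetical scalar objective function with a single global
# minimum at (1, -2); purely for illustration.
def objective(x, y):
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

# Sample the two-dimensional parameter space on a regular grid,
# -5.0 to 5.0 in steps of 0.5, and keep the best point.
grid = [i * 0.5 for i in range(-10, 11)]
best = min(itertools.product(grid, grid), key=lambda p: objective(*p))
print(best)  # (1.0, -2.0)
```

Real optimizers (gradient descent, CMA, etc.) are far more efficient than this exhaustive sampling, but the goal is the same: the point in parameter space with the optimal output.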

Multi-criterion optimization

If the objective function has more than one output, there are multiple surfaces, each with its own minima and maxima. At any point in parameter space, either:

  • The slopes of the output surfaces all point in the same direction (all up or all down).
    One can always move away from that point and either increase all the outputs/criteria simultaneously or decrease all of them.
  • The slopes of the output surfaces point in opposite directions.
    Any move increases at least one output whilst decreasing at least one of the others.

These conditions are known as Pareto dominated and Pareto efficient, respectively, after the Italian engineer and economist Vilfredo Pareto. A Pareto efficient point is one at which we cannot improve all outcomes simultaneously: in Pareto’s terms, that is the optimal situation. The task of optimization is to find the set of Pareto efficient points. These points often form a line through parameter space, the Pareto front.

Typically an industrial process will have multiple optimization criteria. Multi-criterion optimization algorithms are therefore vitally important, but they have rarely been used, owing to a lack of expertise and of readily available computational libraries and software that implement them.
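The notion of Pareto efficiency can be made concrete with a short sketch (illustrative only, not part of the BDSS). Given a set of candidate points, each a tuple of criterion values to be minimized, the Pareto-efficient set consists of those points that no other candidate dominates:

```python
def is_pareto_efficient(point, others):
    """True if no other point is at least as good on every criterion
    and strictly better on at least one (assuming minimization)."""
    for other in others:
        if other is point:
            continue
        if all(o <= p for o, p in zip(other, point)) and any(
            o < p for o, p in zip(other, point)
        ):
            return False  # 'point' is Pareto dominated by 'other'
    return True

# Each candidate is a tuple of criterion values (lower is better).
candidates = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
front = [c for c in candidates if is_pareto_efficient(c, candidates)]
print(front)  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

Here (3.0, 3.0) and (4.0, 4.0) are dominated by (2.0, 2.0); the three remaining points form the Pareto front, each trading one criterion off against the other.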

Optimizing a Directed Graph

Many processes can be divided up into granular, near-self-contained sub-processes, with the output of one feeding into the input of another. For instance a soap production line might be divided up into the melting of the fat, the hydrolysis chamber, the mold cooling, milling and packing. It is often helpful to reproduce these divisions when simulating a process with a computer. Each sub-process is a function with inputs and outputs which feed one into another. The functions might include molecular-dynamics or computational fluid dynamics simulations, interpolation of experimental data or economic cost-price calculations. The inputs and outputs might include concentrations of particular chemicals, physical parameters such as temperature and pressure, price and quality of materials or manufacture time.

There are many names for a set of connected sub-processes. Pipeline is common in computer science, but has a clear industrial origin: reaction vessels (sub-processes) connected by pipes. Another name is workflow, with a clear flavor of the office. A mathematician would call it a directed graph, with the sub-processes as nodes and the connections as edges.

The entire graph/process of nodes/sub-processes/functions is itself a single “super” function. Its inputs are all those node/sub-process/function inputs that are not fed by an output (i.e. do not form an edge). Its outputs are all the node/sub-process outputs (both those forming an edge and those not). The graph super-function can be optimized like any other. Critically, for this we must know how to execute the super-function.

The graph’s nodes must be executed in a strict order, such that if the output of node A forms the input of node B, then A must be executed before B. Nodes with no edge between them may be executed in parallel, and so form an execution layer. For graphs with a small number of nodes, like that below, it is easy to figure out the execution layers manually. There are also topological sort algorithms that can calculate the execution order and execution layers of a directed graph: for instance Kahn’s algorithm and depth-first search. In the BDSS the execution layers are set manually by the user.
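To illustrate how such layers could be computed automatically (the BDSS itself leaves this to the user), here is a sketch of a layered variant of Kahn’s algorithm; the graph used at the end is hypothetical:

```python
from collections import defaultdict

def execution_layers(edges, nodes):
    """Group the nodes of a directed acyclic graph into execution
    layers, using a layered variant of Kahn's algorithm."""
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    layers = []
    # Nodes with no incoming edge can run immediately.
    ready = [n for n in nodes if indegree[n] == 0]
    while ready:
        layers.append(sorted(ready))
        next_ready = []
        for node in ready:
            # "Remove" the node's outgoing edges; successors whose
            # inputs are now all satisfied join the next layer.
            for succ in successors[node]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    next_ready.append(succ)
        ready = next_ready
    return layers

# Hypothetical graph: A and B feed C; C and D feed E.
edges = [("A", "C"), ("B", "C"), ("C", "E"), ("D", "E")]
print(execution_layers(edges, ["A", "B", "C", "D", "E"]))
# [['A', 'B', 'D'], ['C'], ['E']]
```

If the graph contains a cycle, some nodes never reach indegree zero and are simply never emitted, which is one way such an algorithm can also detect that a graph is not acyclic.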

_images/graph.png

A graph with nodes A to F, inputs m to p and outputs y and z.

Many processes, both man-made and natural, are cyclic. However, a directed graph of functions must be acyclic if it is to be optimized: a cyclic graph will run forever.

Using the BDSS: The Workflow Manager

The Workflow Manager allows the user to easily construct a graph of functions, optimize this graph and view the results graphically. A number of terms are specific to the BDSS and the Workflow Manager or otherwise have a specific meaning within them.

Workflow
The directed graph of functions that models a process.
Data source
A node (function) in the graph (workflow).
Data value
An input or output of a node (data source).
Parameter
An input of the graph (workflow) and how that input should be treated (as a numerical or categorical variable, etc: see below).
Key performance indicator (KPI)
An output (criterion or objective) of the graph (workflow).
MCO
Multi-criterion optimizer.


Installation

The BDSS, the Workflow Manager and all plugins can be cloned from the force-h2020 GitHub repositories. For the BDSS and Workflow Manager,

git clone https://github.com/force-h2020/force-bdss
git clone https://github.com/force-h2020/force-wfmanager

This tutorial uses the Enthought-Example and Nevergrad plugins as examples,

git clone https://github.com/force-h2020/force-bdss-plugin-enthought-example
git clone https://github.com/force-h2020/force-bdss-plugin-nevergrad

Enthought Deployment Manager

The BDSS, the Workflow Manager and plugins must be installed through the Enthought Deployment Manager (EDM), a python virtual environment and package manager. For new users it is worth examining EDM’s documentation.

To install EDM, follow the instructions specific to your operating system, here.

The Bootstrap Environment

Once EDM is installed, create a ‘bootstrap’ environment from which you can install the BDSS, Workflow Manager and plugins,

edm install -e bootstrap -y click setuptools

Note that ‘bootstrap’ can be replaced by any name to the same effect. Now you can enter bootstrap with,

edm shell -e bootstrap

and your shell prompt is prefixed with (bootstrap).

The BDSS Runtime Environment

Although repositories (BDSS, etc.) are installed from the bootstrap environment, they are installed into a separate environment, within which the BDSS and the Workflow Manager will actually run. This runtime environment must therefore be created before installation. To do this, first cd into the cloned force-bdss repository,

~/Force-Project (bootstrap)$ cd force-bdss

and then,

~/Force-Project/force-bdss (bootstrap)$ python -m ci build-env

This creates an environment called force-pyXX, where XX refers to the python version that the environment runs (e.g. force-py36 for python 3.6). You will now see it in the list of EDM environments,

(bootstrap)$ edm environments list

>> * bootstrap     cpython  3.6.9+2  win_x86_64  msvc2015  ~\.edm\envs\bootstrap
>>   force-py36    cpython  3.6.9+2  win_x86_64  msvc2015  ~\.edm\envs\force-py36

To run BDSS from the command line see Using the Command Line.

Repository Installation

From the bootstrap environment (not force-pyXX!), for each repository in turn, cd into its directory and then install it with python -m ci install, e.g.,

~/Force-Project/force-bdss (bootstrap)$ python -m ci install

~/Force-Project/force-bdss (bootstrap)$ cd ../force-wfmanager
~/Force-Project/force-wfmanager (bootstrap)$ python -m ci install

~/Force-Project/force-wfmanager (bootstrap)$ cd ../force-bdss-plugin-enthought-example
~/Force-Project/force-bdss-plugin-enthought-example (bootstrap)$ python -m ci install

~/Force-Project/force-bdss-plugin-enthought-example (bootstrap)$ cd ../force-bdss-plugin-nevergrad
~/Force-Project/force-bdss-plugin-nevergrad (bootstrap)$ python -m ci install

...etc

Starting the Workflow Manager

The Workflow Manager can be started from within the bootstrap environment with,

(bootstrap)$ edm run -e force-pyXX -- force_wfmanager

where force-pyXX is the BDSS runtime environment. Alternatively one can enter the runtime environment, e.g. force-py36,

(bootstrap)$ edm shell -e force-py36

and then,

(force-py36)(bootstrap)$ force_wfmanager

Views

The Workflow Manager has two major UI components or “views”:

Setup Workflow
For constructing the workflow, selecting parameters and KPIs and selecting an optimizer.
_images/setup_view.png
View Results
For viewing the results of an optimization.
_images/results_view.png

You can switch between the views with the top-left button in the tool-bar: the label of this button will change accordingly. We will consider the two views, in turn, over the next two topics.

Setup the Workflow

The left panel contains a tree view that displays the workflow, parameters and KPIs and optimizer. Clicking on any item in the tree brings up fields and buttons in the right panel for setting the selected attribute.

_images/execution_layer.png

Below we will create one execution layer with two Gaussian data sources. Each data source describes a two-dimensional Gaussian on the x-y plane.

\[a = a_{peak} \exp{\left[- \frac{(x - c_{x})^{2}}{2 \sigma_{x}^2} - \frac{(y - c_{y})^{2}}{2 \sigma_{y}^2}\right]}\]

With a negative peak amplitude (a_peak), the Gaussian forms a minimum in xy-parameter space. With the amplitudes of two such Gaussians as the criteria/KPIs, the Pareto front stretches between their two minima. We will find this front using an MCO built on top of the Nevergrad gradient-free optimization library. This MCO is provided by a dedicated plugin.
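The equation above is straightforward to express in plain Python. This sketch is illustrative only (it is not the plugin’s actual code, and the sigma values in the usage line are arbitrary placeholders):

```python
import math

def gaussian(x, y, a_peak, cx, cy, sigma_x, sigma_y):
    """The two-dimensional Gaussian of the equation above."""
    return a_peak * math.exp(
        -((x - cx) ** 2) / (2 * sigma_x ** 2)
        - ((y - cy) ** 2) / (2 * sigma_y ** 2)
    )

# At the centre (cx, cy) the exponent is zero, so the Gaussian
# attains its peak amplitude; here a negative peak, i.e. a minimum.
print(gaussian(-1.0, 1.0, a_peak=-2.0, cx=-1.0, cy=1.0,
               sigma_x=1.0, sigma_y=1.0))  # -2.0
```

With two such functions sharing the same x and y inputs, each amplitude becomes one KPI of the workflow.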

Create an Execution Layer

Select the Execution Layers tree-item and press the Add New Execution Layer button. A tree-item, Layer 0, appears under Execution Layers.

Add a Data Source

Selecting the Layer 0 tree-item brings up two panels at the right:

Available Data Source Factories
A tree-list of all the available data sources, arranged by the plugin that has contributed them.
Configuration Options/Description
A description of the data source.

Select one of the Gaussian data sources contributed by the Troughs and Waves plugin and press the Add New Data Source button to add it to the execution layer.

_images/new_source.png

The data source is added as a tree-item under Layer 0. Selecting this item brings up four panels at the right:

  • Input variables
    The list of inputs.
  • Output variables
    The list of outputs.
  • Selected parameter description
    The description of the selected input/output.
  • A list of constants that will not be optimized.
_images/input_variables.png

The Variable Name fields of the Input variables and Output variables panels are used to connect data sources in different execution layers. Any output-input pair that you want to connect as an edge should be given the same Variable Name. Otherwise you can enter anything you like: it is easiest to use the name that appears in Selected parameter description. This is what we will do for the Gaussian data source, as we are not connecting it to another data source (both Gaussian data sources will be in the same execution layer).

Add a second Gaussian data source to the same execution layer. The list of constants for the Gaussian data source are:

  • the peak amplitude
  • position of the peak (Center x and y coordinates)
  • width of the peak (standard deviation or Sigma along the x and y axis)

Center the Gaussians at (-1, -1) and (1, 1) with amplitudes of -2 and -1, respectively. The first Gaussian is then the global minimum, whereas the second is a local minimum.

_images/input_variables_g2.png

Their Input variable names should be the same (e.g. x and y), so that they refer to the same x and y parameters.

Their Output variable names (their amplitudes) should be different (e.g. a1 and a2), so that they are recognised as separate KPIs.

Select an Optimizer

Selecting the MCO tree-item brings up two panels at the right:

Available MCO Factories
A tree-list of all the available optimizers, arranged by the plugin that has contributed them. Note that not all of these will be multi-criterion optimizers.
Configuration Options/Description
A description of the selected optimizer.
_images/optimizer_select.png

Select an optimizer and press the Add New MCO button. The optimizer is added as a tree-item under MCO. Selecting this item brings up a single panel to the right:

Item Details
Certain parameters that control how the optimizer works.

Select CMA for the algorithm and set 1000 for Allowed number of objective calls.

_images/mco_algo.png

Select the Parameters

Under the optimizer are two further tree-items for setting the parameters and KPIs.

Selecting the Parameters tree-item brings up two panels at the right:

Available MCO Parameter Factories
A tree-list of all the available parameters for the optimizer.
Description
The description of the selected parameter.
_images/param_select.png

When we specify a “parameter”, as well as selecting a data source input we must also tell the optimizer how to treat that input: its parameterization. Is the parameter:

  • fixed (i.e. a constant)?
  • continuous, with a lower and upper bound?
  • categorical, a member of an ordered or unordered set?

Certain optimizers can only handle certain parameterizations. For instance, gradient-based optimizers can only handle continuous parameters, not categorical (which don’t have a gradient). The Nevergrad optimizer can handle all types, but for now we will only use continuous (‘Ranged’).

_images/ranged_parameter.png

Select the Ranged item and press the New Parameter button. A new panel appears at the top-right. This will contain a tab for each parameter added. A Ranged parameter tab has the following fields:

Name
A drop-down list of data source inputs. Select the input “x”, the x coordinate.
Lower bound
Set the lower bound to -5.
Upper bound
Set the upper bound to 5.
Initial value
Slide this to anything (it doesn’t matter to the Nevergrad optimizer).
N samples
This has no meaning and can be ignored.

Add another Ranged parameter for the y coordinate and set the same bounds and initial value.

Select the KPIs

Selecting the KPIs tree-item brings up a New KPI button. Pressing this button brings up a tabbed pane, one tab for each KPI added with the following fields:

Name
A drop-down list of data-source outputs. Select the output “a1”, the amplitude of the first Gaussian data source.
Objective
Choose whether to minimize or maximize the KPI. With maximize chosen, the KPIs are simply negated during optimization. In our case choose minimize, as the Gaussians have negative peak amplitudes. If you make the Gaussian peaks positive and then choose maximize, you will get the same results.
Auto scale
This is used by some of the optimizers to scale the KPIs so that they have comparable amplitudes. The Nevergrad optimizer does not scale, so ignore this.
_images/kpi_minimize.png

Add a KPI for the second Gaussian (“a2”) in the same manner.

Run the Workflow

You may have noticed that some of the tree-items had a warning-sign icon next to them. These are to warn the user that something has not been set correctly, such that the optimization will not run. A Workflow Errors field at the bottom-left of the window shows a message indicating the error(s). Hopefully by now, after creating the workflow, selecting the optimizer and setting the parameters, all the tree-items should be blue squares, indicating that there are no errors. Now all that is left is to run the optimization.

Below Workflow Errors is a Run button, which starts the optimization. Alternatively you can press the Run button in the top tool-bar.

_images/run_bar.png

After pressing Run, a log window (command prompt) appears on Windows, displaying certain outputs of the optimization process as they occur. It closes when the optimization has finished. Nothing appears on the Mac.

View the Results

Press the View Results button in the tool-bar. The View Results view, contains two panels:

Results Table
The values of the parameters and KPIs, in our case for each point in the Pareto-efficient set.
Plot
A scatter plot of the points listed in the table. You can change the axes of the plot from the drop-down lists and you can color code the points (according to KPI value, say) by pressing the Color button, which brings up a self-explanatory menu.
_images/results_gauss.png

The Pareto front for the two Gaussian data sources, calculated with the Nevergrad CMA algorithm. The front stretches between the peaks centered at (-1, -1) and (1, 1). The Pareto efficient points are color coded by the amplitude of the Gaussian centred at (-1, -1): both Gaussians have negative amplitudes (are minima) and cooler colors indicate lower values.

Saving the Workflow as a JSON file

Once a workflow has been created, and the optimizer, parameters and KPIs selected, you can save it as a JSON file that can be loaded in future sessions. From the File menu select File > Save Workflow as. This brings up a save dialog from which you can name and save the JSON file. When you wish to load it, go to the Setup Workflow view, press Open in the tool-bar and select the JSON file. The entire workflow, optimizer, parameters and KPIs will be loaded.

Using the Command Line

Both the BDSS and the Workflow Manager can be invoked from the command line whilst in the BDSS runtime environment. For example, if the runtime environment is force-py36,

# enter the environment
$ edm shell -e force-py36

# execute the workflow
(force-py36)$ force_bdss workflow.json

# open the Workflow Manager with the workflow loaded
(force-py36)$ force_wfmanager workflow.json

# open the Workflow Manager
(force-py36)$ force_wfmanager

The force_bdss command initiates the BDSS MCO runner, and therefore must be passed a workflow JSON file that contains optimization instructions. The force_wfmanager command initiates the Workflow Manager GUI, and therefore can start up with a default empty workflow, since it provides additional UI features to create, modify and export workflows.

The force_bdss can also be invoked using the --evaluate flag, which switches the application from ‘optimize’ to ‘evaluate’ mode and performs a single point evaluation of the workflow only. This functionality was designed to allow an external process (or program) to control the optimization procedure, whilst the system itself continues to be represented as a FORCE BDSS workflow. This is considered an ‘advanced’ feature of the BDSS framework, and so will be explored in a later extension to the main tutorial.

EDM also supports running commands from outside an environment, using the edm run command.

$ edm run -e force-py36 -- force_wfmanager

For further assistance on EDM, use the edm --help tool or visit the latest documentation.

Extending the BDSS: Plugin Development

Force BDSS is extensible through the Envisage plugin framework. A plugin can be (and generally is) provided as a separate python package that provides some new classes. Force BDSS will find these classes from the plugin at startup.

A single plugin can provide one or more of the following: MCO, DataSources or NotificationListeners. It can optionally provide DataView and ContributedUI objects to be used by the force_wfmanager GUI: these features will be dealt with in an extension tutorial.

An example plugin implementation is available at:

https://github.com/force-h2020/force-bdss-plugin-enthought-example

To implement a new plugin, you must define at least four classes:

  • The Plugin class itself.
  • One of the entities you want to implement: a DataSource, NotificationListener or MCO.
  • A Factory class for the entity above: it is responsible for creating the specific entity, for example, a DataSource
  • A Model class which contains configuration options for the entity. For example, it can contain login and password information so that its data source knows how to connect to a service. The Model is also shown visually in the force_wfmanager UI, so some visual aspects need to be configured as well.


Traits and TraitsUI

Traits is a python package, developed by Enthought, for creating and interacting with statically-typed variables: ‘traits’. To efficiently develop UIs for traits, Enthought developed a sister package, TraitsUI.

A class that has trait variables as attributes inherits from HasTraits, so that those variables can be initialized and handled appropriately. Most, if not all, of the classes in the BDSS (and in the Workflow Manager) inherit from HasTraits, so before extending the BDSS it is useful to have some basic knowledge of Traits and TraitsUI.

Full documentation can be found here:

Traits

TraitsUI

These provide brief introductions to the packages. Here we provide an even more minimal introduction that should make the code examples in following topics clearer.

Traits

Traits are class objects (like every variable in python). The more common classes just wrap around a built-in python type, with a class name that is the camel case version of the built-in type. For instance,

# the trait types live in traits.api
from traits.api import Dict, Float, Str

# initialization of a string trait, x.
x = Str('hello world')

# initialization of a dictionary trait, y.
y = Dict({'English':'hello world', 'German':'hallo welt'})

# initialization of a float trait, z, to the default value of 0.0
z = Float()
print(z)
>> 0.0

Traits are typically initialized within a HasTraits class,

from traits.api import HasTraits, Str

class HelloWorld(HasTraits):

    x = Str('bonjour le monde')

    .....

The HasTraits inheritance defines a constructor that takes the traits as keyword arguments.

my_hello_world = HelloWorld(x='ciao mondo', .....)

print(my_hello_world.x)

>> ciao mondo

If no argument is given for a trait it is initialized to the value (default or otherwise) given within the class declaration,

my_hello_world = HelloWorld()

print(my_hello_world.x)

>> bonjour le monde

As with any python class member, trait variables are referred to via self in methods,

class HelloWorld(HasTraits):

     x = Str('bonjour le monde')

     def shout_the_greeting(self):
         return self.x.upper()

my_hello_world = HelloWorld()

print(my_hello_world.shout_the_greeting())

>> BONJOUR LE MONDE

Almost all classes in the BDSS and the Workflow Manager (including all those in the code examples in the following topics) inherit from HasTraits, usually indirectly through a base class (you won’t see HasTraits in the class declaration).

Views

TraitsUI provides the UI to traits (as the name suggests!). It provides any HasTraits object with a default UI that exposes all the traits it contains. Each trait type is associated with a default UI element (text field for a Str, etc.) and TraitsUI lays out these elements automatically in a window or panel.

A custom layout, possibly including custom UI elements (‘editors’), can be provided by initializing a View object within the HasTraits class,

from traits.api import HasTraits, Int, Str
from traitsui.api import HTMLEditor, Item, View

class HelloWorld(HasTraits):

    x = Str('bonjour le monde')

    y = Int(5)

    view = View(
        Item(name='x', label='hello message', editor=HTMLEditor()),
        Item(name='y', label='number of people listening'),
        padding=10,
        resizable=True
    )

Each trait is associated with an Item object (itself a HasTraits class) by assigning the Item’s name attribute to the string of the trait variable name. In addition, the Item constructor has optional arguments that determine what non-default UI elements (‘editors’), if any, are used to expose the trait and how they are laid out.

The Items are passed to the View constructor as positional arguments. In addition, the View constructor has a number of optional keyword arguments that determine layout, etc.

For layout purposes, Items can be grouped by assigning them to Group objects that are then assigned to the View.

view = View(
    Group(
        Item(name='x', label='hello message'),
        Item(name='y', label='number of people arriving'),
        label='arriving'
    ),
    Group(
        Item(name='i', label='goodbye message'),
        Item(name='j', label='number of people departing'),
        label='departing'
    )
)

As for the View, the Group constructor has a number of keyword arguments that affect layout, labelling, etc.

In the following topics, code examples with View initializations will show the resulting UI alongside.

Creating a Plugin

All plugin classes must:

  • Inherit from force_bdss.api.BaseExtensionPlugin,

from force_bdss.api import BaseExtensionPlugin, plugin_id

VERSION = 0

class ExamplePlugin(BaseExtensionPlugin):
    """This is an example of the plugin system for the BDSS."""

  • Implement an id class member, which must be set to the result of calling the function plugin_id(),

    id = plugin_id("enthought", "example", VERSION)

The three arguments determine the unique name of the extension point.

  • Implement the methods get_name(), get_version() and get_description() to return appropriate values. The get_version() method in particular should return the same value as in the id (in this case zero). It is advised to extract this value into a global, module-level constant,

    def get_name(self):
        return "Enthought example"

    def get_description(self):
        return "An example plugin from Enthought"

    def get_version(self):
        return VERSION

  • Implement a method get_factory_classes() returning a list of all the classes (NOT the instances) of the entities you want to export,

    def get_factory_classes(self):
        return [
            ExampleDataSourceFactory,
            ExampleMCOFactory,
            ExampleNotificationListenerFactory,
            ExampleUIHooksFactory,
        ]

Install the Plugin

In order for the BDSS to recognize the plugin, it must be installed as a package in the deployed environment (force-py36). This can be performed using pip and an appropriate setup.py file that employs the setuptools package. Additional documentation describing package building using setuptools can be found here.

The plugin is declared as an extension to the force_bdss by defining it in the entry_points keyword argument of the setup command, under the namespace force.bdss.extensions. You have to specify a path to the plugin class (in this case ExamplePlugin), as given below. The name (before the '=') of the plugin is irrelevant, but to avoid confusion, try to use the name of the module. For example,

entry_points={
    "force.bdss.extensions": [
        "enthought_example = "
        "enthought_example.example_plugin:ExamplePlugin",
    ]
}

A basic example setup.py file is therefore shown below

from setuptools import setup, find_packages

VERSION = 0

setup(
    name="enthought_example",
    version=VERSION,
    entry_points={
        "force.bdss.extensions": [
            "enthought_example = "
            "enthought_example.example_plugin:ExamplePlugin",
        ]
    },
    # Automatically looks for file directories containing __init__.py files
    # to be included in package
    packages=find_packages(),
)

Running the following command line instruction from the same directory as setup.py will then install the package in the deployed environment

edm run -e force-py36 -- pip install -e .

Advanced Plugins

Additionally, a plugin can also define one or more custom visualization classes for the GUI application force-wfmanager, typically to either display data or provide a tailor-made UI for a specific user.

In this case, the plugin class must inherit from force_bdss.api.ServiceOfferExtensionPlugin, which is a child class of BaseExtensionPlugin. Any UI subclasses can then be made discoverable by force-wfmanager using the Envisage ServiceOffer protocol, through the get_service_offer_factories method,

def get_service_offer_factories(self):
    """A method returning a list of user-made objects to be provided by this
    plugin as envisage ServiceOffer objects. Each item in the outer list is
    a tuple containing an Interface trait to be used as the ServiceOffer
    protocol and an inner list of subclass factories to be instantiated
    from said protocol.

    Returns
    -------
    service_offer_factories: list of tuples
        List of objects to load, where each tuple takes the form
        (Interface, [HasTraits1, HasTraits2..]), defining a Traits
        Interface subclass and a list of HasTraits subclasses to be
        instantiated as an envisage ServiceOffer.
    """

Make sure to import the module containing the UI classes from inside get_service_offer_factories: this ensures that running BDSS without a GUI application doesn’t import the graphical stack.

There are currently two types of custom UI object that may be contributed by a plugin: IBasePlot and IContributedUI. These interfaces represent requirements for any UI feature that can be used to display MCO data or present a simplified workflow builder, respectively.

Also, multiple types of plugin-contributed UI objects can be imported in the same method. For instance

from force_bdss.api import ServiceOfferExtensionPlugin

class ExamplePlugin(ServiceOfferExtensionPlugin):
    """This is another example of the plugin system for the BDSS."""

    def get_service_offer_factories(self):
        from force_wfmanager.ui import IBasePlot, IContributedUI
        from .example_custom_uis import PlotUI, ExperimentUI, AnalysisUI

        return [
            (IBasePlot, [PlotUI]),
            (IContributedUI, [ExperimentUI, AnalysisUI])
        ]

These plugins are installed in the same way as described previously, but are only accessible when running the force_wfmanager GUI.

Factories and Classes

A factory object (BaseFactory) defines a set of other classes that are required to define one of the following:

  • a data source
  • an optimizer
  • a parameterization
  • a notifier

Each factory returns the classes that are required. Both these classes and the factory itself must be written by the plugin author. In turn, the former may depend on further classes that also need to be written.

Below we provide links to well documented code examples of factories and their associated classes.

Factories for different BDSS Categories
BDSS category     factory                   classes defined by factory                   other classes                    example
data source       BaseDataSourceFactory     BaseDataSource, BaseDataSourceModel          —                                Gaussian
parameterization  BaseMCOParameterFactory   BaseMCOParameter                             —                                Ranged
optimizer         BaseMCOFactory            BaseMCO, BaseMCOModel, BaseMCOCommunicator   BaseOptimizerEngine, IOptimizer  Nevergrad

In the following topics we go into these in more detail.

Data Source

A data source is a node in the workflow graph.

In the Workflow Manager, when you click on a data source, you will see its:

  1. function
    The function that the node computes.
  2. inputs
    The function’s parameters, which can either be optimized (by selecting them as MCO Parameters) or fed from the outputs of other nodes (by setting their variable names to those of outputs of other nodes).
  3. outputs
    The function’s return values, which can be optimization criteria (by selecting them as MCO KPIs) and/or be passed to the inputs of other nodes (by setting their variable names to those of inputs of other nodes).
  4. internal parameters (model)
    The function’s parameters that are ‘internal’ to the node (i.e. are not node inputs). They cannot be optimized but can be set by the user in the Workflow Manager. Think of them as the function’s ‘constants’.
_images/data_source_wfmanager.png

These aspects of the node are represented by a set of class objects:

_images/data_source_schematic.png

We will illustrate how to design and use these objects, using the example of the Gaussian data source, a two-dimensional Gaussian on the xy-plane:

\[a = a_{peak} \exp{\left[- \frac{(x - c_{x})^{2}}{2 \sigma_{x}^2} - \frac{(y - c_{y})^{2}}{2 \sigma_{y}^2}\right]}\]

The source code for this data source can be examined here.
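As a quick sanity check of the formula above, independent of the BDSS (plain Python, using the default values from the GaussianModel shown further below):

```python
import math

def gaussian(x, y, peak=-2.0, cent_x=-1.0, cent_y=-1.0, sigm_x=0.6, sigm_y=0.6):
    """Two-dimensional Gaussian on the xy-plane."""
    exponent = ((x - cent_x) ** 2) / (2.0 * sigm_x ** 2)
    exponent += ((y - cent_y) ** 2) / (2.0 * sigm_y ** 2)
    return peak * math.exp(-exponent)

# At the centre the exponential factor is 1, so the amplitude equals the peak value.
print(gaussian(-1.0, -1.0))  # -2.0
```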

DataValue and Slot

Both these classes represent node (data source) inputs and outputs.

DataValue objects carry the actual values of inputs/outputs and are passed between connected nodes during execution of the graph/workflow. Their attributes are:

value. The value of the input/output.

type. The type of the input/output: a string that is meant to be a CUBA key.

name. The name of the input/output.

Slot objects describe the inputs/outputs and are the UI (Workflow Manager) interface
to the DataValues. Their attributes are:

description. A description of the input/output.

type. The ‘CUBA’ key of the input/output.

Arguably, these two classes could be merged into a single class.
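To make the distinction concrete, here is a minimal stand-in sketch, using plain dataclasses rather than the real force_bdss classes: a DataValue carries a value at execution time, while a Slot merely describes it for the UI.

```python
from dataclasses import dataclass

@dataclass
class DataValue:
    """Stand-in: carries an actual value during workflow execution."""
    value: object = None
    type: str = ""   # intended to be a CUBA key
    name: str = ""

@dataclass
class Slot:
    """Stand-in: describes an input/output in the UI; holds no value."""
    description: str = ""
    type: str = ""   # the 'CUBA' key

# The slot describes the input; the data value carries a concrete number.
x_slot = Slot(description="x", type="COORDINATE")
x_value = DataValue(value=1.5, type="COORDINATE", name="x")
```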

BaseDataSource

The node’s function.

import math

from force_bdss.api import BaseDataSource, DataValue, Slot


class Gaussian(BaseDataSource):

    def run(self, model, parameters):
        x = parameters[0].value
        y = parameters[1].value

        a = ((x - model.cent_x) ** 2)/(2.0 * model.sigm_x ** 2)
        a += ((y - model.cent_y) ** 2) / (2.0 * model.sigm_y ** 2)
        a = model.peak * math.exp(-a)

        return [
            DataValue(value=a, type="AMPLITUDE"),
        ]

    def slots(self, model):
        return (
            (
                Slot(description="x", type="COORDINATE"),
                Slot(description="y", type="COORDINATE"),
            ),
            (
                Slot(description="a", type="AMPLITUDE"),
            )
        )

The run method is the function itself. Its arguments are:

model
The BaseDataSourceModel object that contains the function’s ‘internal’ parameters or ‘model’
parameters
The list of DataValue objects with the values of the node’s inputs, one element per input.

run() returns the list of DataValue objects that are the node’s outputs.

The slots method returns Slot objects corresponding to the node’s inputs and outputs, in the form of a tuple

((<tuple of input slots>), (<tuple of output slots>))

The elements of (<tuple of input slots>) correspond to the elements of the parameters argument of run. The elements of (<tuple of output slots>) correspond to the elements of run’s return.
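This correspondence can be made explicit with a hypothetical consistency check (a helper of our own, not part of the BDSS API): the number of input slots must match the number of parameters passed to run, and the number of output slots must match the length of run's return value.

```python
def check_slot_consistency(input_slots, parameters, output_slots, results):
    """Hypothetical helper: verify a data source's run() against its slots()."""
    assert len(input_slots) == len(parameters), "one input slot per parameter"
    assert len(output_slots) == len(results), "one output slot per returned DataValue"
    return True

# Shapes only: two inputs (x, y) and one output (a), as in the Gaussian example.
ok = check_slot_consistency(("x", "y"), (1.0, 2.0), ("a",), (0.5,))
```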

BaseDataSourceModel

The node’s ‘internal’ parameters

from traits.api import Float
from traitsui.api import Group, Item, View

from force_bdss.api import BaseDataSourceModel


class GaussianModel(BaseDataSourceModel):

    peak = Float(-2.0, label="Peak amplitude", desc="Amplitude of the peak.")
    cent_x = Float(-1.0, label="x", desc="x coordinate of the peak.")
    cent_y = Float(-1.0, label="y", desc="y coordinate of the peak.")
    sigm_x = Float(0.6, label="x", desc="Width (standard deviation) along the x-axis.")
    sigm_y = Float(0.6, label="y", desc="Width (standard deviation) along the y-axis.")

    traits_view = View(
        Item("peak"),
        Group(Item("cent_x"), Item("cent_y"), label="Center"),
        Group(Item("sigm_x"), Item("sigm_y"), label="Sigma")
    )

The label and desc attributes appear in the description of the data source when it is selected from a plugin.

_images/data_source_selection.png

The View object determines how they are presented for editing in the Workflow Manager (see above).

BaseDataSourceFactory

This factory is contributed to the BDSS by the plugin, allowing it to create instances of BaseDataSource and BaseDataSourceModel.

class GaussianFactory(BaseDataSourceFactory):
    def get_identifier(self):
        return "gaussian"

    def get_name(self):
        return "Gaussian"

    def get_description(self):
        return "This Data Source creates a two-dimensional " \
               "(xy-plane) Gaussian."

    def get_model_class(self):
        return GaussianModel

    def get_data_source_class(self):
        return Gaussian

The returns of the get_name and get_description methods appear in the description of the data source when it is selected from a plugin (see above).

Parameterization

The workflow’s parameters can be treated in different ways by the optimizer. A parameter might be treated as a continuous variable (i.e. a real number), a discrete variable (e.g. an integer), or a categorical variable (i.e. a member of a finite set) that may be ordered or unordered. These different possibilities are the parameterization of the parameter. They are important because certain optimizers can only deal with certain parameterizations. For example, gradient-based optimizers can only optimize continuous variables.

A parameterization requires just two classes - a BaseMCOParameter and its factory. We will illustrate how to design and use these classes, using the example of the ‘ranged’ parameterization that comes built in with the BDSS. The source code for this parameterization can be examined here.

BaseMCOParameter

A parameterization.

class RangedMCOParameter(BaseMCOParameter):

    #: Lower bound for parameter values range
    lower_bound = Float(0.1, verify=True)

    #: Upper bound for parameter values range
    upper_bound = Float(100.0, verify=True)

    #: Initial value. Defines the parameter bias
    initial_value = Float(verify=True)

    def _initial_value_default(self):
        return 0.5 * (self.lower_bound + self.upper_bound)

    def default_traits_view(self):
        return View(
                ......
        )

    def verify(self):

        # .....

The class’s traits are the properties of the parameterization: in this case for a continuous variable (Float) with an initial value (initial_value) and a range (lower_bound and upper_bound).

The default_traits_view method provides a view to the Workflow Manager to control the parameterization attributes.

_images/parameter_set.png

The verify method is used to check that a given parameter conforms with the parameterization: in this example, that it lies within the bounds. The details are not important and you might not even want to override the base method. Also, any trait that needs to be verified in the UI when it is changed should set the verify=True metadata.
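As an illustration of the bounds check only, here is a stand-in sketch in plain Python; the real verify method returns verifier error objects rather than message strings, and the function name here is hypothetical.

```python
def verify_ranged(lower_bound, upper_bound, initial_value):
    """Stand-in sketch of a ranged-parameter check (returns message strings,
    not the error objects the real method would produce)."""
    errors = []
    if lower_bound >= upper_bound:
        errors.append("lower bound must be below upper bound")
    if not lower_bound <= initial_value <= upper_bound:
        errors.append("initial value must lie within the bounds")
    return errors

# The defaults (0.1, 100.0, midpoint 50.05) are mutually consistent.
print(verify_ranged(0.1, 100.0, 50.05))  # []
```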

BaseMCOParameterFactory

Each BaseMCOParameter must be associated with a BaseMCOParameterFactory that returns its class, description, etc.

class RangedMCOParameterFactory(BaseMCOParameterFactory):
    """ Ranged Parameter factory"""

    def get_identifier(self):
        return "ranged"

    #: A name that will appear in the UI to identify this parameter.
    def get_name(self):
        return "Ranged"

    #: Definition of the associated model class.
    def get_model_class(self):
        return RangedMCOParameter

    #: A long description of the parameter
    def get_description(self):
        return "A parameter with a ranged level in floating point values."

Each optimizer factory (see next topic) must return a list of such BaseMCOParameterFactory subclasses, corresponding to the parameterizations that it can handle. These can then be selected from the list of Available MCO Parameter Factories in the Workflow Manager when creating the workflow’s parameters.

_images/parameter_factory.png

Optimizer

The optimizer classes carry out the optimization of the workflow (the graph of functions). These classes usually contain the acronym MCO, but there is no particular reason why your optimizer should be a multi-criterion optimizer. In fact you could create optimizer classes that don’t optimize at all: the core requirement of these classes is that they return a point or points in parameter space, along with the associated criteria/KPIs/objectives. These points could be minima or Pareto-efficient, but could just as well be any kind of sampling: e.g. a grid or random sample.

We will illustrate how to design and use these classes, using the example of the Nevergrad optimizer that comes with the Nevergrad plugin. The source code for this optimizer can be examined here.

BaseMCOModel

The model that exposes the optimizer's settings to the user, via TraitsUI.

class NevergradMCOModel(BaseMCOModel):

    #: Algorithms available to work with
    algorithms = Enum(
        *NevergradMultiOptimizer.class_traits()["algorithms"].handler.values
    )

    #: Defines the allowed number of objective calls
    budget = PositiveInt(100)

    #: Display the generated points at runtime
    verbose_run = Bool(True)

    def default_traits_view(self):
        return View(
            Item("algorithms"),
            Item("budget", label="Allowed number of objective calls"),
            Item("verbose_run"),
        )

It exposes a set of optimizer parameters with an associated View.

_images/optimizer_algo.png

BaseMCO

Creates a BaseOptimizerEngine object and runs that engine on the workflow.

class NevergradMCO(BaseMCO):

    def run(self, evaluator):
        model = evaluator.mco_model

        optimizer = NevergradMultiOptimizer(
            algorithms=model.algorithms,
            kpis=model.kpis,
            budget=model.budget)

        engine = AposterioriOptimizerEngine(
            kpis=model.kpis,
            parameters=model.parameters,
            single_point_evaluator=evaluator,
            verbose_run=model.verbose_run,
            optimizer=optimizer
        )

        for index, (optimal_point, optimal_kpis) \
                in enumerate(engine.optimize()):
            model.notify_progress_event(
                [DataValue(value=v) for v in optimal_point],
                [DataValue(value=v) for v in optimal_kpis],
            )

The run method takes a single argument, evaluator: this is a Workflow object (the name is evaluator because Workflow implements the IEvaluator interface). The evaluator Workflow object has the attribute mco_model: this is the BaseMCOModel selected by the user (in our example, NevergradMCOModel). Next, run creates two objects:

NevergradMultiOptimizer
An optimizer satisfying the IOptimizer interface.
AposterioriOptimizerEngine
An optimizer engine, an implementation of BaseOptimizerEngine (see below)

By making a separate ‘optimizer’ and ‘optimizer engine’ we are making a subtle distinction between:

  • the optimizer itself: the core optimization algorithm, and
  • what we do with that optimizer: for instance find a Pareto efficient set by some particular method, or track points on the way to a minimum. This is the engine.

By separating these functions into different objects, we can mix and match optimizer and engine. For instance, in this example we use a Nevergrad optimizer and an engine that directs the optimizer to find the Pareto efficient set by an a posteriori method. However, we could equally use an engine that uses the Nevergrad optimizer to find the set by an a priori method.

It is not necessary to have a separate optimizer and engine: both functionalities can be bundled into a single BaseOptimizerEngine object. Once this object is created, run() calls its optimize iterator, which yields the results of the optimization.

The results yielded by BaseOptimizerEngine’s optimize are wrapped into DataValue objects and then passed to the BaseMCOModel instance through its notify_progress_event method. This method has a concrete implementation in BaseMCOModel that takes the list of points and list of KPIs as arguments. However you can override this method if you want to pass additional/different values to the model.
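The control flow of run() can be sketched with plain Python stand-ins (not the real classes): the engine yields (point, kpis) pairs and the MCO forwards each pair to the model.

```python
def optimize():
    """Stand-in engine: yields (point in parameter space, KPI values)."""
    for x in (0.0, 0.5, 1.0):
        yield [x], [x ** 2]   # one parameter, one KPI

progress = []

def notify_progress_event(point, kpis):
    """Stand-in for BaseMCOModel.notify_progress_event: record each result."""
    progress.append((point, kpis))

# The body of a minimal BaseMCO.run(), stripped of the DataValue wrapping:
for optimal_point, optimal_kpis in optimize():
    notify_progress_event(optimal_point, optimal_kpis)
```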

BaseOptimizerEngine

Does the actual optimization.

class AposterioriOptimizerEngine(BaseOptimizerEngine):

    name = Str("APosteriori_Optimizer")

    optimizer = Instance(IOptimizer, transient=True)

    def optimize(self, *vargs):
        #: get pareto set
        for point in self.optimizer.optimize_function(
                self._score,
                self.parameters):
            kpis = self._score(point)
            yield point, kpis

As just mentioned, the optimize iterator method of BaseOptimizerEngine yields the optimization results. Each yield must consist of:

  • point
    A list of parameter (graph input) values, i.e. the point in parameter space.
  • kpis
    The criteria/objectives/KPI(s) at the point.

optimize may yield just a single point (e.g. a minimum) or multiple points (e.g. a Pareto set, or grid sample).

In this example, optimize yields by calling another iterator: the optimize_function method of the IOptimizer instance. In our case this is the NevergradMultiOptimizer object we met earlier. However we won’t go any further into this: as explained, the separation of ‘optimizer’ from ‘engine’ is optional. All one has to know is that the engine must have an optimize iterator method which yields a point in parameter space and the KPI(s) at that point.

BaseMCOCommunicator

The MCO Communicator must reimplement BaseMCOCommunicator and two methods: receive_from_mco() and send_to_mco(). These two methods can use files, stdin/stdout or any other mechanism to send and receive data between the MCO and the BDSS running as a subprocess of the MCO to evaluate a single point.
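As a sketch of the idea only (hypothetical helper functions, not the real API), a communicator using stdin/stdout might parse a whitespace-separated line of parameter values and format the KPIs the same way for the reply:

```python
def receive_from_mco(line):
    """Hypothetical: parse one whitespace-separated line of parameter values."""
    return [float(token) for token in line.split()]

def send_to_mco(kpis):
    """Hypothetical: format the KPI values for the parent MCO process."""
    return " ".join(str(value) for value in kpis)

params = receive_from_mco("1.0 2.5\n")   # values received for a single point
reply = send_to_mco([0.5, 4.0])          # KPIs computed at that point
```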

BaseMCOFactory

This factory is contributed to the BDSS by the plugin, allowing it to create instances of BaseMCOModel, BaseMCO and BaseMCOCommunicator.

class NevergradMCOFactory(BaseMCOFactory):

    def get_identifier(self):
        return "nevergrad_mco"

    def get_name(self):
        return "Gradient Free Multi Criteria optimizer"

    def get_model_class(self):
        return NevergradMCOModel

    def get_optimizer_class(self):
        return NevergradMCO

    def get_communicator_class(self):
        return BaseMCOCommunicator

    def get_parameter_factory_classes(self):
        return [
            FixedMCOParameterFactory,
            ListedMCOParameterFactory,
            RangedMCOParameterFactory,
            CategoricalMCOParameterFactory,
            RangedVectorMCOParameterFactory
        ]

Note that we do not use a BaseMCOCommunicator in this example, so the factory just returns the base class.

Also note the get_parameter_factory_classes method. This returns a list of parameterization factories that are suitable for the optimizer (see the previous topic). These then appear in the Workflow Manager when selecting parameters.

_images/parameter_factory.png

Notification

Notification listeners are used to notify external listeners of the state of the MCO, including the data obtained by the MCO as it performs the evaluation. Database writers and CSV/HDF5 file writers are examples of notification listeners.

Each notification listener is defined by an implementation of BaseNotificationListenerFactory, which contributes both BaseNotificationListenerModel and BaseNotificationListener subclasses. It therefore requires implementation of the following additional abstract methods alongside the standard get_identifier, get_name, and get_description methods

def get_model_class(self):
    Returns a BaseNotificationListenerModel subclass

def get_listener_class(self):
    Returns a BaseNotificationListener subclass

The BaseNotificationListener class must reimplement the following methods, which are invoked at specific points in the lifetime of the BDSS

def initialize(self):
    Called once, when the BDSS is initialized. For example, to setup the
    connection to a database, or open a file.

def finalize(self):
    Called once, when the BDSS is finalized. For example, to close the
    connection to a database, or close a file.

def deliver(self, event):
    Called every time the MCO generates an event. The event will be passed
    as an argument. Depending on the argument, the listener implements
    appropriate action. The available events are in the api module.

User Interface

Envisage Service Offers

A plugin can also define one or more custom visualization classes for the GUI application force-wfmanager, typically to display data or to provide a tailor-made UI for a specific user. In that case, the plugin class must inherit from force_bdss.core_plugins.service_offer_plugin.ServiceOfferExtensionPlugin, which is a child class of BaseExtensionPlugin. Any UI subclasses can then be made discoverable by force-wfmanager using the Envisage ServiceOffer protocol, through the get_service_offer_factories method

def get_service_offer_factories(self):
    """A method returning a list user-made objects to be provided by this
    plugin as envisage ServiceOffer objects. Each item in the outer list is
    a tuple containing an Interface trait to be used as the ServiceOffer
    protocol and an inner list of subclass factories to be instantiated
    from said protocol.

    Returns
    -------
    service_offer_factories: list of tuples
        List of objects to load, where each tuple takes the form
        (Interface, [HasTraits1, HasTraits2..]), defining a Traits
        Interface subclass and a list of HasTraits subclasses to be
        instantiated as an envisage ServiceOffer.
    """

Make sure to import the module containing the data view class from inside get_service_offer_factories: this ensures that running BDSS without a GUI application doesn’t import the graphical stack.

Custom UI classes

There are currently two types of custom UI object that may be contributed by a plugin: IBasePlot and IContributedUI. These interfaces represent requirements for any UI feature that can be used to display MCO data or present a simplified workflow builder, respectively.

Also, multiple types of plugin-contributed UI objects can be imported in the same call. For instance

def get_service_offer_factories(self):
    from force_wfmanager.ui import IBasePlot, IContributedUI
    from .example_custom_uis import PlotUI, ExperimentUI, AnalysisUI

    return [
        (IBasePlot, [PlotUI]),
        (IContributedUI, [ExperimentUI, AnalysisUI])
    ]
