Image processing poses a special challenge as an example domain for OTF computing. Image processing services can often only be described by so-called soft functional properties. Soft properties such as image noise or color distribution can be stored in an ontology in the form of concepts, but they require a context-dependent evaluation that goes beyond the mere matching of service specifications. Initially, learning methods were applied to this evaluation in static contexts only, illustrated by the composition of simple filter sequences.
The first step was to integrate learning procedures, in the form of reinforcement learning (RL), into the evaluation of configuration steps. To make this possible, configuration processes were modeled as sequential decision processes that fulfill the Markov property (memorylessness): the next step depends only on the current composition structure, not on how it was reached. A sequence of configuration steps is regarded here as a sequential selection of composition rules, where each composition rule corresponds to a correct modification of the current composition structure.
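The idea of composition rules as modifications of the current composition structure can be sketched as grammar-like rewrite rules. The following is only a minimal illustration, not the project's implementation: the `Rule` type, the rule names, the filter names, and the convention that upper-case symbols are non-terminals are all assumptions made for this example.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical composition rule: rewrite one non-terminal into symbols."""
    name: str
    lhs: str               # non-terminal (functionality still to be configured)
    rhs: tuple[str, ...]   # replacement: services and/or further non-terminals


# Assumed convention: UPPER-CASE = non-terminal, lower-case = concrete service.
RULES = [
    Rule("split", "FILTER", ("DENOISE", "SHARPEN")),
    Rule("use_median", "DENOISE", ("median",)),
    Rule("use_gauss", "DENOISE", ("gauss",)),
    Rule("use_unsharp", "SHARPEN", ("unsharp_mask",)),
]


def is_nonterminal(symbol: str) -> bool:
    return symbol.isupper()


def applicable(state: tuple[str, ...]) -> list[Rule]:
    """Rules that rewrite the leftmost non-terminal of the current structure."""
    for symbol in state:
        if is_nonterminal(symbol):
            return [r for r in RULES if r.lhs == symbol]
    return []  # no non-terminal left: the composition is complete


def apply_rule(state: tuple[str, ...], rule: Rule) -> tuple[str, ...]:
    """Successor structure; it depends only on (state, rule), not on history."""
    i = state.index(rule.lhs)
    return state[:i] + rule.rhs + state[i + 1:]


state = ("FILTER",)
state = apply_rule(state, RULES[0])   # ("DENOISE", "SHARPEN")
state = apply_rule(state, RULES[1])   # ("median", "SHARPEN")
state = apply_rule(state, RULES[3])   # ("median", "unsharp_mask")
```

Because `apply_rule` needs nothing but the current structure and the chosen rule, a sequence of such steps satisfies the Markov property in the sense described above.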
Based on the composition rules, the sequential selection of configuration steps was modeled as a Markov decision process (MDP). States in the MDP describe the structure of the service composed so far, consisting of terminal symbols (services) and non-terminal symbols (functionality still to be configured). Actions correspond to the application of composition rules. Different rules can be applied in the same state (alternative configuration steps), while identical rules can be applied in different states (context-dependent configuration steps). The evaluation of a configuration step corresponds to the quality value (Q-value) of a state-action pair in the state space: the higher the Q-value, the more suitable the corresponding configuration step. Model-free, episodic RL in the form of Q-learning and SARSA was used to determine the Q-values. Because configuration and execution are assumed to be strictly separated, only the final feedback obtained after a configured service has been executed can be fed back into the evaluation process to adapt the Q-values. In the context of OTF computing, final feedback can correspond, for example, to a user rating after the service has been executed. If sample data is provided with the request, final feedback can also be generated automatically.
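A tabular Q-learning loop in which only the completed service receives feedback might look as follows. This is a hedged sketch under simplifying assumptions: the rule set, the filter names, the `final_feedback` ratings (a stand-in for a user rating), and all hyperparameters are invented for illustration and are not taken from the project.

```python
import random
from collections import defaultdict

# Hypothetical composition rules: non-terminal -> alternative expansions.
RULES = {
    "DENOISE": [("median",), ("gauss",)],
    "SHARPEN": [("unsharp",), ("laplace",)],
}
START = ("DENOISE", "SHARPEN")


def actions(state):
    """(position, expansion) pairs for the leftmost non-terminal, if any."""
    for i, symbol in enumerate(state):
        if symbol in RULES:
            return [(i, rhs) for rhs in RULES[symbol]]
    return []  # fully configured service: terminal state


def step(state, action):
    i, rhs = action
    return state[:i] + rhs + state[i + 1:]


def final_feedback(service):
    """Stand-in for a rating after execution; the values are made up."""
    ratings = {("median", "unsharp"): 1.0, ("gauss", "unsharp"): 0.4}
    return ratings.get(service, 0.1)


def train(episodes=500, alpha=0.2, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values indexed by (state, action)
    for _ in range(episodes):
        state = START
        while actions(state):
            acts = actions(state)
            if rng.random() < epsilon:          # explore
                action = rng.choice(acts)
            else:                               # exploit current estimates
                action = max(acts, key=lambda a: q[(state, a)])
            nxt = step(state, action)
            nxt_acts = actions(nxt)
            if nxt_acts:
                # Intermediate step: no reward, bootstrap from the successor.
                target = gamma * max(q[(nxt, b)] for b in nxt_acts)
            else:
                # Composition complete: only now is feedback available.
                target = final_feedback(nxt)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
    return q


def greedy_service(q):
    """Follow the highest Q-value in every state until no rule applies."""
    state = START
    while actions(state):
        state = step(state, max(actions(state), key=lambda a: q[(state, a)]))
    return state


q_values = train()
best = greedy_service(q_values)
```

The final rating propagates backwards through the Q-values of earlier configuration steps, so the first decision (which denoising filter to choose) is eventually evaluated in light of feedback that only becomes available after execution. A SARSA variant would differ only in the bootstrap target, using the Q-value of the action actually selected in the successor state instead of the maximum.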
The approach presented can be understood as a sub-symbolic extension of the previously predominantly formal (symbolic) configuration methods. Ambiguities arising from abstract, formal specifications (user requests and service descriptions) during the configuration process can at least be resolved in static contexts. For example, composition rules are automatically generated and integrated into the MDP depending on the configuration algorithm used and the services found at configuration time. In dynamic state spaces (abstraction and concretisation of states) a generalisation of learned knowledge (state abstraction) can be advantageous in certain situations, e.g. to draw conclusions on similar user requests and to accelerate learning procedures.
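One way to picture state abstraction is to index Q-values by an abstracted form of the state, so that structurally similar compositions share learned values. The sketch below is purely illustrative: the `SERVICE_CLASS` mapping, the `<...>` marker for already-configured functionality, and the string action name are assumptions, not part of the approach as described.

```python
# Hypothetical mapping from concrete services to the functionality they provide.
SERVICE_CLASS = {"median": "denoise", "gauss": "denoise", "unsharp": "sharpen"}


def abstract(state):
    """Replace each concrete service by a marker for its functionality class."""
    return tuple(
        f"<{SERVICE_CLASS[s]}>" if s in SERVICE_CLASS else s for s in state
    )


def update(q, state, action, value):
    """Store the Q-value under the abstract state."""
    q[(abstract(state), action)] = value


def lookup(q, state, action, default=0.0):
    """Read the Q-value shared by all states with the same abstraction."""
    return q.get((abstract(state), action), default)


q = {}
update(q, ("median", "SHARPEN"), "use_unsharp", 0.8)
shared = lookup(q, ("gauss", "SHARPEN"), "use_unsharp")  # reuses the 0.8
```

In this sketch, sharing values speeds up learning for unseen but similar states, at the cost of context sensitivity: two states that map to the same abstraction can no longer receive different evaluations, which is why a concretisation step may be needed where that distinction matters.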