The MIF middleware platform is designed for building applications based on data processing pipelines that integrate various software components using an asynchronous messaging platform. The figure below shows a simple example of a processing pipeline with two analysis components.
High-performance, reliable middleware platforms are difficult and expensive to build. We have therefore built the MIF as a layer on top of existing open source Java technologies, namely the Mule ESB, a Java cache (ehcache) and a Java Message Service implementation (JBoss JMS). The MIF defines a component-based programming model for building application pipelines, and implements this model by exploiting the capabilities of the underlying open source platforms. This gives the MIF a reliable foundation, as all the open source technologies we use are robust and have been tested in many demanding applications.
The resulting MIF architecture is depicted in the figure below.
MIF analysis components are constructed using a Java API that supports intercomponent communication using asynchronous messaging. Local components execute inside the MIF container. Remote components support the same programmatic API, and use additional MIF facilities to execute component code outside the MIF container. Remote components are used to create distributed solutions and to integrate with non-Java code.
Messages can be passed between components by copy or by reference. Copying occurs when a message contains a data payload, and reference passing occurs when the payload contains a reference to a data item in the MIF cache. Cache items are held in the MIF container memory, and the cache overflows to disk when the allocated cache memory is exhausted. By default, the MIF container exchanges messages between components using ordinary Java method calls. Messages can also be exchanged using the Java Message Service (JMS) to enhance scalability. This is achieved simply by configuring the endpoints that components use for communication.
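The difference between passing by copy and passing by reference can be illustrated with a minimal sketch. It does not use the MIF API: `SketchCache` and `SketchMessage` are hypothetical names introduced here, and the cache is a plain in-memory map standing in for the MIF cache, without the overflow-to-disk behavior.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the MIF cache (assumption: in memory only, no disk overflow).
final class SketchCache {
    private final Map<String, byte[]> items = new ConcurrentHashMap<>();

    // Stores a payload and returns the key that a by-reference message carries.
    String put(byte[] payload) {
        String key = UUID.randomUUID().toString();
        items.put(key, payload);
        return key;
    }

    byte[] get(String key) {
        return items.get(key);
    }
}

// Hypothetical message type: holds either an inline payload or a cache key, never both.
final class SketchMessage {
    private final byte[] inlinePayload;
    private final String cacheKey;

    private SketchMessage(byte[] inlinePayload, String cacheKey) {
        this.inlinePayload = inlinePayload;
        this.cacheKey = cacheKey;
    }

    // Pass by copy: the message carries its own copy of the data.
    static SketchMessage byCopy(byte[] payload) {
        return new SketchMessage(payload.clone(), null);
    }

    // Pass by reference: the data goes into the cache and the message carries only the key.
    static SketchMessage byReference(SketchCache cache, byte[] payload) {
        return new SketchMessage(null, cache.put(payload));
    }

    boolean isReference() {
        return cacheKey != null;
    }

    // A receiving component calls this to obtain the data in either case.
    byte[] resolve(SketchCache cache) {
        return isReference() ? cache.get(cacheKey) : inlinePayload;
    }
}
```

In this sketch a by-reference message stays small regardless of payload size, at the cost of a cache lookup on the receiving side, which is the trade-off the paragraph above describes.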
The MIF API enables components to optionally record metadata (known as provenance) about the data they receive/produce and the processing carried out. The metadata is passed on to the MIF container, and this sends a JMS message to a ProvenanceListener topic. How these messages are subsequently processed is defined by the MeDICi Provenance tools.
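As a rough illustration of this flow, the sketch below publishes a provenance record to subscribers without blocking the component that produced it. It is not the MIF or JMS API: `SketchProvenance` and `SketchProvenanceTopic` are hypothetical names, the record fields are assumptions, and an in-memory list of subscribers stands in for the ProvenanceListener topic.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical provenance record; the fields are assumptions, not the MIF metadata schema.
final class SketchProvenance {
    final String component;
    final String inputId;
    final String outputId;

    SketchProvenance(String component, String inputId, String outputId) {
        this.component = component;
        this.inputId = inputId;
        this.outputId = outputId;
    }
}

// In-memory stand-in for a publish/subscribe topic (assumption: not a real JMS topic).
final class SketchProvenanceTopic {
    private final List<Consumer<SketchProvenance>> subscribers = new CopyOnWriteArrayList<>();
    private final ExecutorService publisher = Executors.newSingleThreadExecutor();

    void subscribe(Consumer<SketchProvenance> subscriber) {
        subscribers.add(subscriber);
    }

    // Returns immediately; delivery to subscribers happens on the publisher thread.
    void publish(SketchProvenance record) {
        publisher.execute(() -> subscribers.forEach(s -> s.accept(record)));
    }

    // Waits up to five seconds for pending deliveries, then stops the publisher thread.
    boolean close() {
        publisher.shutdown();
        try {
            return publisher.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because `publish` only queues the record, a component in this sketch is never slowed down by a slow subscriber, which is the usual reason for sending metadata over a topic rather than recording it inline.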
The MIF container environment is provided by Mule, an open source messaging platform. The MIF extends the Mule API to make component pipeline construction simpler and to create an encapsulation mechanism for component creation. Our current implementation uses ehcache and JBoss JMS, but the MIF API is designed to be agnostic to the underlying cache and messaging platform. This allows deployments to configure MIF applications using specific technologies that meet their quality of service requirements. In addition, if a MIF application does not use the cache, provenance and JMS options, these platforms do not need to be deployed, which frees resources within the MIF container.
The key concepts you need to understand to use the MIF API are explained below:
- MIF processing pipeline: A processing pipeline is a collection of analysis components connected as a directed graph. Data arrives as input at the first component in the pipeline, and the output of this component is passed to one or more downstream components for further processing. A pipeline can be of arbitrary length, and components can accept inputs from one or more upstream components, and send their results to one or more downstream components.
- MIF analysis components: At the core, a MIF analysis component is a user supplied piece of code that inputs some defined data, processes it, and outputs some results. This code is wrapped by the MIF API to make it possible to plug the component in to a processing pipeline.
- MIF container: The MIF API executes in the MIF container within a Java Virtual Machine. The MIF container is provided by Mule, optionally together with a Java data cache and a Java Message Service implementation.
- Local components: Local components consist of Java analysis code wrapped in the MIF API. Local components execute within the MIF container.
- Remote components: Remote components execute outside the MIF container, either on the same machine or on another node in a distributed application. The core analysis code can be Java (e.g. an EJB or MDB), a Web service, an HTTP server, a socket server or an executable written in any programming language. The MIF remote component API wraps the analysis code and provides the capabilities to communicate with it from a MIF pipeline.
- Messages: Analysis components pass messages down the pipeline as the incoming data is progressively processed. Complete messages can be simply copied between components, or a reference to the message payload (stored in the MIF cache or addressed by some URI) can be exchanged to reduce the overheads of passing large messages.
- Metadata: Also known as provenance, metadata describes the data a component receives and produces and the processing it carries out. MIF analysis components can be configured to send metadata asynchronously to the MIF provenance store.
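
The pipeline and component concepts above can be sketched in plain Java. This is an illustration under stated assumptions, not the MIF API: `SketchComponent`, `connectTo` and `send` are hypothetical names, the user-supplied analysis code is modeled as a `Function<String, String>`, and `CompletableFuture` stands in for asynchronous messaging between components.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical component wrapper: user-supplied analysis code plus links to downstream components.
final class SketchComponent {
    private final String name;
    private final Function<String, String> analysis;
    private final List<SketchComponent> downstream = new ArrayList<>();

    SketchComponent(String name, Function<String, String> analysis) {
        this.name = name;
        this.analysis = analysis;
    }

    String name() {
        return name;
    }

    // Adds an edge to the pipeline graph and returns the target so calls can be chained.
    SketchComponent connectTo(SketchComponent next) {
        downstream.add(next);
        return next;
    }

    // Runs the analysis asynchronously, then forwards the result to every downstream component.
    // The returned future holds the outputs of all terminal components reachable from this one,
    // in the order the downstream links were added.
    CompletableFuture<List<String>> send(String input) {
        return CompletableFuture.supplyAsync(() -> analysis.apply(input))
            .thenCompose(output -> {
                if (downstream.isEmpty()) {
                    return CompletableFuture.completedFuture(List.of(output));
                }
                CompletableFuture<List<String>> all =
                    CompletableFuture.completedFuture(new ArrayList<>());
                for (SketchComponent next : downstream) {
                    all = all.thenCombine(next.send(output), (acc, part) -> {
                        List<String> merged = new ArrayList<>(acc);
                        merged.addAll(part);
                        return merged;
                    });
                }
                return all;
            });
    }
}
```

A pipeline in which one component fans out to two downstream components could then be assembled as follows (the component names and lambdas are again only examples):

```java
SketchComponent ingest = new SketchComponent("ingest", String::trim);
ingest.connectTo(new SketchComponent("upper", String::toUpperCase));
ingest.connectTo(new SketchComponent("length", s -> String.valueOf(s.length())));
List<String> results = ingest.send("  mif  ").join();
```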