MVP Analysis Environment (Physical Architecture for POC Deployment)

The following plans outline the host environment for the MVP proof-of-concept data, data models, analysis models and delivery tools.

Physical Architecture

The physical architecture can be divided into two realms: resources under VPN protection (with encrypted communication between all components), and perimeter network resources, which are available to external networks after authorization.

VPN Protected Resources

The following resources are under VPN protection. Only a restricted set of authorized users can access the dependent data or administer configuration and deployment. If you need access to these resources, please contact Former user (Deleted).

OpenMRS Schema

The current plan is to execute jobs and transformations directly against the OpenMRS schema on a scheduled basis. Whether this is practical as a production means of updating the warehouse is still open, and will be reviewed as the project progresses. The alternative would be to introduce a preliminary staging step before the data is processed by the Pentaho Data Integration Server.
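The staging alternative can be illustrated with a minimal sketch, assuming an incremental copy of changed rows into a staging table ahead of any transformation. Python's built-in sqlite3 module stands in for both the OpenMRS source and the staging database so the example runs anywhere. The table and column names (encounter, stg_encounter, date_changed) and the function name are illustrative assumptions, not the project's actual schema or job design.

```python
import sqlite3

COLUMNS = "encounter_id INTEGER PRIMARY KEY, patient_id INTEGER, " \
          "encounter_datetime TEXT, date_changed TEXT"


def stage_changed_encounters(source, staging, since):
    """Copy rows changed after `since` from the source into staging.

    Returns the number of rows copied. Later transformations would read
    from stg_encounter instead of querying the source schema directly.
    """
    rows = source.execute(
        "SELECT encounter_id, patient_id, encounter_datetime, date_changed "
        "FROM encounter WHERE date_changed > ? ORDER BY date_changed",
        (since,),
    ).fetchall()
    # INSERT OR REPLACE is SQLite syntax; a MySQL staging area would use
    # its own upsert form instead.
    staging.executemany(
        "INSERT OR REPLACE INTO stg_encounter VALUES (?, ?, ?, ?)", rows
    )
    staging.commit()
    return len(rows)


# Hypothetical source data: one old row, one recently changed row.
source = sqlite3.connect(":memory:")
source.execute(f"CREATE TABLE encounter ({COLUMNS})")
source.executemany(
    "INSERT INTO encounter VALUES (?, ?, ?, ?)",
    [
        (1, 10, "2012-01-05 09:00:00", "2012-01-05 09:30:00"),
        (2, 11, "2012-01-06 10:00:00", "2012-01-07 08:00:00"),
    ],
)

staging = sqlite3.connect(":memory:")
staging.execute(f"CREATE TABLE stg_encounter ({COLUMNS})")

copied = stage_changed_encounters(source, staging, "2012-01-06 00:00:00")
staged_ids = [r[0] for r in staging.execute("SELECT encounter_id FROM stg_encounter")]
```

The trade-off is an extra copy of the data in exchange for keeping long-running transformation queries off the OpenMRS schema.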

Pentaho Data Integration Server (Pentaho Carte)

The Pentaho Data Integration (PDI) Server is an ETL server that can run scheduled, continuous and/or parallel jobs and data transformations. The PDI Server has a built-in scheduler to automate updates to the warehouse schema, and a monitor for analyzing processes during performance tuning. This is the component that will execute the processes developed to transform the OpenMRS data into a schema suited to analysis activities.

Data Staging Area

The common staging area for the PDI Server will be a MySQL server. This staging area will accommodate any interim processes that operate on the data, and hold the necessary configuration and process metadata tables.
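As an illustration of what a process metadata table might hold, the sketch below keeps a simple run log that a job could consult to find its last successful run. It is a minimal sketch under stated assumptions: sqlite3 stands in for the MySQL staging server, and the table name etl_run_log, its columns, and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical run-log table; the real metadata tables may differ.
RUN_LOG_DDL = """
CREATE TABLE IF NOT EXISTS etl_run_log (
    run_id      INTEGER PRIMARY KEY,
    job_name    TEXT NOT NULL,
    started_at  TEXT NOT NULL,
    finished_at TEXT,
    status      TEXT NOT NULL
)
"""


def start_run(conn, job_name, now):
    """Record that a job has started and return its run id."""
    cur = conn.execute(
        "INSERT INTO etl_run_log (job_name, started_at, status) "
        "VALUES (?, ?, 'RUNNING')",
        (job_name, now),
    )
    conn.commit()
    return cur.lastrowid


def finish_run(conn, run_id, now, ok=True):
    """Mark a run as finished, either successfully or as failed."""
    conn.execute(
        "UPDATE etl_run_log SET finished_at = ?, status = ? WHERE run_id = ?",
        (now, "OK" if ok else "FAILED", run_id),
    )
    conn.commit()


def last_successful_start(conn, job_name):
    """Return the start time of the latest successful run, or None."""
    row = conn.execute(
        "SELECT MAX(started_at) FROM etl_run_log "
        "WHERE job_name = ? AND status = 'OK'",
        (job_name,),
    ).fetchone()
    return row[0]


meta = sqlite3.connect(":memory:")
meta.execute(RUN_LOG_DDL)

first = start_run(meta, "load_encounters", "2012-01-07 02:00:00")
finish_run(meta, first, "2012-01-07 02:05:00", ok=True)

second = start_run(meta, "load_encounters", "2012-01-08 02:00:00")
finish_run(meta, second, "2012-01-08 02:01:00", ok=False)
```

Because failed runs are excluded, the next scheduled run would pick up from the last start time that is known to have completed.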

DMZ (Perimeter Network) Resources

The perimeter resources are accessible from external networks once the user attempting to gain access has been successfully authorized. The tools that reside in this area have authorization built in, providing sufficient security for both the data and the solutions housed here. Access to this area will be granted more liberally, as this is where solutions will be delivered to end users, presumably members of the OpenMRS and Columbia University teams.

MVP Data Warehouse

LucidDB was chosen as the open source RDBMS for the warehouse schema because it is built specifically with warehousing in mind, and it is a robust tool. In the end, the RDBMS can be any flavor that the teams prefer, as the solutions that populate this side of the schema are abstracted from the RDBMS connection details, and the transport relies only on what standard JDBC allows (unless we introduce FTP, HTTP services, etc.). Should we use a LucidDB-specific series of steps (for example, the LucidDB Bulk Loader step in PDI), the decision will be noted and justified as a dependency.
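The idea of keeping load logic independent of the RDBMS can be sketched as follows. This is an analogy rather than the project's implementation: Python's DB-API plays the role that JDBC plays in PDI, sqlite3 stands in for LucidDB, and the settings keys, the dim_patient table and both function names are hypothetical. Only the connection factory knows which database is in use; the loader sticks to standard cursor calls and plain SQL.

```python
import sqlite3


def connect_warehouse(settings):
    """The single place that knows which RDBMS backs the warehouse."""
    if settings["driver"] == "sqlite":
        return sqlite3.connect(settings["database"])
    # Another backend would be added here without touching the loaders.
    raise ValueError(f"unsupported driver: {settings['driver']}")


def load_patient_dim(conn, rows):
    """Load dimension rows using only standard cursor operations."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS dim_patient ("
        "patient_key INTEGER PRIMARY KEY, gender VARCHAR(10))"
    )
    # The '?' placeholder style depends on the driver; a different
    # driver may require a different parameter marker.
    cur.executemany(
        "INSERT INTO dim_patient (patient_key, gender) VALUES (?, ?)", rows
    )
    conn.commit()
    return cur.rowcount


warehouse = connect_warehouse({"driver": "sqlite", "database": ":memory:"})
load_patient_dim(warehouse, [(1, "F"), (2, "M")])
loaded = warehouse.execute("SELECT COUNT(*) FROM dim_patient").fetchone()[0]
```

A vendor-specific path such as a bulk loader would break this separation, which is why such a choice is worth recording as an explicit dependency.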

Pentaho CE BI Server with Saiku

The Pentaho CE BI Server houses the Mondrian OLAP engine, which will execute the analysis queries and interpret the analysis models that are built for the MVP data. Saiku is the open source analyzer tool presented to the end user for creating queries and issuing them to Mondrian through a simple, intuitive drag-and-drop interface.
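To give a sense of the kind of query that reaches Mondrian, the sketch below assembles a basic MDX statement from a chosen set of measures and a row level, roughly what a drag-and-drop selection amounts to. It is illustrative only and does not reflect how Saiku generates queries internally. The cube, measure and level names are hypothetical, and build_mdx is a helper invented for this example.

```python
def build_mdx(cube, measures, row_level):
    """Build a simple MDX query: measures on columns, level members on rows.

    `row_level` is a fully qualified level such as "[Location].[Location Name]".
    """
    columns = ", ".join(f"[Measures].[{name}]" for name in measures)
    return (
        f"SELECT {{{columns}}} ON COLUMNS, "
        f"{{{row_level}.Members}} ON ROWS "
        f"FROM [{cube}]"
    )


# Hypothetical analysis model: an encounter count broken down by location.
query = build_mdx(
    cube="Encounters",
    measures=["Encounter Count"],
    row_level="[Location].[Location Name]",
)
print(query)
```

Mondrian resolves the cube, measure and level names against the analysis model (schema) and translates the request into SQL against the warehouse.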

Delivery Tools / Consumer Interfaces

Saiku is the tool of choice for this project for interacting with Pentaho analysis models. Pentaho User Console (PUC) is the primary front end to the Pentaho BI Server, and will be used for authorizing users to the server. Saiku can be deployed as a plugin to the Pentaho BI Server and launched directly from PUC. We may also take advantage of some of the interactive reporting tools that are embedded within PUC, should the need arise.