Data Synchronization Create New Sync Node Project

<html><head><title></title></head><body>See the main Data Synchronization Project for more information about the background and breakdown of the OpenMRS Synchronization Project

GSoC 2009 Intern: Nzeyimana Antoine
Mentor(s): Maros Cunderlik and Ben Wolfe

PROJECT DESCRIPTION

<u>Abstract</u>

Data synchronization is a new OpenMRS feature allowing synchronization of data amongst a set of loosely networked servers. Such ability to exchange data is essential for operation of EMR system in rural areas where connectivity amongst sites maybe unreliable yet the need for timely centralized collection and analysis of data from remote sites exists.

One of the challenges in creating and maintaining the network of servers is the initial setup of 'child' node and establishing the relationship between parent and child such that no data is lost during the process. This process is today performed as a set of manual steps including:

  • register newly created child with parent
  • back up parent server DB and move the backup to the new child server
  • restore parent's DB
  • assign new server sync ID
  • change any server identifying information from parent to child (i.e. form entry server URL)
  • test sync connection between parent and child and finally establish periodic sync schedule
    As shown above, performing these steps requires administrative knowledge and access to the system resources such as DB login. What is needed is the user administration interface in OpenMRS that will allow system administrator to add new child node to the system without resorting to manual procedure as outlined above.

<u>Target</u>

Successful completion of the project would include:

  • Design and document the best programmatic approach to the manual process. Some considerations: What steps may be easy/hard to do in code? What is the best and most reliable way to 'clone' database from 'active' web application?
  • Design the appropriate user interface: identify and document user interactions; design web screen for the admin user interface.
  • Create project approach utilizing Agile software development methods: define user stories to document design, create task backlog, plan sprints, and deliver feature in several increments.
  • create appropriate unit tests using standard OpenMRS unit testing approach, see Unit_Testing_Conventions and Unit_Testing
  • for each user story create automated behavior and/or integration tests thus providing minimally one-to-one integration test coverage for each user story

<u>Extra Credit</u>

At times, it is also necessary to physically bring existing child node that may have been offline, or otherwise unavailable back onto parent's network. In those cases, what is generally needed is to 'send' any data entry performed offline to parent and then rebuilt the child database from 'scratch'. The solution developed to to create new child would ideally support this need also.

DESIGN & DOCUMENTATION

<u>Manual process of cloning a DB from an active web application</u>

The manual process of sync child creation involves many steps including:

  • create the child user account at the parent installation
  • backup the parent DB
  • restore the parent DB at the child
  • assign a new server sync ID and change any server identifying information
  • test sync connection between parent and child and finally establish periodic sync schedule

The parent DB backup and restore are the most difficult part of the process since there is an admin UI for the rest of steps .

*DB backup(cloning parent DB)
There are many methods for Mysql database backup but the most widely used is the mysqldump command line program .
The mysqldump program comes along with Mysql installation and it has many options .
The documentation about mysqldump program can be found in mysql reference manual [.
The most common uses of mysqldump are :

Backup the whole database on a ddl file(dump file)
<code lang="sql">
mysqldump –-user user name] –-password=password database name > dump file

or

mysqldump –u user name –p password database name > dump file
</code>
example
<code lang="sql">
mysqldump –-user root –-password=rootpassword openmrs_sync > openmrs_sync_backup.sql

or

mysqldump –u root –p rootpassword openmrs_sync > openmrs_sync_backup.sql
</code>
Backup a specific table in the DB
<code lang="sql">
mysqldump –-user user name –-password=password database name table name > dump file

example

mysqldump –u root –p rootpassword openmrs_sync synchronization_journal > /home/user/Desktop/openmrs_sync_journal_backup.sql
</code>

*DB restore(child DB setup)
The mysqldump utility is used only to take the MySQL dump. To restore the database from the dump file that you created in previous step, use mysql command .
<code lang="sql">
mysql --user username --password=password database name < dump file

example

mysql --user root --password=myrootpassword openmrs_sync < /home/user/Desktop/openmrs_sync_journal_backup.sql
</code>
After the database restore the administrator proceeds to other steps concerning the child-parent configuration .

<u>Automated process</u>

Using mysqldump command-line manually to backup the parent database needs to be carried out carefully.
There are even some tables that are supposed to be skipped because we don't need to export them to the child.
Dumping each table manually sometimes become a hassle leading to errors.
This automated process will automate the whole process of cloning the parent DB and restoring it on the child side.
Fewer steps are required now and everything will be done via the sync admin UI page .

Required functionalities:

Parent capabilities:

  • register the newly created child
  • backup the database in a customized way so that some tables or data bay be skipped
  • write the clone script to a file or to an HTTP response
  • export the DB backup script file to download for off-line restore at the child side

Child capabilities:

  • request the parent DB backup
  • restore a the parent DB backup from an HTTP response
  • import the parent DB and restore it via an uploaded file
  • ability to edit/change parent identification

Implementation process

  • The DB dump and restore are done by SynchronizationHibernateDAO class via JDBC or by calling mysqldump/mysql command-line programms.
  • Two JDBC methods: exportChildDB(OutputStream out) and importParentDB(InputStream in) in the Sync DAO layer are exposed to the sync service layer
  • Two JDBC methods: generateDataFile(File) and execGeneratedFile() in the Sync DAO layer are also exposed to the sync service layer
  • These methods can be called via Spring Controllers, Servlets or DWR Sync Service to make the process easier .
  • The exported/imported database DDL files are stored in OpenMRS local directory first.
    *JDBC vs. mysql/mysqldump methods: the process of dumping and restoring the database can be done by either JDBC methods (J methods) or by the methods that calls mysql/mysqldump command programs; the results of these methods are fully identical but there are some points to put into consideration for choosing which one is more efficient .
    • The mysql/mysqldump methods (M methods) are faster and less expensive(CPU/memory) because there are just calls to the operating system to execute commands
    • The M methods are depend on the fact that mysql bin directory is in the path
    • The J methods are cross-platform and DBMS-independent
    • Any DBMS migration from Mysql will make the M methods invalid

<u>Admin UI Screen Design for the automated process</u>

The sync settings page has a link for adding or setting up the parent DB, this will lead to the child node creation or editing .
This user interface has four main parts:

  • Parent server connection setup: address, credentials and connection testing button
  • Parent DB cloning: the OpenMRS system administrator can clone the parent DB via web or uploading a backup file downloaded from the parent installation
  • Sync schedule: this helps to set up automatic synchronization between parent and server after every amount of time specified here
  • Saving changes: After setting up everything, the changes needs to be saved

OpenMRS Data Synchronization: create a new sync node, admin UI .

OpenMRS Data Synchronization: setup a new sync child and be able to download data backup file for offline restore.

PROJECT GOALS

<u>Mid-term goals</u>

  • Manual process documentation included in the help page
  • Merge the data-synchronization-admin-ui branch from data_synchronization_bidirectional branch(1.4.x data model)
  • Develop the automated process including:
    • Ability for the parent to export the backup of the database via the web(HTTP) response(ConnectionResponse class)
    • Ability for the parent server installation to download a backup file of the database
    • Child can import the parent db backup via the web
    • Ability for the child installation to upload a parent db backup file
    • Ability to store the backup files in the OpenMRS local directory
  • Design and implement the administrative user interface for this automated process
  • Document the automated process in the help page and define use cases(i.e. automated web cloning, backup files, download and upload the backup file)

<u>Final Goals</u>

  • Unit and integration testing
  • Finish the documentation and and help page for all use cases
  • Clean up, update and finish the Sync Admin UI where necessary(e.g. Help & Documentation)
  • 'Project-wise' code commenting and licence inclusion
  • Modify the Admin UI for the user to be able to select which tables to dump in the data backup similar to the way he can to select which tables to sync

SOURCE CODE

The project codes can be browsed [ HERE|http://trac.openmrs.org/browser/openmrs/branches/data-synchronization-admin-ui].
This source code is available for free under the terms of OpenMRS Public Licence .
You can do a checkout using the following command :
<code lang="sql">svn checkout http://svn.openmrs.org/openmrs/branches/data-synchronization-admin-ui</code>

OTHER IDEAS AND COMMENTS

We welcome everyone to leave any useful feedback

— Antoine

</body></html>