Project Work Plan


GSoC 2014 - OpenMRS
Project Task: Blocking Algorithm for Patient Registration
Mentor: Judy Wawira

Phase 1 - Study String Matching Techniques and Understand the Attributes of the Patient Registration Module

- Set up the OpenMRS platform on a personal machine.
- Understand the Patient Registration module and the patient attributes it captures.
- Study the N-gram based string matching technique and evaluate its feasibility for matching OpenMRS patient registration data.
- Read about blocking algorithms and about state-of-the-art developments and implementations in patient matching, such as probabilistic matching and name swap analysis.
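
As a starting point for the N-gram study, the idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the method the project will necessarily adopt: it assumes character bigrams, "#" padding at the word boundaries, and the Dice coefficient as the similarity score. The function names `char_ngrams` and `dice_similarity` are hypothetical.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return the character n-grams of a lowercased string.

    The string is padded with '#' (an assumed convention) so that the
    first and last characters contribute as many n-grams as the others.
    """
    padded = "#" * (n - 1) + text.lower().strip() + "#" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def dice_similarity(a, b, n=2):
    """Dice coefficient over n-gram multisets: 2 * |A and B| / (|A| + |B|)."""
    grams_a = Counter(char_ngrams(a, n))
    grams_b = Counter(char_ngrams(b, n))
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    total = sum(grams_a.values()) + sum(grams_b.values())
    return 2.0 * overlap / total if total else 1.0


if __name__ == "__main__":
    for left, right in [("Smith", "Smith"), ("Smith", "Smyth"), ("Smith", "Jones")]:
        print(left, right, round(dice_similarity(left, right), 3))
```

A score near 1 suggests the two names are likely spelling variants, while a score near 0 suggests unrelated names. The threshold that separates the two would have to be chosen empirically on the generated data.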

Phase 2 - Randomly generate Dummy Patient Data

- Write a Python program to randomly generate patient data.
- Add a module to the program that introduces duplicate, missing, or permuted values into randomly selected fields to generate fuzzily matching records.
- Data redundancy is controlled by user input parameters, which are stored along with the generated data. Multiple samples can be generated.
- Randomly generate 100,000 such records.
- Add a module for randomly generating valid latitude, longitude, and/or biometric data in addition to the patient's ID, initials, first name, middle name, family name, gender, date of birth / age, and address {house number, address1, address2, pincode, country}.
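
The generator described above could look roughly like the following. This is a minimal sketch under several assumptions: the name lists are illustrative placeholders, only a subset of the planned fields is generated, and the only perturbations are a blanked field or a swap of two adjacent characters. All function names and the `duplicate_of` field are hypothetical.

```python
import random

# Illustrative placeholder values, not real patient data.
FIRST_NAMES = ["Amina", "John", "Grace", "Peter", "Mary"]
FAMILY_NAMES = ["Otieno", "Kamau", "Mwangi", "Wanjiru", "Odhiambo"]


def make_record(rng, patient_id):
    """Generate one clean record with a subset of the planned fields."""
    return {
        "id": patient_id,
        "first_name": rng.choice(FIRST_NAMES),
        "family_name": rng.choice(FAMILY_NAMES),
        "gender": rng.choice(["M", "F"]),
        "birth_year": rng.randint(1930, 2013),
        "latitude": round(rng.uniform(-90.0, 90.0), 6),
        "longitude": round(rng.uniform(-180.0, 180.0), 6),
        "duplicate_of": None,
    }


def perturb(rng, record, new_id):
    """Return a fuzzy copy: blank one name field or swap two adjacent letters."""
    copy = dict(record, id=new_id, duplicate_of=record["id"])
    field = rng.choice(["first_name", "family_name"])
    if rng.random() < 0.5:
        copy[field] = ""  # simulate a missing value
    else:
        chars = list(copy[field])
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]  # simulate a typo
        copy[field] = "".join(chars)
    return copy


def generate(n_records, duplicate_rate, seed=0):
    """Return (parameters, records) so each sample keeps its own settings."""
    rng = random.Random(seed)
    records = [make_record(rng, i) for i in range(n_records)]
    n_dupes = int(n_records * duplicate_rate)
    for k, original in enumerate(rng.sample(records, n_dupes)):
        records.append(perturb(rng, original, n_records + k))
    params = {"n_records": n_records, "duplicate_rate": duplicate_rate, "seed": seed}
    return params, records


if __name__ == "__main__":
    params, records = generate(n_records=1000, duplicate_rate=0.1, seed=42)
    print(params, len(records))
```

Returning the parameters together with the records, and recording in `duplicate_of` which original each fuzzy copy came from, keeps the ground truth available for validating the blocking results later.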

Phase 3 - Preprocessing data - data cleaning and data formatting

- Read the data into R and generate a summary.
- Write a script to clean and transform the data according to the fields specified in configuration files.
- Load the preprocessed data into a database or file for further operations.
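
The plan names R for this phase. Purely to illustrate the config-driven cleaning step in the same language as the other sketches, here is a Python version. The configuration format (a mapping from field name to a cleaning rule), the rule names, and the function names are all assumptions made for this example.

```python
import re

# Hypothetical configuration: field name -> cleaning rule.
# In the plan this would be read from a configuration file.
CLEANING_CONFIG = {
    "first_name": "name",
    "family_name": "name",
    "gender": "code",
    "pincode": "digits",
}


def clean_value(value, kind):
    """Normalize one value according to its rule; empty results become None."""
    if value is None:
        return None
    text = str(value).strip()
    if kind == "name":
        # Keep letters and spaces only, collapse whitespace, uppercase.
        text = "".join(ch for ch in text if ch.isalpha() or ch.isspace())
        text = re.sub(r"\s+", " ", text).strip().upper()
    elif kind == "code":
        text = text.upper()[:1]  # e.g. "male" -> "M"
    elif kind == "digits":
        text = re.sub(r"\D", "", text)
    return text or None


def clean_record(record, config=CLEANING_CONFIG):
    """Return a cleaned copy; fields not named in the config are left as-is."""
    cleaned = dict(record)
    for field, kind in config.items():
        if field in cleaned:
            cleaned[field] = clean_value(cleaned[field], kind)
    return cleaned


if __name__ == "__main__":
    raw = {"id": 7, "first_name": "  jo-hn ", "family_name": "O'Tieno  ",
           "gender": "male", "pincode": "56 00-01"}
    print(clean_record(raw))
```

Mapping blank values to `None` makes missing data explicit, so a later blocking step can decide how to treat it rather than silently comparing empty strings.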

Phase 4 - Developing Blocking Algorithms and Techniques
- Evaluate the pros and cons of multiple blocking schemes.
- Develop a blocking scheme and implement it in Python/Java and R.
- Run the blocking algorithm against the generated data (Phases 2 and 3) on multiple samples.
- Validate the matched patient records against the original distribution of the sample to check for false positives and measure accuracy.
- Iterate on the blocking algorithm to reach the desired efficiency and accuracy.
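
To make the run-and-validate loop concrete, here is a small sketch of one possible blocking scheme and its evaluation. It is not the scheme the project commits to: the blocking key (first letter of the family name plus birth year) is a hypothetical choice, and the two metrics, reduction ratio and pairs completeness, are commonly used blocking measures introduced here as assumptions alongside the false-positive check named in the plan. True duplicate pairs are assumed to be given as `(smaller_id, larger_id)` tuples.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical key: first letter of the family name plus birth year."""
    family = (record.get("family_name") or "").upper()
    return (family[:1], record.get("birth_year"))


def candidate_pairs(records, key_func=blocking_key):
    """Group records by key and pair up only records that share a block."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key_func(record)].append(record["id"])
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def evaluate(records, true_pairs, key_func=blocking_key):
    """Compare the candidate pairs with the known duplicate pairs."""
    pairs = candidate_pairs(records, key_func)
    n = len(records)
    all_pairs = n * (n - 1) // 2  # comparisons needed without blocking
    found = len(pairs & true_pairs)
    return {
        "candidate_pairs": len(pairs),
        "reduction_ratio": 1 - len(pairs) / all_pairs if all_pairs else 0.0,
        "pairs_completeness": found / len(true_pairs) if true_pairs else 1.0,
    }


if __name__ == "__main__":
    sample = [
        {"id": 1, "family_name": "Kamau", "birth_year": 1980},
        {"id": 2, "family_name": "Kamua", "birth_year": 1980},   # typo of 1
        {"id": 3, "family_name": "Otieno", "birth_year": 1975},
        {"id": 4, "family_name": "", "birth_year": 1975},        # missing name
    ]
    print(evaluate(sample, true_pairs={(1, 2), (3, 4)}))
```

In this toy sample the typo pair lands in the same block, but the record with the missing family name falls into a different block and its match is lost. That trade-off between fewer comparisons and missed matches is what the iteration step would tune.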

Phase 5 - Integrate solution with the Patient Registration Module
- Integrate the blocking algorithm with the Patient Registration module.
- The integration may include porting the blocking algorithm to the programming environment of the OpenMRS platform, either by creating an interface or by rewriting the algorithm/logic.

Phase 6 - Testing and Bug Fixing
- Test the modified Patient Registration module to see the solution working in action.
- Fix any bugs found.

Phase 7 - Commit and Publish the Code to the OpenMRS Repository
- After successful testing and bug fixing, and following the mentor's guidelines, commit the code changes to the OpenMRS GitHub repository.
- Give a talk or demo at the OpenMRS developer forum on the new feature in the Patient Registration module.