wn User (surangak)
wn User (sgrannis)
A lucky student
Duplicate patient records often arise in electronic medical record systems. These duplicates fragment patient records and hinder access to seamless, integrated patient data. The PatientMatching module is a tool that helps OpenMRS installations identify and merge duplicate patient records arising within the OpenMRS database. The PatientMatching module has been incrementally developed over the last few years by a cohort of Google Summer of Code interns, systems engineers, and medical informatics researchers. Our hope is that GSoC 2012 will see continued success in evolving the module's functionality.
The objective of the 2012 GSoC de-duplication module project is to improve the user experience of the de-duplication workflow. The GSoC applicant will help to analyze, design, develop, and implement a number of prioritized features, which include:
Task 1: Incorporate a process to validate de-duplication strateg=
Configuring a de-duplication strategy to find potential duplicates is a moderately complex task. If configured incorrectly, the de-duplication process may fail, or the linkage results may be inaccurate. To ensure a properly configured linkage approach, we will incorporate a validation process that highlights errors in the linkage configuration. Examples of invalid matching configurations include undefined blocking fields and undefined matching fields.
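A minimal sketch of the kind of check Task 1 describes, covering only the two invalid cases named above (undefined blocking fields, undefined matching fields). The class name StrategyValidator, the method signature, and the error messages are hypothetical and are not part of the PatientMatching module's API; a real implementation would read these lists from the module's own configuration objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not an existing PatientMatching class.
final class StrategyValidator {

    /**
     * Returns a human-readable message for every problem found in the
     * strategy. An empty list means none of the checked errors is present.
     */
    static List<String> validate(List<String> blockingFields, List<String> matchingFields) {
        List<String> errors = new ArrayList<>();
        if (blockingFields == null || blockingFields.isEmpty()) {
            errors.add("No blocking field is defined.");
        }
        if (matchingFields == null || matchingFields.isEmpty()) {
            errors.add("No matching field is defined.");
        }
        return errors;
    }
}
```

Returning every error at once, rather than failing on the first one, lets the user correct the whole configuration in a single pass.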
Task 2: Incorporate a process to calculate the total number of potential pairs formed by a particular blocking strategy.
The de-duplication module searches through "record pairs" that have a high likelihood of being duplicates. "Record pairs" are formed using "blocking strategies", which are simple approaches to finding similar records by requiring that one or more corresponding fields agree exactly between two records. For example, we may stipulate that records agreeing on last and first name should be used to create potential pairs. Occasionally, however, the user may choose a blocking strategy that yields a very large number of record pairs (hundreds of millions, if not billions) or a very small number (zero to fewer than a hundred). Such extreme pair counts can cause problems, including out-of-memory errors, excessively long runtimes, and confusing or inaccurate results. To avoid this situation, the GSoC intern will implement a function that calculates the estimated number of pairs to be formed and alerts the end user to that number. The end user will then have the option of canceling or editing the particular de-duplication strategy.
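One way to compute the pair count, sketched under the assumption that each record has already been reduced to a single blocking key (for example, last and first name joined together): records sharing a key form a block, and a block of n records yields n(n-1)/2 pairs. The class BlockingPairEstimator, its method names, and the threshold parameters are hypothetical; the module may compute this differently, for instance with a GROUP BY query against the database.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an existing PatientMatching class.
final class BlockingPairEstimator {

    /** Counts the record pairs formed when records with equal keys are paired. */
    static long countPairs(List<String> blockingKeys) {
        Map<String, Long> blockSizes = new HashMap<>();
        for (String key : blockingKeys) {
            if (key == null) {
                continue; // assumption: a record with no blocking value forms no pairs
            }
            blockSizes.merge(key, 1L, Long::sum);
        }
        long total = 0;
        for (long n : blockSizes.values()) {
            total += n * (n - 1) / 2; // pairs within one block
        }
        return total;
    }

    /** True when the pair count falls outside the caller-supplied limits. */
    static boolean isOutsideExpectedRange(long pairs, long minPairs, long maxPairs) {
        return pairs < minPairs || pairs > maxPairs;
    }
}
```

With the illustrative keys "smith|john" (three records), "doe|jane" (two records), and "lee|ann" (one record), the blocks contribute 3, 1, and 0 pairs, so countPairs returns 4. The limits passed to isOutsideExpectedRange would be chosen by the implementer; the text above suggests warning below roughly a hundred pairs and above hundreds of millions.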
Task 3: Upgrade the de-duplication reports from flat files to da=
The de-duplication module creates reports listing potentially duplicate records, which end users can manually review and merge when necessary. Until recently, these "de-duplication reports" were stored as flat files. Unfortunately, flat files limit our ability to manage the data and hinder new, creative ways to display it. Therefore, upgrading from flat files to persisting the data in a relational database will help users and developers use this data more meaningfully. The successful applicant will continue the previously initiated work for this task. For a detailed description of what we have completed so far, and for more hints on how to complete this ticket, see here.
Task 4: Implement a process to analyze and highlight useful de-d=
Data fields in OpenMRS often *appear* to be useful for de-duplication, but are not. This can be the case for a variety of reasons: data may be incompletely or inaccurately recorded, or some fields may simply lack the discriminating power to be meaningfully used as matching variables. To rapidly identify data fields that optimally support de-duplication, we've developed data quality and information content metrics that characterize the usability of fields specifically for de-duplication. This information can help guide the de-duplication user when selecting specific fields for de-duplication strategies.
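The text does not say which metrics the module uses. As an illustration only, the sketch below computes two common candidates: completeness (the share of records with a value in the field) as a data quality measure, and Shannon entropy in bits as an information content measure. The class FieldQualityMetrics and its methods are hypothetical and should not be read as the module's actual metrics.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not an existing PatientMatching class.
final class FieldQualityMetrics {

    /** Assumption: null and blank strings both count as missing values. */
    static boolean isMissing(String value) {
        return value == null || value.trim().isEmpty();
    }

    /** Fraction of records (0.0 to 1.0) that have a value in this field. */
    static double completeness(List<String> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        long present = 0;
        for (String v : values) {
            if (!isMissing(v)) {
                present++;
            }
        }
        return (double) present / values.size();
    }

    /** Shannon entropy in bits over the non-missing values of this field. */
    static double shannonEntropy(List<String> values) {
        Map<String, Integer> counts = new HashMap<>();
        int n = 0;
        for (String v : values) {
            if (isMissing(v)) {
                continue;
            }
            counts.merge(v, 1, Integer::sum);
            n++;
        }
        double entropy = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / n;
            entropy -= p * (Math.log(p) / Math.log(2)); // log base 2
        }
        return entropy;
    }
}
```

A field that is mostly empty scores low on completeness, and a field where nearly every record holds the same value has entropy close to zero; either property makes it a weak matching variable, which is the situation the paragraph above describes.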
Task 5: Implement additional duplication features in the OpenMRS=
The OpenMRS PatientMatching module also implements a standalone record matching and de-duplication application called "RecMatch". Written as a Java Swing application, "RecMatch" offers expanded de-duplication features and functionality beyond what currently exists in the de-duplication module. We aim to incorporate a subset of these functions in the de-duplication module's web interface.
We understand that the student may have a hard time getting up to speed on some of these tasks. Therefore, they will start with well-scoped, lower-complexity tasks. The student will move on to more complex tasks after achieving an acceptable level of proficiency, and will ideally complete additional tasks as time permits.
Task 6: Migrate previous reports from the flat file to the datab=
Once Task 3 is completed, all reports will be persisted in the system database. However, implementations already using the de-duplication module may have older reports currently persisted as flat files. To avoid keeping reports in two places, the PatientMatching module should provide a run-once operation that imports the flat files and persists the older reports into the database as well. This operation should be incorporated into the latest PatientMatching release that contains the database changes. When the user installs the PatientMatching omod, it will search for previous "flat-file" reports and move them into the database, rendering the flat files obsolete. This ensures that older report data can also be managed efficiently via the database.
1. Applicant should be familiar with Apache Maven, the Spring framework, Hibernate, and JSP.
2. Applicant should be very proficient in Java.
3. Applicant is expected to spend some time understanding how the PatientMatching module works.
4. The initial ticket for task number 3 is PTM-47.
5. Feel free to contact us at surangakas at gmail dot com or sgrannis at regenstrief dot org for further clarifications.