OpenMRS currently provides a set of demo data that was created by de-identifying a subset of patient data from existing implementations and duplicating it several times over. This data set was developed and curated manually and is difficult to re-create. It would be very helpful to be able to create more comprehensive and realistic data sets for developers and researchers to use.
Making patient data truly de-identified requires following some fairly stringent rules. For example, any date associated with a patient (birthdate, visit/encounter dates, observation dates, etc.) may retain only the year, not the month or day. Timestamps on patient data would therefore need to be randomized enough to satisfy HIPAA rules without losing the sequence of results, so that trends in the results remain relatively realistic. Simply shifting all timestamps by the same amount would not meet HIPAA requirements, since the intervals between tests could be used to re-identify the patient. In short, creating truly de-identified data means creating a dataset that even a team of expert statisticians could not use to establish the identity of any of the patients.
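As a rough illustration of the date problem described above, the following Python sketch shifts one patient's event dates by a random amount and then rescales each interval independently, so the order of events survives while the original intervals do not. The function name `randomize_dates` and the parameters `max_shift_days` and `jitter` are hypothetical, chosen only for this example; an actual OpenMRS module would be written in Java against the OpenMRS API. Whether any particular amount of jitter satisfies HIPAA is not established here and would need expert review.

```python
import random
from datetime import date, timedelta


def randomize_dates(dates, rng, max_shift_days=365, jitter=0.5):
    """Return new dates in the same chronological order with perturbed intervals.

    dates: list of datetime.date for a single patient (any order).
    rng: a random.Random instance, so results can be reproduced in tests.
    max_shift_days, jitter: illustrative tuning knobs, not HIPAA-derived values.
    """
    if not dates:
        return []

    # Indices of the events in chronological order (stable for equal dates).
    order = sorted(range(len(dates)), key=lambda i: dates[i])

    # Shift the earliest event by a random per-patient offset.
    offset = rng.randint(-max_shift_days, max_shift_days)
    current = dates[order[0]] + timedelta(days=offset)

    result = [None] * len(dates)
    result[order[0]] = current

    for prev_i, i in zip(order, order[1:]):
        gap = (dates[i] - dates[prev_i]).days
        if gap > 0:
            # Scale each gap by its own random factor; keep at least one day
            # so distinct events never collapse onto the same date.
            factor = rng.uniform(1 - jitter, 1 + jitter)
            new_gap = max(1, round(gap * factor))
        else:
            # Events recorded on the same day stay on the same day.
            new_gap = 0
        current = current + timedelta(days=new_gap)
        result[i] = current

    return result


if __name__ == "__main__":
    rng = random.Random(0)
    visits = [date(2015, 3, 1), date(2015, 3, 15), date(2016, 1, 10)]
    for original, shifted in zip(visits, randomize_dates(visits, rng)):
        print(original, "->", shifted)
```

Because every gap gets its own random factor, two patients with identical visit patterns would no longer share identical intervals, which is the property a uniform shift lacks.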
This project would develop an OpenMRS module capable of transforming patient data so that it adheres to HIPAA privacy guidelines and exporting the result.
- Create an OpenMRS module
- Successfully export a patient's de-identified data (replacing demographics with suitable substitutes and randomizing dates of visits, encounters, observations, etc.)
- Create a process that can export many or all patients' data from a system in de-identified format
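The objectives above could be prototyped along the following lines. This is only a sketch in Python under stated assumptions: the record layout (`given_name`, `family_name`, `birthdate`, `observations`, and so on), the placeholder name lists, and the functions `deidentify_patient` and `export_all` are all hypothetical and do not reflect the OpenMRS data model or API. It replaces demographics with substitutes, keeps only the birth year, records observations by sequence position rather than date, and writes one JSON document per patient.

```python
import io
import json
import random

# Hypothetical substitute names; a real export would need a much larger pool.
PLACEHOLDER_GIVEN = ["Alex", "Sam", "Jordan", "Taylor"]
PLACEHOLDER_FAMILY = ["Doe", "Roe", "Smith", "Jones"]


def deidentify_patient(record, rng, new_id):
    """Return a copy of one (hypothetical) patient record without identifiers."""
    # ISO-8601 date strings sort chronologically, so sorting keeps the sequence.
    ordered = sorted(record["observations"], key=lambda o: o["date"])
    return {
        "id": new_id,  # replaces the original identifier
        "given_name": rng.choice(PLACEHOLDER_GIVEN),
        "family_name": rng.choice(PLACEHOLDER_FAMILY),
        "gender": record["gender"],
        "birth_year": int(record["birthdate"][:4]),  # year only
        # Dates are dropped in this sketch; only the order of results is kept.
        "observations": [
            {"sequence": n, "concept": o["concept"], "value": o["value"]}
            for n, o in enumerate(ordered)
        ],
    }


def export_all(records, out, seed=None):
    """Write every record to `out` as JSON lines; return the number written."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)  # so new ids do not mirror the original ordering
    for n, record in enumerate(shuffled, start=1):
        clean = deidentify_patient(record, rng, new_id=f"DEMO-{n:05d}")
        out.write(json.dumps(clean) + "\n")
    return len(shuffled)


if __name__ == "__main__":
    sample = [{
        "identifier": "100-8", "given_name": "Grace", "family_name": "Okello",
        "gender": "F", "birthdate": "1984-07-22",
        "observations": [
            {"date": "2015-03-15", "concept": "WEIGHT (KG)", "value": 62},
            {"date": "2015-03-01", "concept": "WEIGHT (KG)", "value": 60},
        ],
    }]
    buffer = io.StringIO()
    export_all(sample, buffer, seed=1)
    print(buffer.getvalue())
```

Writing one patient per line keeps memory use flat when exporting a whole system, since records can be streamed rather than collected into a single document.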