Data Quality
Facilitator: Evan Waters
Notetaker: Christian Neumann
What makes for good data quality? What are we trying to achieve and capture?
- What actually happened
- Accurate data
- Timely
- Complete
- Consistent
- Legible
- Used (usable) -> for decision making
- Reusable
- Relevant
- Necessary (only)
What are the challenges? What leads to poor data quality? What can get in the way?
- Too much data... (impacts the quality)
- The inverse of everything listed under good data quality
- Lack of understanding, training
- Process, capacity
- Lack of standards and definitions
- Illegible
- Machine errors
- Incomplete and lost data
- Transcription errors
- Redundancies
What are the problems and possible solutions?
- Sensitize on data quality
- Overcome transcription errors with Education
- Measure impact
- Overcome transit errors (time, distance) by reducing steps in the process, being more timely, and using a mobile platform
- Decide who is responsible for checking data: everybody, not just the data manager, can check it
- Define categories of data quality problems and have different people apply different interventions to each
- Paper-based forms can get lost; having two forms filled out by the responsible persons provides a fall-back in case the initial form is lost or entered differently
- Gap of time and distance
- Community data (more mobile) vs. clinical data (more static)
- Complexity in forms
- Depending on the environment, patients might reuse identifiers (to save costs, black market, privacy, ...)
- Problem of uniquely identifying persons, e.g. with a check digit in the identifier, barcodes
- Name spelling, Soundex, birthdate, biometrics (fingerprint)
- Additional identification by secret questions & answers
- Tradeoff between privacy and scaled-up national unique patient identification
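The check-digit idea above can be sketched as follows. This is only an illustration using the standard Luhn (mod 10) scheme on numeric identifiers; the notes do not say which scheme was meant, and the function names and the "base-digit" layout are assumptions made for the example.

```python
def luhn_check_digit(base: str) -> int:
    """Return the Luhn (mod 10) check digit for a numeric identifier base."""
    if not base.isdigit():
        raise ValueError("identifier base must contain digits only")
    total = 0
    # Walk from the rightmost digit and double every second digit,
    # starting with the rightmost one.
    for position, char in enumerate(reversed(base)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def add_check_digit(base: str) -> str:
    """Append the check digit after a hyphen (hypothetical layout)."""
    return f"{base}-{luhn_check_digit(base)}"


def is_valid(identifier: str) -> bool:
    """Check an identifier of the form '<digits>-<check digit>'."""
    base, sep, check = identifier.rpartition("-")
    if not sep or not base.isdigit() or len(check) != 1 or not check.isdigit():
        return False
    return luhn_check_digit(base) == int(check)
```

With this sketch, a single mistyped digit or most swaps of two adjacent digits make is_valid() return False, so the error can be caught at data entry rather than later.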
What is "necessary data"?
- 1. Key logistics, e.g. drug box (stocks): how much medication has been used, how much is left?
- 2. M&E data, data for funders
- Depends on the audience, e.g. government, facilities, ...; task: identify customers
- Minimum data set; task: who defines the minimum data set?
- Relationship information is crucial for prevention, e.g. children of an HIV-positive mother
- Coordinating at the national level, harmonizing data sets
How do we measure "data quality"?
- Quarterly assessments, but who is actually doing these formal measurements?
- Validation against forms, but this needs access to the paper-based forms
- Look at completeness of data in the forms
- Random samples of entered data, e.g. check 20% of the data 3x a week
- Keeping a log of errors for each data assistant
- Incentive for staff if error rate is low
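The measurement ideas above (random samples, form completeness, an error log per data assistant) could be sketched roughly as follows. All function names, the record layout and the field names are hypothetical; the 20% fraction is taken from the example in the notes.

```python
import random
from collections import defaultdict


def sample_records(records, fraction=0.2, seed=None):
    """Draw a random sample of records to re-check against the paper forms."""
    if not records:
        return []
    rng = random.Random(seed)
    size = max(1, round(len(records) * fraction))
    return rng.sample(records, size)


def completeness(record, required_fields):
    """Share of required fields that are filled in (0.0 to 1.0)."""
    filled = sum(1 for field in required_fields
                 if record.get(field) not in (None, ""))
    return filled / len(required_fields)


def error_rates(checks):
    """Aggregate an error log into an error rate per data assistant.

    checks: iterable of (assistant, fields_checked, fields_wrong) tuples.
    """
    totals = defaultdict(lambda: [0, 0])
    for assistant, fields_checked, fields_wrong in checks:
        totals[assistant][0] += fields_checked
        totals[assistant][1] += fields_wrong
    return {assistant: wrong / checked
            for assistant, (checked, wrong) in totals.items() if checked}
```

A per-assistant rate like this is what a low-error incentive would be based on; counting errors per field checked, rather than per form, is a design choice made here so that long and short forms are comparable.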
What is already available in OpenMRS?
- DoubleEntryModule for InfoPath
- Patient Flags Module
- Reporting tools, which can be used for data quality
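The general idea behind double entry (the same form is entered twice and the two versions are compared) can be sketched as below. This is a generic illustration, not a description of how the DoubleEntryModule works; the function name and the dictionary record layout are assumptions.

```python
def compare_entries(first, second):
    """Return the fields on which two entries of the same form disagree.

    first, second: dicts mapping field name to entered value.
    The result maps each disagreeing field to (first value, second value);
    a field missing from one entry is reported with None on that side.
    """
    discrepancies = {}
    for field in sorted(set(first) | set(second)):
        a, b = first.get(field), second.get(field)
        if a != b:
            discrepancies[field] = (a, b)
    return discrepancies
```

For example, entering a weight as 54 the first time and 45 the second time yields {"weight_kg": (54, 45)}, which someone can then resolve against the paper form.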
Things that we would like to see?
- Data Statistics module
- Data Integrity module
- "Pre-canned" rules for data quality
- Audit trail
- Double-entry for HTML & XForms
- Soundex module for fuzzy search in non-English languages
- Idgen
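For reference, the Soundex item above builds on phonetic coding of names. A minimal sketch of standard American Soundex follows; it is tuned to English spelling, so a module for other languages would presumably need different letter groupings, which are not attempted here.

```python
# Letter groups of standard American Soundex; vowels, Y, H and W get no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _letter in _letters:
        _CODES[_letter] = _digit


def soundex(name: str) -> str:
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    previous = _CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(c, "")
        if code and code != previous:
            result.append(code)
        previous = code  # a vowel resets this, so a repeated code counts again
    return "".join(result)[:4].ljust(4, "0")
```

Names that sound alike get the same code, e.g. soundex("Robert") and soundex("Rupert") both give "R163", so a search on the code finds spelling variants of a name.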
How do we continue?
Make some noise on
- OpenMRS Groups
- Tickets
- Wiki