Merge Concepts Module
Duplicate concepts can enter a concept dictionary when new ways of modeling data are introduced, when multiple concept dictionaries are maintained, or when multiple users add concepts. The goal of this project is to make it possible to update all transactional and master data that references a concept identified as a duplicate so that it references the preferred concept instead.
How does the module handle observations and other references in a way that preserves data integrity?
When a concept is retired in favor of the concept to keep, its obs must be updated so that data is not invalidated. Data is preserved by retiring each affected obs and recreating it. The now retired obs contains a message explaining the merge. The module also checks the concepts' compatibility (see Concept Datatype Checking below) to prevent data loss caused by merging incompatible concepts.
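The retire-and-recreate approach described above can be sketched in plain Java. This is only an illustration under stated assumptions: the `Obs` class, its fields, and the `moveObs` helper are hypothetical stand-ins and are not the OpenMRS API or the module's actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class ObsMergeSketch {

    /** Hypothetical, simplified stand-in for an observation record. */
    static final class Obs {
        final int conceptId;
        final String value;
        boolean retired;
        String retireReason;

        Obs(int conceptId, String value) {
            this.conceptId = conceptId;
            this.value = value;
        }
    }

    /**
     * Retires every active obs that references retiredConceptId and appends
     * a copy that references keptConceptId. The original row is never edited
     * apart from being marked retired, so its history stays intact.
     * Returns the number of obs that were recreated.
     */
    static int moveObs(List<Obs> allObs, int retiredConceptId, int keptConceptId) {
        List<Obs> recreated = new ArrayList<>();
        for (Obs old : allObs) {
            if (old.retired || old.conceptId != retiredConceptId) {
                continue; // already retired, or not affected by this merge
            }
            old.retired = true;
            old.retireReason = "Concept " + retiredConceptId
                    + " merged into concept " + keptConceptId;
            recreated.add(new Obs(keptConceptId, old.value));
        }
        allObs.addAll(recreated);
        return recreated.size();
    }

    public static void main(String[] args) {
        List<Obs> obs = new ArrayList<>();
        obs.add(new Obs(12, "yes"));
        obs.add(new Obs(11, "no"));
        int moved = moveObs(obs, 12, 11);
        System.out.println(moved + " obs recreated, " + obs.size() + " rows total");
    }
}
```

Keeping the retired row alongside its replacement is what lets the retired obs carry the message explaining the merge.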
Can a merge be undone?
No. The user wanting to merge concepts should be confident the concepts are duplicates and familiar with the concept dictionary in general. Before the merge is executed, the user is provided with a preview of the data that will be affected by the merge and the options to return to the choose concepts page or continue.
What tables are impacted by the merge concepts module?
Can a retired concept be merged?
Not currently. As long as the retired concept is not the "concept to keep," this could be considered for future iterations of the module.
How is the merge concepts module going to find other module tables referencing a concept?
The first version of the Merge Concepts Module (MCM) will not find module tables. For now, it will most likely publish an event to let other modules know about the update, and it will be up to those modules to handle it. In a future iteration, the module could define an interface for other modules to implement, or it could find such references via Hibernate or a text matching search and notify the user of the findings. In either case, the module itself would not update the references.
What is special about concept answers?
If two concepts are used as answers to a question and are later determined to be duplicates, then after the merge that question will have two of the same answer. For example, "What is patient's favorite color?" has answers Red (concept id = 10), Blue (concept id = 11), and Navy (concept id = 12). Before the merge, there is a formfield with a unique formfield id for each of the three answers, and there are three entries in the conceptAnswer table in the database. When concepts Blue and Navy are merged, the references to concept id 12 will be updated to 11, but there are still three formfields for the question "What is patient's favorite color?" and there are still three entries in the conceptAnswer table. This redundancy could also occur with drugs, programs, concept sets, person attribute types, and possibly others. The module will automatically delete duplicate concept answers because it is easy to determine whether two entries have matching question and answer concept ids. All other situations of this kind will be highlighted in the log so the user can handle them.
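The favorite-color example can be worked through in a short Java sketch. The `ConceptAnswer` class and `mergeAnswers` method are hypothetical and only model rows of the conceptAnswer table; the question concept id of 1 is an assumption, since the text does not give one.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ConceptAnswerDedupSketch {

    /** Hypothetical stand-in for one row of the conceptAnswer table. */
    static final class ConceptAnswer {
        final int questionConceptId;
        final int answerConceptId;

        ConceptAnswer(int questionConceptId, int answerConceptId) {
            this.questionConceptId = questionConceptId;
            this.answerConceptId = answerConceptId;
        }
    }

    /**
     * Repoints answers from retiredId to keptId, then keeps only the first
     * row for each (question, answer) pair.
     */
    static List<ConceptAnswer> mergeAnswers(List<ConceptAnswer> rows, int retiredId, int keptId) {
        Set<String> seen = new HashSet<>();
        List<ConceptAnswer> result = new ArrayList<>();
        for (ConceptAnswer row : rows) {
            int answer = (row.answerConceptId == retiredId) ? keptId : row.answerConceptId;
            String key = row.questionConceptId + ":" + answer;
            if (seen.add(key)) { // add() returns false for a duplicate pair
                result.add(new ConceptAnswer(row.questionConceptId, answer));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<ConceptAnswer> rows = new ArrayList<>();
        rows.add(new ConceptAnswer(1, 10)); // Red
        rows.add(new ConceptAnswer(1, 11)); // Blue
        rows.add(new ConceptAnswer(1, 12)); // Navy, to be merged into Blue
        System.out.println(mergeAnswers(rows, 12, 11).size()); // prints 2
    }
}
```

The same pair-matching test does not carry over to formfields, programs, or drugs, because those rows hold additional data that may legitimately differ, which is why the module only logs them.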
Google Summer of Code 2012 presentation materials
Here is the video that was made at the end of GSoC 2012 to demonstrate the module:
- Update API accessible concept references**
- Create a log of what has been updated and situations that may require further action by the user*
- forms, fields, formfields
- program, program workflow, program workflow state
- person attribute types
- concept answers
- concept sets
Italics indicate some or all of the work has been done and now needs to be tested.
*Situations requiring further action by the user:
- The first version of this module will update everything that can be accessed and identified as a concept id through the API. This means there are built-in methods for setting the concept on an object, or there is a clear database schema for determining whether a column contains concept ids. The log will list all other possible references to concept ids that the module finds via a text matching search, excluding references held by other modules.
- Updating concept ids creates other duplication issues. For example, if two concepts that were both used for programs are merged, it is likely, but not certain, that the programs are also duplicates, which leaves two unique program ids referencing the same concept id. Another example is drug formularies, which are stored in the drug table. The user should be notified of updated tables where this might occur.
- Some concept ids may be referenced within free text in the database, such as global properties, html forms, and serialized objects. Just because a concept id appears in a block of text does not mean it should be updated - it could be a patient id, drug id, or any other numeric value. The module should collect instances of the concept id in these tables and notify the user where to find them. The user will decide whether or not to update them manually.
- Interactive updating for "hidden" concept references
- Auto-convert similar datatypes (e.g. Date and Datetime)
- Audit trail in a printable or savable format, in addition to the log. Link on Admin page.
- Combine choose concepts and preview pages into one interactive page
- Suggest possible replacements for a concept
- Merge multiple concepts
- Search concepts by mappings
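The text matching search mentioned under "Situations requiring further action by the user" could look roughly like the Java sketch below. It only collects candidate locations and never rewrites anything, in line with the rule that the user decides what to update. The class name, the `findCandidates` method, and the location labels are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class FreeTextReferenceSketch {

    /**
     * Returns the locations whose text contains conceptId as a standalone
     * number. A hit is only a candidate: the number could equally be a
     * patient id, a drug id, or any other numeric value.
     */
    static List<String> findCandidates(Map<String, String> textByLocation, int conceptId) {
        // The lookarounds stop 12 from matching inside 123 or 512.
        Pattern standalone = Pattern.compile("(?<!\\d)" + conceptId + "(?!\\d)");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : textByLocation.entrySet()) {
            String text = entry.getValue();
            if (text != null && standalone.matcher(text).find()) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Location labels are made up for illustration.
        Map<String, String> text = new LinkedHashMap<>();
        text.put("global property A", "default concept: 12");
        text.put("html form B", "patient 123, value 512");
        System.out.println(findCandidates(text, 12)); // prints [global property A]
    }
}
```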
Concept Datatype Checking
The module may report the following error messages when the concepts chosen for merging are incompatible:
- Concepts to merge must have the same datatype
- Concept chosen to be retired has answers that the concept to keep does not have
- Absolute high for concept to be retired is greater than absolute high for concept to keep
- Absolute low for concept to be retired is less than absolute low for concept to keep
- Chosen numeric concepts have different units
- Chosen numeric concepts do not agree on precise (y/n)
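The checks behind these messages can be sketched as a single validation method. This is a simplified Java model, not the module's code: `ConceptInfo` and its fields are hypothetical, the datatype name "Numeric" is assumed, and treating a missing absolute limit as "no limit" is an assumption the source does not state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class ConceptCompatibilitySketch {

    /** Hypothetical summary of the concept properties the checks look at. */
    static final class ConceptInfo {
        final String datatype;
        final Set<Integer> answerIds;
        final Double absoluteHigh; // null is assumed to mean "no limit"
        final Double absoluteLow;  // null is assumed to mean "no limit"
        final String units;
        final boolean precise;

        ConceptInfo(String datatype, Set<Integer> answerIds, Double absoluteHigh,
                    Double absoluteLow, String units, boolean precise) {
            this.datatype = datatype;
            this.answerIds = answerIds;
            this.absoluteHigh = absoluteHigh;
            this.absoluteLow = absoluteLow;
            this.units = units;
            this.precise = precise;
        }
    }

    /** Returns the error messages for a proposed merge; an empty list means compatible. */
    static List<String> check(ConceptInfo toRetire, ConceptInfo toKeep) {
        List<String> errors = new ArrayList<>();
        if (!toRetire.datatype.equals(toKeep.datatype)) {
            errors.add("Concepts to merge must have the same datatype");
            return errors; // the remaining checks assume matching datatypes
        }
        if (!toKeep.answerIds.containsAll(toRetire.answerIds)) {
            errors.add("Concept chosen to be retired has answers that the concept to keep does not have");
        }
        if ("Numeric".equals(toKeep.datatype)) {
            if (outside(toRetire.absoluteHigh, toKeep.absoluteHigh, true)) {
                errors.add("Absolute high for concept to be retired is greater than absolute high for concept to keep");
            }
            if (outside(toRetire.absoluteLow, toKeep.absoluteLow, false)) {
                errors.add("Absolute low for concept to be retired is less than absolute low for concept to keep");
            }
            if (!Objects.equals(toRetire.units, toKeep.units)) {
                errors.add("Chosen numeric concepts have different units");
            }
            if (toRetire.precise != toKeep.precise) {
                errors.add("Chosen numeric concepts do not agree on precise (y/n)");
            }
        }
        return errors;
    }

    /** True if the retired concept's limit falls outside the kept concept's limit. */
    private static boolean outside(Double retire, Double keep, boolean upperBound) {
        if (keep == null) return false;  // kept concept accepts any value on this side
        if (retire == null) return true; // assumption: unbounded is wider than any limit
        return upperBound ? retire > keep : retire < keep;
    }
}
```

Each range check asks whether every value valid for the concept to be retired is still valid for the concept to keep, which is what prevents existing obs from becoming out of range after the merge.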