
Merge Concepts Module

Background

Duplicate concepts can enter a concept dictionary when data is modeled in new ways, when multiple concept dictionaries are maintained, or when multiple users add concepts. The goal of this project is to make it possible to update all transactional and master data that references a concept identified as a duplicate so that it references the preferred concept instead.

Important Questions

Comments welcome!

How does the module handle observations and other references in a way that preserves data integrity?

When a concept is retired in favor of a new concept, its obs must be updated so that data is not invalidated. Data is preserved by retiring and recreating the obs. The now retired obs contains a message explaining the merge. The module also checks concepts' compatibility (see Concept Datatype Checking below) in order to prevent data loss through incompatible concepts being merged.
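The retire-and-recreate step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the module's actual code: the `Obs` record, its fields, and the `merge_obs` function are hypothetical names chosen for the example, and the real OpenMRS obs model has many more fields.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Obs:
    """Hypothetical, simplified observation record (not the OpenMRS Obs class)."""
    obs_id: int
    concept_id: int
    value: str
    retired: bool = False
    retire_reason: Optional[str] = None


def merge_obs(obs_list: List[Obs], retired_id: int, kept_id: int) -> List[Obs]:
    """Retire every active obs on retired_id and recreate it on kept_id."""
    result = []
    next_id = max((o.obs_id for o in obs_list), default=0) + 1
    for obs in obs_list:
        if obs.concept_id == retired_id and not obs.retired:
            # Keep the original row, marked as retired with a message explaining the merge.
            reason = f"Concept {retired_id} merged into concept {kept_id}"
            result.append(replace(obs, retired=True, retire_reason=reason))
            # Recreate the obs under the preferred concept with a fresh id.
            result.append(Obs(next_id, kept_id, obs.value))
            next_id += 1
        else:
            result.append(obs)
    return result


merged = merge_obs([Obs(1, 12, "yes"), Obs(2, 11, "no")], retired_id=12, kept_id=11)
```

Because the original row is kept and only flagged, the pre-merge value and concept id remain available for auditing.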

Can a merge be undone?

No. The user wanting to merge concepts should be confident the concepts are duplicates and familiar with the concept dictionary in general. Before the merge is executed, the user is provided with a preview of the data that will be affected by the merge and the options to return to the choose concepts page or continue.

What tables are impacted by the merge concepts module?

See: Tables Impacted by Merge Concept Module

Can a retired concept be merged?

Not currently. As long as the retired concept is not the "concept to keep," this could be considered for future iterations of the module.

How is the merge concepts module going to find other module tables referencing a concept?

The first version of the Merge Concepts Module (MCM) will not find module tables. For now, it will most likely publish an event to notify other modules of the update, and it will be up to those modules to handle it. In a future iteration, it could be viable to create an interface for other modules to implement, or to find such references via Hibernate or a text-matching search and notify the user of the findings, but the module itself would not update the references.

What is special about concept answers?

If two concepts are used as answers to a question and are later determined to be duplicates, then after the merge that question will have two of the same answer. For example, "What is patient's favorite color?" has answers Red (concept id = 10), Blue (concept id = 11), and Navy (concept id = 12). Before the merge, there is a formfield with a unique formfield id for each of the three answers, and there are three entries in the conceptAnswer table in the database. When concepts Blue and Navy are merged, the references to concept id 12 are updated to 11, but there are still three formfields for the question "What is patient's favorite color?" and still three entries in the conceptAnswer table. This redundancy could also occur with drugs, programs, concept sets, person attribute types, and possibly others. The module will automatically delete duplicate concept answers because it is easy to determine whether two entries have matching question and answer concept ids. All other possible situations like this will be highlighted in the log so the user can handle them.
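The concept answer clean-up can be illustrated with a short sketch. It assumes each conceptAnswer entry can be reduced to a (question concept id, answer concept id) pair; the `merge_answers` function and the question concept id 9 are hypothetical and appear only for this example.

```python
from typing import List, Tuple

# Each row is (question_concept_id, answer_concept_id): a hypothetical stand-in
# for entries in the conceptAnswer table.
ConceptAnswer = Tuple[int, int]


def merge_answers(rows: List[ConceptAnswer], retired_id: int, kept_id: int) -> List[ConceptAnswer]:
    """Repoint answers from retired_id to kept_id, then drop exact duplicates."""
    seen = set()
    result = []
    for question_id, answer_id in rows:
        if answer_id == retired_id:
            answer_id = kept_id
        key = (question_id, answer_id)
        if key not in seen:  # same question and same answer: keep only the first
            seen.add(key)
            result.append(key)
    return result


# Favorite color question (hypothetical concept id 9) with Red=10, Blue=11, Navy=12.
answers = merge_answers([(9, 10), (9, 11), (9, 12)], retired_id=12, kept_id=11)
```

After merging Navy into Blue, only the Red and Blue answers remain for the question. Formfields and the other tables mentioned above are not covered by this check, which is why they are reported in the log instead.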

Google Summer of Code 2012 presentation materials

Presentation2.pptx

Here is the video that was made at the end of GSoC 2012 to demonstrate the module: 

 

Road Map

  1. Update API accessible concept references**
  2. Create a log of what has been updated and situations that may require further action by the user*

**Concept references:

  • obs 
  • drugs
  • orders
  • forms, fields, formfields
  • program, program workflow, program workflow state
  • person attribute types
  • concept answers
  • concept sets

Italics indicate some or all of the work has been done and now needs to be tested.

*Situations requiring further action by the user:

  • The first version of this module will update everything that can be accessed and identified as a concept id through the API. This means there are built-in methods of setting the concept for an object or there is a clear database schema for determining whether or not a column contains concept ids. The log will contain a list of all other possible references to concept ids that the module finds via a text-matching search, excluding references held by other modules.
  • Updating concept ids creates other duplication issues. For example, if two concepts that were both used for programs are merged, it is likely, but not necessarily true, that the programs were also duplicates, resulting in two unique program ids referencing the same concept id. Another example is drug formularies, which are stored in the drug table. The user should be notified of updated tables where this might occur.
  • Some concept ids may be referenced within free text in the database, such as global properties, html forms, and serialized objects. Just because a concept id appears in a block of text does not mean it should be updated - it could be a patient id, drug id, or any other numeric value. The module should collect instances of the concept id in these tables and notify the user where to find them. The user will decide whether or not to update them manually.
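A text-matching search of the kind described in the list above might look like the following sketch. The function name, the source keys, and the sample texts are hypothetical; the point is only that the id should be matched as a standalone number, and that hits are reported to the user rather than rewritten.

```python
import re
from typing import Dict, List


def find_possible_references(texts: Dict[str, str], concept_id: int) -> List[str]:
    """Return the keys of text blocks containing concept_id as a standalone number."""
    # (?<!\d) and (?!\d) stop 12 from matching inside 112 or 120.
    pattern = re.compile(rf"(?<!\d){concept_id}(?!\d)")
    return [key for key, text in texts.items() if pattern.search(text)]


# Hypothetical free-text sources, keyed by where the user could find them.
sources = {
    "global_property:example.concepts": "10,11,12",
    "htmlform:3": '<obs conceptId="120"/>',
    "htmlform:7": '<obs conceptId="12"/>',
}
hits = find_possible_references(sources, 12)
```

A match found this way is only a candidate: the number could still be a patient id, a drug id, or another numeric value, so the decision to update it stays with the user.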

Requested features:

  • Interactive updating for "hidden" concept references
  • Auto-convert similar datatypes (e.g. Date and Datetime)
  • Audit trail in a printable or savable format, in addition to log. Link on Admin page.
  • Combine choose concepts and preview pages into one interactive page
  • Suggest possible replacements for a concept
  • Merge multiple concepts
  • Search concepts by mappings

Technical Design

Concept Datatype Checking

The following error messages may be given for incompatible concepts to merge:

  • Concepts to merge must have the same datatype
  • Concept chosen to be retired has answers that the concept to keep does not have
  • Absolute high for concept to be retired is greater than absolute high for concept to keep
  • Absolute low for concept to be retired is less than absolute low for concept to keep
  • Chosen numeric concepts have different units
  • Chosen numeric concepts do not agree on precise (y/n)
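The checks behind these messages could be sketched as below. The `Concept` class and its field names are simplified, hypothetical stand-ins rather than the OpenMRS classes, and the handling of missing limits is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Concept:
    """Hypothetical, simplified concept (not the OpenMRS Concept class)."""
    datatype: str
    answers: Set[int] = field(default_factory=set)
    hi_absolute: Optional[float] = None
    low_absolute: Optional[float] = None
    units: Optional[str] = None
    precise: Optional[bool] = None


def check_compatibility(retire: Concept, keep: Concept) -> List[str]:
    """Return the error messages that apply; an empty list means no conflict was found."""
    if retire.datatype != keep.datatype:
        # The remaining checks are only meaningful for matching datatypes.
        return ["Concepts to merge must have the same datatype"]
    errors = []
    if not retire.answers <= keep.answers:
        errors.append("Concept chosen to be retired has answers that the concept to keep does not have")
    if retire.datatype == "Numeric":
        # Assumption: a missing limit (None) is skipped rather than treated as unbounded.
        if None not in (retire.hi_absolute, keep.hi_absolute) and retire.hi_absolute > keep.hi_absolute:
            errors.append("Absolute high for concept to be retired is greater than absolute high for concept to keep")
        if None not in (retire.low_absolute, keep.low_absolute) and retire.low_absolute < keep.low_absolute:
            errors.append("Absolute low for concept to be retired is less than absolute low for concept to keep")
        if retire.units != keep.units:
            errors.append("Chosen numeric concepts have different units")
        if retire.precise != keep.precise:
            errors.append("Chosen numeric concepts do not agree on precise (y/n)")
    return errors


to_retire = Concept("Numeric", hi_absolute=300.0, low_absolute=0.0, units="lb", precise=True)
to_keep = Concept("Numeric", hi_absolute=250.0, low_absolute=0.0, units="kg", precise=True)
errors = check_compatibility(to_retire, to_keep)
```

In this example the merge would be rejected twice over: the retired concept allows higher values than the concept to keep, and the two concepts use different units.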

Release Notes

New Features

  • coming soon

Known Issues

 
Resources

https://tickets.openmrs.org/browse/MCM-3

https://tickets.openmrs.org/browse/TRUNK-293 (CLOSED)

https://tickets.openmrs.org/browse/MCM-1

https://tickets.openmrs.org/browse/MCM-2

https://tickets.openmrs.org/browse/ITSM-3469

https://github.com/openmrs/openmrs-module-mergeconcepts


2 Comments

  1. Glen/Janet/Jordan- I think this is a great module for potentially helping to update data collection forms and for specifically identifying where a concept has been used before. This has always been a problem when modifying a concept. To know which forms use the concept would be a huge advantage, but I can imagine this would be difficult. For example, HTML forms and Xforms which only reference a concept in the UI might not be easy to find. Looking at the OBS table for form IDs associated with the concept might help at least find those which have been used. Identifying how often a concept has been used (by the Obs table) is also important.

    Some comments about the actions... first, it is a huge no-no to replace concept IDs for data which were collected in the Obs table. The Obs table with the datetime stamp is a medicolegal record of the encounter, and editing it after the fact is potentially a problem. Typically, when a code would be retired in the past, you would use concept-concept links to capture that the code had changed and that the new code was such and such. Then when you were doing analysis, you could link to this table to dynamically update the concept code, but you would not change what actually was stored in Obs. The forms and reports, however, could certainly be changed (although I would auto-increment the form version/ID, etc. so that it was clear that this is a new version of the form with a new concept ID included).

    This might be beyond a SOC project, but just wanted to let you know from the informatics point of view my concerns. 

    From a UI point of view... searching by name is probably not the best way to find duplicate concepts. I would encourage you to look at how maternalconceptlab.com does searching. The best way to determine dupes, might be to search for all concepts with the identical SAME-AS map to a reference code (for example, SNOMED). Another example, MVP may have identified several duplicates in the PIH or AMPATH dictionaries and mapped the new MVP/CIEL concept to several PIH or AMPATH concepts (so having more than 1 SAME-AS map to the same reference SOURCE would be a key). Just an idea...

    Andy

  2. Glen/Janet/Jordan, this is a great module and I'm sure it will help a lot of people.

    Following your description of it, it does not seem to consider patient data already entered in the system as encounters or observations...

    I guess this is far from your first design approach, but I have to ask if you are thinking of including such functionality, because we have to do a "concepts migration" while trying not to lose patient data... which seems quite tricky for now (just starting) ...

    thanks!