Historically, the MVP-CIEL Concept Dictionary has been delivered to the community through Dropbox and SQL scripts that require implementations to overwrite their existing dictionary. The process of applying updates is manual and cumbersome, and it does not allow local concepts to coexist with CIEL concepts. Open Concept Lab (OCL) has been an invaluable resource for sharing CIEL and other community vocabularies as well as the mappings between them. As the OCL team works toward a "2.0" vision of Open Concept Lab that is much more GitHub-like in how dictionaries and collections of concepts are managed, KenyaEMR is looking for a better and more sustainable solution for using the CIEL dictionary and applying changes over time.
user-4e734 representing priorities & strategy for OCL
Steven Wanyee representing priorities & strategy for KenyaEMR
Objectives
Milestone 1 - Let KenyaEMR's Dictionary server subscribe to CIEL via OCL, while also creating local concepts
Requirements and Assumptions:
An OCL service will exist in the cloud, that supports fetching a (large) set of concepts through a web service
It has a built-in JSON format. We may request a specific format.
There will be one Kenya concept server on the internet.
Dealing with multiple clients is out of scope for this milestone
Pushing the merged Kenya dictionary to hundreds of Kenya servers is out of scope
Implementation can subscribe to one concept source (initially, Kenya will subscribe to all of CIEL)
Eventually we need to support subscribing to a subset exposed by OCL
Eventually we need to support subscriptions to multiple sources
Kenya should be able to download monthly updates from CIEL
Implementation can also create local concepts
We must assume that these may have conflicts (e.g. duplicate names)
Assumption: implementation should not edit concepts that have been downloaded from a subscription
If they do, it is fair for us to overwrite these changes next download
Implementation of OpenMRS must be able to access OCL over the network to update the dictionary. There will not be a way to get updates using e.g. a zipped file.
Current status:
The Open Concept Lab module has been released as 2.0-beta and is currently being tested.
Import the full CIEL dictionary into KenyaEMR distribution: import completed in 564 minutes 14 seconds
We chose to use the OCL API and the format provided by OCL instead of requiring OCL to return a format defined by OpenMRS. This implies that on the OpenMRS side we have to create concepts from the data provided by OCL, and we will be responsible for adjusting the conversion if any changes are made to the OCL API.
Concepts from different sources may not conform to all OpenMRS validation rules; for example, they can have duplicate names across sources. Our approach is to import new concepts and updates that are valid against the local database and list the rest for manual fixes. Fixes must be done on the OCL side for concepts coming from OCL and in the local database for conflicting local concepts. It also means that if a valid concept references an invalid concept as an answer, the valid concept will not be imported, as it cannot be fully created.
There is an open question of how to handle cases where the subscription contains concepts with the same name from different sources, e.g. CIEL:HIV and Kenya:HIV. It was suggested that when creating a subscription you could define that CIEL concepts take precedence over Kenya concepts. The problem is that Kenya:HIV can be referenced by other concepts from the Kenya source, and it would have to be replaced by CIEL:HIV. That means the meaning must be exactly the same. They must have (1) the SAME-AS mapping, and I think that (2) datatype and (3) class must match as well. This implies that if we discover concepts with the same name that do not comply with (1), (2) and (3), they will have to be fixed on the OCL side. I'm not sure if that is always possible.
Concepts will never be removed by subscribers as they can be referenced already. OCL can only retire them. There must be a retired flag in concept representation and any concept metadata representation (classes, sources, etc.) in OCL.
Database IDs assigned to imported items will be different for every subscriber, but will remain the same for every subsequent update.
Imported concepts will not be merged in any way with locally created concepts or concepts imported from other sources. If there is a local concept with the same UUID as in OCL, it will be overwritten by the version from OCL when imported. It also means that if there is a local concept with the same name as in OCL, you will not be able to import that concept from OCL unless you retire the local concept (to fix duplicate names).
Classes, Datatypes and Sources created locally will be overwritten by those in OCL if they match by name.
You can only subscribe to one URL.
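The import rule described in this list (import what validates against the local database, list the rest for manual fixes, and hold back any concept whose answer was held back) can be sketched roughly as follows. This is only an illustration in Python, not the module's code: the `answers` field, the `is_valid` callback and the function name are hypothetical, and propagating a rejection along chains of references is an assumption that goes slightly beyond what is stated above.

```python
def partition_importable(concepts, is_valid):
    """Split a batch into importable concept ids and rejected ones.

    concepts: dict mapping a concept id to a dict with an optional
              "answers" list of concept ids (hypothetical shape).
    is_valid: callable(concept_id) -> bool, standing in for local validation.
    Returns (importable_ids, rejected) where rejected maps id -> reason.
    """
    # Step 1: everything that fails local validation is rejected outright.
    rejected = {cid: "fails local validation"
                for cid in concepts if not is_valid(cid)}

    # Step 2: a concept that references a rejected answer cannot be fully
    # created, so it is rejected too. Repeat until nothing changes, so the
    # rejection also reaches concepts that reference those concepts.
    changed = True
    while changed:
        changed = False
        for cid, concept in concepts.items():
            if cid in rejected:
                continue
            for ref in concept.get("answers", []):
                if ref in rejected:
                    rejected[cid] = f"references invalid concept {ref}"
                    changed = True
                    break

    # References to ids outside the batch are assumed to exist locally.
    importable = [cid for cid in concepts if cid not in rejected]
    return importable, rejected
```

The `rejected` mapping is what would feed the list of items for manual fixes; the reasons shown here are placeholders for whatever messages the core validation produces.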
Implementation details
OCL API (suggestions)
Approach without the RSS feed:
The OCL API must expose a subscription URL that lists, in JSON, all concepts (including retired ones), sources and mappings to be included in the subscription. The assumption here is that we don't have to do extra REST queries to get sources or mappings, and that we can find everything that is needed in the returned JSON. The returned JSON must have itemsNumber and updatedBeforeDate fields. The updatedBeforeDate field needs to be set to the value passed in the updatedBeforeDate URL parameter or, if the parameter was not specified, to the server date at which the JSON was requested.
The URL needs to accept pagination parameters page=1&per_page=50 (items ordered by uuid for example) and updatedAfterDate=2014-08-13 18:34:23.2314 and updatedBeforeDate=2014-08-13 18:34:23.2314 to display only items updated in the specific time period. If page or per_page is not included use defaults: page=1&per_page=50.
As an alternative to b. (pagination), we could get rid of paging and get a zipped JSON with all items, which I think should be the preferred solution.
Approach with the RSS feed, which as Darius says is more in line with web conventions:
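To make the proposed parameters concrete, here is a minimal Python sketch of how a client might build the paginated subscription URL and how the server side might fill in the two required envelope fields. The parameter and field names (`page`, `per_page`, `updatedAfterDate`, `updatedBeforeDate`, `itemsNumber`) come from the text above; the base URL, the `items` key, the function names and the exact date format are assumptions made for illustration only.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed format, modelled on the example "2014-08-13 18:34:23.2314".
DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def build_subscription_url(base_url, page=1, per_page=50,
                           updated_after=None, updated_before=None):
    """Build the request URL; page and per_page fall back to the defaults."""
    params = {"page": page, "per_page": per_page}
    if updated_after is not None:
        params["updatedAfterDate"] = updated_after.strftime(DATE_FORMAT)
    if updated_before is not None:
        params["updatedBeforeDate"] = updated_before.strftime(DATE_FORMAT)
    return base_url + "?" + urlencode(params)


def build_envelope(items, items_number, requested_before, server_now):
    """Server-side rule: echo updatedBeforeDate, or use the server date."""
    effective = requested_before if requested_before is not None else server_now
    return {
        "itemsNumber": items_number,
        "updatedBeforeDate": effective.strftime(DATE_FORMAT),
        "items": items,  # hypothetical key for the page of results
    }
```

A first request for a new subscription would call `build_subscription_url` without either date; later requests would pass the stored date as `updated_after`.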
The RSS feed exposed under a different URL should contain links to updated resources with dates when they were updated so that the client can request the feed and fetch items one by one using resource URLs pointing to specific versions of resources.
For the initial import we still need REST calls described in 1. except for updatedAfterDate support.
OCL examples need to be updated to reflect the current state (include missing fields: retired, uuids, ...?)
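For the feed-based approach, the client side could look roughly like the Python sketch below: read the feed, keep the entries updated since the last synchronisation, and return the resource links to fetch one by one. It assumes the feed is in Atom format with one `link` and one `updated` element per entry, and that all timestamps use the same UTC notation so that they can be compared as strings; none of this is specified above, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the Atom Syndication Format.
ATOM = "{http://www.w3.org/2005/Atom}"


def updated_links(feed_xml, since):
    """Return (updated, href) pairs for entries updated after `since`.

    feed_xml: the feed document as a string.
    since:    timestamp string in the same notation as the feed's
              <updated> values (assumed, e.g. "2014-08-13T00:00:00Z").
    """
    root = ET.fromstring(feed_xml)
    result = []
    for entry in root.findall(ATOM + "entry"):
        updated = entry.findtext(ATOM + "updated")
        link = entry.find(ATOM + "link")
        if updated is None or link is None:
            continue  # skip entries that lack the data we need
        if updated > since:
            result.append((updated, link.get("href")))
    # Oldest first, so items are applied in the order they changed.
    return sorted(result)
```

Each returned link would then be requested separately, which is the cost of this approach compared with a single paginated call.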
OpenMRS module
The module will support 1.9.8+
The module needs to fetch all items from the subscription URL (omitting updatedAfterDate parameter) for a newly added subscription. The date when the items were fetched needs to be stored to query with the updatedAfterDate parameter next time.
The module will use a scheduled task to query for updates periodically, importing new items and recording updatedBeforeDate each time.
The module can store a subscription URL and updatedBeforeDate in global properties.
We will use Jackson library to parse json and RestTemplate from Spring as the REST client.
We may need to find a way to handle pagination when saving items. It may happen that an answer to a concept is on a different page than the question. The solution depends on how answers/sets are represented in the concept resource in OCL.
It may be inefficient to call OpenMRS services to save changes for ~70k CIEL concepts (it used to slow down MDS imports as well). We may need to investigate how to improve the performance of the save method if we want to use it. The most problematic parts used to be concept validation and the search for duplicate names using partial matches (which skipped the db index); this is hopefully fixed by recent changes (see the recently introduced isConceptNameDuplicate). We may need to backport that to 1.9.x.
Reporting validation errors in core needs to be improved. We will need messages to be clear to list items for manual fixes.
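One possible way to deal with answers that arrive on a later page than their question is to save concepts as pages come in and link answers only after the last page. The Python sketch below illustrates that idea under heavy assumptions: it presumes answers are exposed as a list of UUIDs in an `answers` field, which is exactly what is still unknown, and the callbacks and function name are hypothetical stand-ins for the real save logic.

```python
def import_pages(pages, save_concept, link_answer, already_stored=None):
    """Save concepts page by page, then link answers in a second pass.

    pages:          iterable of pages, each a list of concept dicts with a
                    "uuid" and an optional "answers" list of uuids (assumed).
    save_concept:   callable(concept) that stores a concept without answers.
    link_answer:    callable(question_uuid, answer_uuid).
    already_stored: uuids known to exist locally before this import.
    Returns the (question, answer) pairs that could not be linked.
    """
    known = set(already_stored or ())
    pending = []

    # First pass: store every concept and remember its answer references.
    for page in pages:
        for concept in page:
            save_concept(concept)
            known.add(concept["uuid"])
            for answer_uuid in concept.get("answers", []):
                pending.append((concept["uuid"], answer_uuid))

    # Second pass: every page has been seen, so page order no longer matters.
    unresolved = []
    for question_uuid, answer_uuid in pending:
        if answer_uuid in known:
            link_answer(question_uuid, answer_uuid)
        else:
            unresolved.append((question_uuid, answer_uuid))
    return unresolved
```

Anything left in the returned list would be a candidate for the manual-fix list rather than a reason to abort the import.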
User stories:
Adding subscription
Administrator enters OCL subscription URL on the Subscription Status page.
Module fetches the first page of results (it passes neither the updatedAfterDate nor the updatedBeforeDate parameter) to check if there are any items and notifies the administrator "Updates to dictionary available".
Administrator clicks the "Update dictionary" button, which gets disabled and its text changes to "Update in progress" (see the update process)
Administrator can see the progress: number of items updated / number of updates and a list of validation errors if any.
Checking subscription status
Administrator enters the Subscription Status page.
If an update is in progress, the administrator should see the status "Update in progress" and details.
Manual checking for updates
Administrator opens the Subscription Status page and sees the "Check for dictionary updates" button.
Administrator clicks the "Check for dictionary updates" button and if there are any updates there's a notification "Updates to dictionary available" and the button changes to "Update dictionary".
Automatic updates
Administrator opens the Subscription Status page and can enable the checkbox "Update dictionary automatically".
It shows "Update dictionary every x days at y", where x is the number of days to set and y is the time when the update should happen.
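As an illustration of the "every x days at y" setting, the Python sketch below computes when the next automatic update would be due. It is only one reading of the user story: the behaviour for a first run and for a missed run (fall back to the next occurrence of y) is an assumption, the function name is hypothetical, and the real module would presumably rely on the OpenMRS scheduler instead.

```python
from datetime import datetime, time, timedelta


def next_update(now, every_days, at, last_run=None):
    """Return the datetime of the next scheduled update.

    now:        current datetime.
    every_days: x, the number of days between updates.
    at:         y, the time of day (datetime.time) for the update.
    last_run:   datetime of the previous update, or None if there was none.
    """
    if last_run is None:
        # No previous update: aim for today's occurrence of y.
        candidate = datetime.combine(now.date(), at)
    else:
        # Otherwise x days after the day of the last update, at y.
        candidate = datetime.combine(last_run.date(), at) + timedelta(days=every_days)

    if candidate <= now:
        # The slot was missed: use the next occurrence of y (assumption).
        candidate = datetime.combine(now.date(), at)
        if candidate <= now:
            candidate += timedelta(days=1)
    return candidate
```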
Update process (in background for paginated json)
Module fetches the first page of results and stores the updatedBeforeDate value returned from the server.
It saves all concepts, sources and mappings from that URL and updates the progress on the Subscription Status page.
Module continues to fetch next pages passing the page and updatedBeforeDate parameters.
If there are validation errors, the module needs to add the issues to the list on the Subscription Status page.
If there are validation errors remaining after previous import and not fixed by the new import, the module needs to query for each invalid item individually to see if it can be imported now (i.e. the problem was fixed locally).
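The paginated update process above can be sketched as a simple loop. The Python below is a rough illustration, not the module's implementation: `fetch_page` and `save_item` are hypothetical callbacks, the `items` key is assumed, and a validation failure is modelled as a `ValueError`. The field names `itemsNumber` and `updatedBeforeDate` follow the API suggestions on this page.

```python
import math


def run_update(fetch_page, save_item, updated_after=None, per_page=50):
    """Fetch every page of one update and save its items.

    fetch_page(page, per_page, updated_after, updated_before) -> dict with
        "itemsNumber", "updatedBeforeDate" and "items" (assumed shape).
    save_item(item) raises ValueError when the item fails validation.
    Returns (updated_before, processed_count, errors).
    """
    # The first page is requested without updatedBeforeDate; the server
    # answers with the value that all later pages must reuse.
    response = fetch_page(1, per_page, updated_after, None)
    updated_before = response["updatedBeforeDate"]
    total_pages = math.ceil(response["itemsNumber"] / per_page)

    errors = []
    processed = 0
    page = 1
    while True:
        for item in response["items"]:
            try:
                save_item(item)
            except ValueError as exc:
                # Keep going; the item goes on the list for manual fixes.
                errors.append((item, str(exc)))
            processed += 1
        if page >= total_pages:
            break
        page += 1
        # Passing the same updatedBeforeDate keeps the result set stable
        # while paging through it.
        response = fetch_page(page, per_page, updated_after, updated_before)

    return updated_before, processed, errors
```

The returned `updated_before` is the value to store and pass as `updatedAfterDate` on the next run, and `processed` against `itemsNumber` gives the progress figure shown to the administrator.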
Fixing validation errors
After invalid items are fixed an administrator can click "Check for dictionary updates and fixes"
If any updates are available there's a notification "Updates to dictionary available" and the button changes to "Update dictionary"
If no updates are available there's a notification "Updates not available, last time checked: date" and the button "Update dictionary after local fixes"
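The button and notification texts from the user stories can be summarised as a small state-to-view mapping. The Python sketch below is one interpretation only: the flag names, the function name and the idea that `last_checked` is set once a check has just been performed are assumptions, while the quoted texts are taken from the stories above.

```python
def status_view(in_progress, updates_available, pending_errors, last_checked=None):
    """Return the notification and button to show on the status page.

    in_progress:       an update is currently running.
    updates_available: the last check found updates.
    pending_errors:    validation errors remain from a previous import.
    last_checked:      date text of the check just performed, or None.
    """
    if in_progress:
        return {"notification": None,
                "button": "Update in progress", "enabled": False}
    if updates_available:
        return {"notification": "Updates to dictionary available",
                "button": "Update dictionary", "enabled": True}
    if pending_errors and last_checked is not None:
        # Checked, nothing new on the server, but local fixes may now
        # allow previously invalid items to be imported.
        return {"notification": f"Updates not available, last time checked: {last_checked}",
                "button": "Update dictionary after local fixes", "enabled": True}
    label = ("Check for dictionary updates and fixes" if pending_errors
             else "Check for dictionary updates")
    return {"notification": None, "button": label, "enabled": True}
```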
Questions
Classes are not listed as resources in OCL API. They are simple strings in the concept representation. Is that ok from OpenMRS perspective?
I don't see answers nor sets in the concept representation in OCL examples. How are concepts of datatype answer or set represented?
There was a suggestion to use an RSS feed to query for updates. It adds yet another input format to parse. What are the benefits of having an RSS feed instead of a simple REST call?
We base the subscription on querying for concepts changed in a specified period of time. This has the disadvantage that a subscriber can receive a concept that is being rearranged and is not yet in its final state, e.g. a concept manager is halfway through adding answers to a concept. Is that ok? If not, we need to change the approach and expose only published changes. That can be achieved in a few ways, which can be discussed once we decide there is a need for it. Andy's answer: The current intent is to have CIEL continue to manage concepts, then publish to OCL and then move to Kenya EMR. Although there might be subsetting going on in OCL, there should not be a lot of temporary concept states that would confuse the ATOM feed.
On whether the Metadata Sharing Module (MDS) could do this instead: in short, the complexity behind MDS, which supports all metadata in the system and resolves conflicts at import time, results in too large a memory footprint and too much processing time to share the whole dictionary. It was simply not designed to do that, and I can only think of fixing it by rewriting MDS internals, which would be a much bigger time investment than supporting just the concept dictionary use case.
In addition, OCL doesn't run OpenMRS as a back end, so it can't export MDS packages.
12 Comments
Burke Mamlin
From our conversation, Darius Jazayeri and I estimate (from the "double it and double it again" method) that this project will require two dedicated external developers + one dedicated OpenMRS developer (presumably, Rafał) for two two-week sprints to get this done. This assumes the OCL side (web service endpoints) is already done.
Burke Mamlin
Is this something that the Metadata Sharing Module could do? If not, why not? Would our effort be better spent improving the Metadata Sharing Module instead of creating another module to share metadata?
Burke Mamlin
(moving Rafał Korytkowski's reply into threaded position)
On 08 Aug, 2014, Rafał Korytkowski replied:
Burke Mamlin
FYI – found this related page.
Andrew Kanter
There are a lot of references above to concept names. I believe the requirement is that the same name cannot be preferred in the same locale. Just want to be clear that it should be possible to import duplicate concept names in different locales. (Or assume that you are including locale in the definition of duplicate name).
Burke Mamlin
(moving Rafal Korytkowski's reply into threaded position)
On 13 Aug, 2014, Rafal Korytkowski replied:
Andrew Kanter
The current intent is to have CIEL continue to manage concepts, then publish to OCL and then move to Kenya EMR. Although there might be subsetting going on in OCL, there should not be a lot of temporary concept states that would confuse the ATOM feed.
Darius Jazayeri
I suggested that OCL should publish an atom feed of modified concepts per collection, and that's what we should subscribe to. My reasoning is that this is the web-standard technology suggested by Rest In Practice, and it's also what ThoughtWorks does as its standard.
It's possible to achieve the same result using custom parameters to REST calls; I was thinking it may be easier and supportive of standard tooling for the OCL side to do a feed.
Rafal Korytkowski
Does it mean that when we get the atom feed with modified concepts, we need to make as many additional REST calls as there are modified concepts to get their full representation?
I take it the atom feed won't be used for the initial import, so we make far fewer REST calls to get 70K concepts.
I'm fine with that.
Burke Mamlin
We have kept this module on the OpenMRS Road Map since October 2014. I believe good progress has been made, but it has been stalled for most of 2015. Now that May is upon us, should we take this off our road map and revisit it if/when it becomes a priority? Or is there progress being made or milestones being achieved that I'm not aware of?
Cheers,
-Burke
Darius Jazayeri
I think this should (still) be quite a high priority for the community, and being able to subscribe to OCL collections will have transformative and catalyzing effects.
Wearing my PIH hat, we're planning on revamping all of our concept management around this. The question from our end is, is there anything we can do to speed this up? (I think the answer is "no, we're just waiting for a bit more of OCL to be finished".)
So, Burke Mamlin, the stalling is really due to factors outside of OpenMRS's control, not because it's not a priority from our end...
Andrew Kanter
A lot has been happening this year and I believe we are close to deploying. I would check with Rafal Korytkowski and user-4e734.