web search flow diagram

Primary mentor

Jeremy Keiper

Backup mentor

Paul Biondich

Assigned to

Christopher Zakian

Background

Successful medical record systems consistently make it easy for health care workers to get the information they need for patient care in a rapid, lightweight way.  Most often, clinicians are blissfully unaware of the technology underneath the user interface, and this often hinders their ability to find the information they seek.  For example, while there are important reasons to store clinical observations as one question/answer pair per row, and demographics as attributes within specific demographic tables, clinicians often simply want to select new patients or look for existing diagnoses for a given patient without having to know which part of the user interface holds what they're looking for.  With the advent of the web and its searching paradigms, an opportunity exists to apply well-understood web search paradigms to medical record system searching.

Purpose

The ultimate purpose of this project is to create a new UI widget (and module) for OpenMRS that can return search results based on a text query, and to make this widget available throughout the web application.  The initial steps towards this vision are to create the low-level infrastructure that allows OpenMRS API services to register themselves as "searchable", and to develop the basic API to perform searches across those services. The search bar would initially be added to the web application header.
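
The registration idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: SearchableService, SearchRegistry, and InMemoryService are hypothetical names invented for this sketch, not part of the OpenMRS API, and the in-memory substring match stands in for whatever search each real service would perform.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical contract a service would implement to become searchable. */
interface SearchableService {
    /** Short label for the kind of result, e.g. "Patient". */
    String getSearchType();

    /** Returns display strings for records matching the free-text query. */
    List<String> search(String query);
}

/** Hypothetical registry that fans one text query out to every registered service. */
class SearchRegistry {
    // LinkedHashMap keeps registration order, so results are grouped predictably.
    private final Map<String, SearchableService> services = new LinkedHashMap<>();

    void register(SearchableService service) {
        services.put(service.getSearchType(), service);
    }

    void unregister(String searchType) {
        services.remove(searchType);
    }

    /** Runs the query against each service and groups non-empty hit lists by type. */
    Map<String, List<String>> searchAll(String query) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (SearchableService service : services.values()) {
            List<String> hits = service.search(query);
            if (!hits.isEmpty()) {
                results.put(service.getSearchType(), hits);
            }
        }
        return results;
    }
}

/** Toy in-memory service used only to exercise the registry (an assumption, not real data access). */
class InMemoryService implements SearchableService {
    private final String type;
    private final List<String> records;

    InMemoryService(String type, List<String> records) {
        this.type = type;
        this.records = new ArrayList<>(records);
    }

    @Override
    public String getSearchType() {
        return type;
    }

    @Override
    public List<String> search(String query) {
        // Case-insensitive substring match; a real service would query its own store.
        String needle = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String record : records) {
            if (record.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(record);
            }
        }
        return hits;
    }
}
```

A caller would register one InMemoryService per domain object type and call searchAll once per query; unregistering a type removes its results from later searches, which is roughly what a module being stopped might require.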

Domain Expert(s) / User(s)

Jeremy Keiper, Ben Wolfe, Darius Jazayeri, all implementers on the implementers mailing list :)

Required / Preferred Skills

  • Java required
  • Java Servlet / JSP development experience required
  • Experience with developing search technologies preferred

Design

The search box should be a simple input box. Apache Lucene integrated with Hibernate Search is used to perform full text indexing on all registered entities. 

The envisioned workflow:

  1. User types in "Horatio Hornblower"
  2. A query is created using Apache Lucene QueryParser to do a full text search on the database.
  3. Hibernate Search executes the query on the database.
  4. The results are returned and then interpreted to generate a results page. 
  5. The displayed JSP lists all results, each linking to its location in the webapp.

The results can either be shown on a new page, or search and results can be handled via AJAX and shown in a popup under the search box.  (The latter is preferred, but could be introduced later if time is short.)
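
The envisioned workflow can be illustrated with a toy example. The sketch below deliberately does not use Lucene or Hibernate Search; it swaps in a tiny in-memory inverted index so that it runs on the plain JDK. TinyIndex, SearchHit, and the link strings are hypothetical names chosen for illustration, and the ranking (count of matching query terms) is a simplification of what a real full-text engine would do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** One search result: a label to display and a (hypothetical) link into the webapp. */
class SearchHit {
    final String label;
    final String link;

    SearchHit(String label, String link) {
        this.label = label;
        this.link = link;
    }
}

/** Toy inverted index standing in for the Lucene index in the workflow. */
class TinyIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final List<SearchHit> documents = new ArrayList<>();

    /** Indexes the text of one record and remembers how to display and link to it. */
    void add(String text, String label, String link) {
        int id = documents.size();
        documents.add(new SearchHit(label, link));
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
        }
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    /** Returns hits ordered by how many distinct query terms they contain, best first. */
    List<SearchHit> search(String query) {
        Map<Integer, Integer> scores = new TreeMap<>();
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            Set<Integer> ids = postings.getOrDefault(term, Collections.<Integer>emptySet());
            for (int id : ids) {
                scores.merge(id, 1, Integer::sum);
            }
        }
        List<Integer> ranked = new ArrayList<>(scores.keySet());
        // Stable sort: records with equal scores stay in the order they were indexed.
        ranked.sort((a, b) -> scores.get(b) - scores.get(a));
        List<SearchHit> hits = new ArrayList<>();
        for (int id : ranked) {
            hits.add(documents.get(id));
        }
        return hits;
    }
}
```

With "Horatio Hornblower" and "Horatio Nelson" indexed as patients, the query "Horatio Hornblower" ranks the first patient above the second because it matches both terms. A results page would then render each hit's label as a link.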

Objectives

  1. Produce a detailed workflow and software design for the universal search box, and develop it
  2. Develop OpenMRS service registration functionality
  3. Develop search API
  4. Develop early UI widget
  5. Register three OpenMRS domain objects (and their corresponding services) and make them searchable

Extra Credit

  1. Provide a mechanism for returning search results based on web application context (i.e., what is returned when looking inside a patient record differs from what is returned outside a record)

Milestones

  • Midterm - finalized design of workflow and service, and basic UI in place (one object type).
  • Final - registration functionality, extended search API, and multiple registered objects.

Resources

10 Comments

  1. Is there any way to index a person attribute?

        a.  In the old way of doing person attributes, where the attribute type is a string, and the column to be indexed is person_attribute.value

        b.  In the old way of doing person attributes, where the attribute type is an openmrs type (e.g. location) and the column(s) to be indexed are indicated in the hbm/dao?

        c.  In the new way of doing person attributes, where there is only a handler specified, but the handler can return a serialized object of the appropriate type?

    How does a module get its objects indexed?  How does a module add its indexes to an existing search set?  Does module loading/unloading require any reinitialization which would affect performance?

    How can the search accept strings of LocalizedString type?  Is there a way to limit the search by locale for concepts or localized strings?

  2.  a:

    Indexing is taken care of by Lucene (see the resources), which puts an entity (a class, for our purposes) in a Lucene document. This document also contains indexes of all related fields (or a proper subset, if you so choose).

    Say that Person.java contains a set of PersonAddresses. Under Hibernate Search, the Person document in Lucene will contain a field called personAddress. This field contains all subfields of personAddress, so one can search on personAddress.address1, personAddress.address2, and so on.

    b:

    The annotations/mapping files are either on the POJO or in a separate file. 

    c:

    Not sure what you mean. Could you clarify further?

    A module gets its objects indexed just like anything in the API would. If a class is marked to be indexed (with @Indexed above the class), Hibernate Search will automatically index all transactions with the database that use that entity.

    If there is existing data in the database, it should be indexed with the mass indexer that Hibernate Search provides; all subsequent transactions will be indexed as normal.

    I assume that by localizedstring you refer to text in a language such as Chinese or French, as opposed to a data type. If this is the case, then yes. Otherwise, I would have to check it out, since I am not sure.

    1. Chris --

      Re my first question regarding person_attributes, please see Improved Person Attribute Types for a discussion of the new way of doing person attributes with some references to the old way.  Also look at the data model.  Looked at from the point of view of the person, the person has person_attributes, which have a person_attribute_type and a value.  Person_attribute_type has attributes format and foreign_key.  Looked at from the point of view of person_attribute, it is just a key-attribute-value table, with person_id the key, person_attribute_type the attribute and value the value.  Cases a and b use this data model. 

      Case a is the simplest where person_attribute_type.format=java.lang.String.  You could certainly index person_attribute.value; the problem is, that only gives you the answer, not the question; so you might have "yes" and "no" as answers but have no idea about the question, which might be Medicaid eligible? or Has caregiver?  What you would really like to index is the question concatenated with the answer.

      Case b is more complicated where you are dealing with an OpenMRS object.  Suppose it is a location; then the value of the attribute is the location_id of the selected location, when what you really want indexed is the name of the location.  Suppose it is a concept; in this case, the foreign_key will be the concept_id of the question and value will be the concept_id of the answer.  Again, it is the name of the concept we want indexed, not its id.

      Case c is yet more complicated in that you can no longer use tables to find the value of interest; instead, you have to call the handler (which replaces format) and a handler_configuration (which replaces foreign_key) and you have to use a method of the handler to convert the value into the interesting text.

      Re my second question regarding modules, my worry is that modules can be started and stopped.  Presumably their objects won't change while they're not running, so they won't need indexing, but it is possible that openmrs objects they contain will change.  Also I'm concerned about startup and shutdown overhead.

      Re my third question, see Localization Tools.  Localizedstring is a type that contains text in multiple locale versions within a single text string.  There's a mixture of metadata and data in the string.  Perhaps concept is the easiest case of localization, because concept_name has a locale field and a text field, with a record for each locale (plus some gimcrackery to deal with synonyms and abbreviations).  At any rate, I'd like to hear a more detailed answer than "yes".

      My worry is that this feature will prove useless to the user because of lack of context for the indexing terms and because of a high proportion of irrelevant hits.  That's one reason I questioned the use of Lucene for this project.  It may be good where you have quantities of text to index, but in our case we have almost exclusively coded values.  A word cloud of an OpenMRS DB would include every word at just about the same size, because our goal is to reduce our data to occurrences of defined concepts.

      1. Roger, I think you're reading too much into the reason for the universal search and using Lucene for fulltext indexing. We are enabling the search of domain objects via simplified queries. If a domain object has an attribute that can be represented or specified textually, we can index that attribute so the domain object can be found in connection with that attribute.

        We originally were thinking along the lines of Wolfram Alpha, where the universal search could generate statistics, charts and cohorts from queries. This is certainly possible with Lucene, but the real target we want to achieve with this GSoC project is more Google-like: a simple list of prioritized results, hopefully giving a user a minimal, two-step process to access any piece of data he is looking for. To that end, we can focus only on text representations of data. The way I look for generally available information now is totally different than what I did before Google, and I can see the universal search box being equally innovative for use within our medical record system.

        Also, the speed of Lucene queries far exceeds the capabilities of our API searches. If the index were integrated into core, we could use it for API searches instead of relying on database indexes for speed, and we would not need as many indexes on the actual database. That situation would drastically decrease time spent on data insertion and the disk space used by MySQL for indexes (check your obs table's index).

        If you have a good enough reason for using universal search to find domain objects by their associated attributes, then we should incorporate those into fulltext search. This is a first attempt, and if we do it right (with logging of queries and accepted results) we can improve the index and ranking over time.

        If we are going to debate the usefulness of universal search box, let's do it on the mailing lists. I will be posting summaries of this discussion to both lists shortly so we can continue in that forum.

        1. Jeremy --

          Let's be clear -- I am not opposed to universal search.  I am pointing out some issues that might make a simplistic implementation of universal search not useful -- (1) KAV data structures; (2) highly normalized and coded data; (3) localization; (4) polymorphic persistence; (5) "boxing" of objects so their attributes are not transparently available; (6) modules.  At this point, I am thinking that the boxing problem caused by the new attribute_type paradigm is a serious one affecting many aspects of OpenMRS and that we should reconsider it.

          1. Roger, you make some decent points but remember this GSoC project is piloting a universal search feature.  It will be successful if it provides a Google-like interface to limited sets of data and brings to light the issues for more advanced searching.  One of the goals of attribute extensions (like person_attribute) is to have a displayable value for each attribute.  For universal searching, we may be limited to that value.  If the Lucene-based approach proves useful, then we'll surely want a datatype within Lucene that can handle our localized metadata.  We will also (eventually) need a mechanism for modules to add content to universal search & come up with a policy for what happens when the module is stopped/removed.  But it's unrealistic to think all of this will be solved within a couple months of a GSoC project, especially when the primary goal of GSoC is to get bright, young developers like Chris excited about open-source programming and OpenMRS development. :)

            1. Burke --

              So maybe we should put fulltext indexing to work where it can really show its stuff.  For example, case notes (or whatever we're calling the feature that is supposed to handle procedure narratives).  I'm pretty sure there's a corpus of them (or of emergency room notes) on-line that could be used for a demo.  Maybe add a GUI to generate advanced queries (x within 5 words of y) or use synonyms from the concept_map table.  Then maybe we can add some other cool features like speech-to-text.

              I'm sorry I reacted so strongly to seeing drugs and observations coming up on the same search; I'm afraid it's flavored people's perceptions of my comments.  I'm certainly a fan of finding more intuitive methods than Arden syntax.  Also, you know I like realistic use cases, and research or epi rank high in this area, but my perception is that most front-line docs are not into exploratory data analysis.

  3. For what it's worth, I thought I'd reflect on the origins of the term "universal search box" which go back to Marco's Pizza in Indianapolis this past winter, where I was talking with Jeremy about a pervasive search box that would be "universally" on every page and would accomplish the function of the various search boxes already in the default UI (i.e., patient search & concept search), primarily to address some of the findings of a fall 2010 semester usability test of OpenMRS 1.6 at IU – namely, that it would save time if there were always a patient search box available for data entry clerks. We used the "Search Confluence" box in the upper right of this page as a possible prototype – not all-powerful, but enough to get started and a gateway to a more flexible search interface.

    I realize that it obviously became wider scope than that, but perhaps "universal" wasn't the best term to use at the time – maybe "pervasive search" would have been better.

    Anyway, just wanted to brain dump this for posterity's sake.
    -Michael

  4. What's the status of this module?

  5. This module has a working prototype where the API search works the
    same as the current API searches. The Lucene search is working, but
    needs to be more intelligently indexed to increase the usefulness of
    the search results. Additionally, we need to rework the UI and the
    Admin page to make it more user friendly.

    I plan on working on it more over the summer. If anyone else wants to
    contribute too I would be happy to walk them through the code.