Child pages
  • Centralized Auditing (Design Page)

Background

Most OpenMRS tables include auditing attributes:

  • changed_by
  • date_changed
  • voided_by
  • date_voided
  • void_reason

These columns have limitations and require every table to carry its own audit history. We also end up supporting similar audit capabilities in several places within the API. We would like to consolidate these auditing functions into a single audit service.

The "creator" and "date_created" columns can be left on the original tables.  Moving them to a centralized table can be discussed at a later point.

The audit information is currently placed on objects automatically as they are saved, by the RequiredDataAdvice/SaveHandlers and/or the AuditableInterceptor hibernate class.

Design

A single 'audit' table will hold the values above along with a "blob" column and a "type" column.  The blob holds the serialized XML representation of the change; the type is the class of the object being modified.

We should start with one object to try this on before implementing it system-wide.  It has been suggested that something like person_address would be suitable.

Design Goals

  1. Remove the redundant columns in almost every table for creator/dateCreated etc.
  2. Keep a record of what actually changed from one version to the next (e.g., an xml blob).
  3. Keep all the history for an object instead of just the most recent change.
  4. Avoid tight binding to specific domain objects; more specifically, have a strategy for easily handling additional types in the future.

Road-map

1. Create a table (audit_log), corresponding class (AuditLog) and hibernate mapping.
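As a rough sketch of what the AuditLog class in step 1 might look like (field names follow the columns discussed above, but are assumptions rather than a committed design):

```java
import java.util.Date;

// Hypothetical sketch of the proposed AuditLog domain object.
// In OpenMRS it would likely extend a base OpenMRS object class and
// have a matching hibernate mapping file; field names are illustrative.
class AuditLog {
    private String type;           // class of the modified object, e.g. "org.openmrs.PersonAddress"
    private Integer objectId;      // primary key of the modified row
    private Integer changedBy;     // user who made the change
    private Date dateChanged;
    private String voidReason;     // null unless the object was voided
    private String serializedData; // XML blob holding previous and current state

    AuditLog(String type, Integer objectId, Integer changedBy,
             Date dateChanged, String voidReason, String serializedData) {
        this.type = type;
        this.objectId = objectId;
        this.changedBy = changedBy;
        this.dateChanged = dateChanged;
        this.voidReason = voidReason;
        this.serializedData = serializedData;
    }

    String getType() { return type; }
    Integer getObjectId() { return objectId; }
    Integer getChangedBy() { return changedBy; }
    Date getDateChanged() { return dateChanged; }
    String getVoidReason() { return voidReason; }
    String getSerializedData() { return serializedData; }
}
```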

2. Replace AuditableInterceptor to handle updates and deletes. On an update/delete, the AuditableInterceptor will make an appropriate entry in audit_log.
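The replacement interceptor in step 2 could follow the shape of Hibernate's interceptor callbacks. The sketch below is plain Java with no Hibernate dependency, purely to illustrate the flow; the real class would be expected to override the corresponding methods of org.hibernate.EmptyInterceptor:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the proposed AuditableInterceptor.
// In the real module these methods would override
// org.hibernate.EmptyInterceptor#onFlushDirty and #onDelete,
// and would write AuditLog rows instead of strings.
class AuditableInterceptorSketch {
    final List<String> auditEntries = new ArrayList<>(); // stand-in for the audit_log table

    // Called by Hibernate when a persistent object is updated.
    public boolean onFlushDirty(Object entity, Serializable id, Object[] currentState,
                                Object[] previousState, String[] propertyNames) {
        auditEntries.add("UPDATE " + entity.getClass().getName() + "#" + id);
        return false; // we only record the change; the flushed state is untouched
    }

    // Called by Hibernate when a persistent object is deleted.
    public void onDelete(Object entity, Serializable id, Object[] state, String[] propertyNames) {
        auditEntries.add("DELETE " + entity.getClass().getName() + "#" + id);
    }
}
```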

3. The audit_log table has a blob field which stores an XML document containing the previous and current state of the modified object.
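One way the blob in step 3 could be produced (a minimal hand-rolled serializer for illustration only; in practice OpenMRS would more likely reuse an existing serialization service, and element names here are assumptions):

```java
// Minimal sketch of serializing the previous and current state of a
// modified object into the XML blob stored in audit_log.
class AuditXmlBuilder {

    static String snapshot(String[] propertyNames, Object[] previousState, Object[] currentState) {
        StringBuilder sb = new StringBuilder("<audit>");
        sb.append(section("previous", propertyNames, previousState));
        sb.append(section("current", propertyNames, currentState));
        return sb.append("</audit>").toString();
    }

    private static String section(String tag, String[] names, Object[] values) {
        StringBuilder sb = new StringBuilder("<" + tag + ">");
        for (int i = 0; i < names.length; i++) {
            // Recording only the properties that actually differ would be an
            // optimization; for simplicity this sketch records everything.
            sb.append("<").append(names[i]).append(">")
              .append(values[i] == null ? "" : values[i])
              .append("</").append(names[i]).append(">");
        }
        return sb.append("</" + tag + ">").toString();
    }
}
```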

4. Create an AuditService, which will allow the administrator to view entries in the audit_log.
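The AuditService in step 4 might expose query methods along these lines (method names are illustrative, not an agreed API; entries are plain strings here so the sketch stays self-contained, whereas the real service would return AuditLog objects and be backed by a DAO):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical AuditService API for browsing audit_log entries.
interface AuditService {
    void log(String type, Integer objectId, String entry);
    // full change history for one object, not just the most recent change
    List<String> getAuditLogs(String type, Integer objectId);
}

// Trivial in-memory implementation, for illustration only.
class InMemoryAuditService implements AuditService {
    private final List<String[]> rows = new ArrayList<>();

    public void log(String type, Integer objectId, String entry) {
        rows.add(new String[] { type, String.valueOf(objectId), entry });
    }

    public List<String> getAuditLogs(String type, Integer objectId) {
        List<String> result = new ArrayList<>();
        for (String[] row : rows)
            if (row[0].equals(type) && row[1].equals(String.valueOf(objectId)))
                result.add(row[2]);
        return result;
    }
}
```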

Assigned Developer

Ankur Gupta, Niranjan Kulkarni, Rohit Manohar

Interested Parties and Mentors

Ben Wolfe, Jeremy Keiper, Burke Mamlin

Further Discussion

On May 30, 2013, during Q&A with Wyclif, we reviewed the functionality of his auditlog module and the possibility of using it instead of our current method of maintaining audit log information in each record.  We identified the discussion points below.  Because we expect tomorrow's meeting will be preoccupied with GSOC, we welcome discussion here.  The main goal for tomorrow's meeting will be to identify potential show-stoppers, to address comments and questions raised during discussion, and to decide whether it would be a good investment of resources to run a performance test of the new proposal.

1.  What is the purpose of auditing?  What do we expect to gain by changing?

Auditlog records who changed data by maintaining a table of date-time, user, table, field, old value, new value.  It needs some modification to include newly added and deleted records.  Field values are serialized.  It might be possible to record only the old value or only the new value with the same effect.

a. Improve performance by eliminating 3 indexes per table

b. Be able to find out who made a particular change

c. Meet FDA requirements for clinical trial data??

d. Meet HIPAA requirements for auditing access??

e. Not replace native DB transaction logs or backup procedures

2.  What is  the architecture of auditlog and how would it integrate with existing code/modules?

It is a Hibernate interceptor, as are (1) core audit logging; (2) synch module; (3) event module.  Since many important events are DB updates, there is a lot of common functionality that could be factored out and help us move toward an event-oriented model.  Some possibilities:

a. the hibernate interceptor generates events and the event module publishes them, with centralized auditing and synch subscribing;

b. the hibernate interceptor includes centralized auditing, which raises events to the event module for others to subscribe to.

For performance reasons, we should consider giving auditlog its own connection to the DB (not through Hibernate?), which may be a different DB than the OpenMRS DB.

3. Are there DB changes which would not be logged?

Anything that did not go through Hibernate.  Anything that is not an OpenMRSObject.

a. Core SQL methods -- they use Hibernate, but do they trigger the interceptor?

b. Liquibase -- can it be configured to use Hibernate? what about structure changes? do we need to keep table version?

c. Direct manipulation through MySQL or other DB session -- at startup, compare each record update time with last logged change??

d. Initial startup -- it would take days to copy the initial contents of the DB, probably a show-stopper

4. How could we test centralized auditing?

Install an MDS package under both scenarios and compare times

a. Modify auditlog to reflect discussions.

b. Write liquibase change sets to remove indexes and not-nulls from audit fields (big task, could be helped by automation)

c. Modify Hibernate XML to remove relationships from audit fields

d. Disable the hibernate interceptor that updates audit fields

e. Locate uses of audit fields -- duplicate checks? voiding an encounter? -- and make sure they won't break if fields are empty

Estimated time: 1 week


16 Comments

  1. Do we not want to also track the cause or event that created an object? I need some way to tie HL7s to encounters, relationships, obs, etc.

    1. The design of central auditing – at least at this point – is meant to remove voided/changed data from core tables and to centrally track those changes.  It's not meant to track the creation of any new data in the system.

      What you're looking for would probably be better accomplished through logging in the HL7 processor.

  2. If we're going this far, I don't see why we wouldn't put creation as an audit event. Then we wouldn't need to store old and new, we'd just store each new.

    I'd like to see the updates go into a queue where they'd be written on a lower-priority thread. It might be worth writing them to a different db or to a flat file so they can be used to recreate a trashed db.

    As I see it, the fields in the table should be user, timestamp, table, ID, UUID, change type, blob. I don't think we need to maintain any indexes except for uniqueness sake.

    We probably should put something in the save routines to delay updates if necessary so that table/timestamp or table/timestamp/ID is unique. We could walk the queue backward while the timestamp is equal looking for equal table or table/ID.

    Burke, I don't know if you were hoping to recover space by deleting voided records, you usually have to do a reorg or dump/restore to do that.

  3. Sorry for my late comment. It would be nice to draw a line between service-level and DAO-level audit logs, because service-level audits in most cases are made by users, while DAO-level edits could include calls made by internal logic in the API. This could be implemented by adding a discriminator column to the audit log table for the audit type, i.e. service vs. DAO layer.

    Wyclif

    1. Wyclif –
      I thought the idea was that logging would take place in the AuditableInterceptor which I don't think knows about the distinction you want to make. Actually, I don't understand the distinction you want to make. Since lots of changes can happen before a save, they could emanate from various places. If there are particular actions you want to know about, it sounds like a place for regular logging.
      Saludos, Roger

      1. Roger, the reason for what I am saying is that the API makes changes to objects under the hood which are not user triggered, and yet admins are normally interested in seeing user actions. What I'm suggesting could be implemented via a combination of the auditable interceptor (to persist the changes) and AOP (to track changes in the service layer vs. the DAO layer).

        1. Wyclif –
          Not that the admin is not interested in who stole the cookies from the cookie jar (US summer camp cultural reference), just that I didn't think that was the purpose of this change. I thought that the purpose of this was to replace the auditinfo mechanism with something that performed better.
          As you probably know, there has been discussion of logging page hits to identify users accessing data which they shouldn't. Burke's point on that was that he didn't want us to reinvent logging for this purpose, and I agree with him. I think the same is applicable to your use case.
          Saludos, Roger

          1. Roger is correct. The point of this project is to record all changes to data. It doesn't matter where they originate from.

  4. I think we ought to drop voided/retired, voided/retired by, and voided/retired reason from the record.  We should leave voided/retired date as a key (along with table and id) to access the voided/retired reason field.  A null value would indicate not voided/retired.

    With regard to automating the development of liquibase scripts, I offer the following MySQL query:
    SELECT * FROM information_schema.table_constraints WHERE table_schema='<DBName>' AND constraint_type='FOREIGN KEY' AND (constraint_name like '%create%' OR constraint_name like '%change%' OR constraint_name like '%retire%' OR constraint_name like '%void%') ORDER BY table_name;

    We ought to keep track of system startups and shutdowns, that would help the user tie particular ranges of transactions to a particular backup version.


  5. I have one problem with dropping auditable fields: the auditlog table is very likely to be a massive one, most likely larger than even the obs table. My initial assumption was that auditlogs would need to be periodically archived, preferably to the file system, but this will pose a challenge when some code needs to inspect those fields.

    1. My feeling is that the audit information (with the exception of voided/retired) is rarely if ever used in a situation where speed is important.  We may not even need to maintain indexes; we could go into a maintenance mode when it is necessary to look at the log, and build a Lucene index (or reindex) only then.

  6. Re logging methods and assuring that logs complete before transactions, see http://www.postgresql.org/docs/8.0/static/wal.html and the link to chapter 22.3 on that page.  Also helpful is http://www.pythian.com/blog/overview-of-transaction-logging-in-mysql/

  7. I guess there should be a reason to ignore the obvious auditing tool, but I don't see one, neither in the description nor in the discussion, and this makes me confused. I think the reason why hibernate envers is not being discussed should be explained.

    I am new to OpenMRS. Sorry for the naive question.

  8. Alexei, I wrote an auditlog module a while ago and never really rolled it out; I've been caught up with other stuff in the past several months. I did some research on hibernate envers a while ago and it seemed like something we might want to look at; I just need to dig a little more into whether we can get it working by adding their annotations to some super classes in core so that it applies to all their subclasses. But from previous discussions this goes a little deeper than just keeping an audit trail; it involves getting rid of some columns in the DB and centralizing all audit-related information.

  9. Wyclif, thanks for the explanation

    I see it has not been discussed for a long time. Has this activity been stopped, or is it being discussed somewhere else?


  10. This has been on hold for a while