Logging Errors to the Database

Primary mentor

N/A

Backup mentor

N/A

Assigned to

N/A

Abstract

Log uncaught exceptions to a database table, along with associated metadata (date/time, user, page and/or service, stack trace summary, unique identifier) and provide this information to users in the default error handler so that they can provide a specific error log id along with their bug report. Provide tools to mine this database table to find patterns in error messages, users who frequently hit problems, and to assist in proactively improving the system regardless of whether or not users are reporting their errors.

Project Champions

mseaton

Summary

We implemented this in our legacy system in Haiti, and it essentially involved the following:

Creation of database tables to hold the exception details

The exception_log table lists summary information and metadata about the overall exception:

exception_log (
  exception_log_id,                 -- Primary key of the table
  exception_class,    -- The specific type of exception that was thrown (eg. NullPointerException)
  message,            -- The exception message ( eg. throwable.getMessage() )
  exception_datetime, -- Timestamp for the exception
  user_id             -- References user to store which user experienced the error
)

The exception_log_detail table lists the specific stack trace information that is relevant. This might be restricted on only those parts of a stack trace that provide useful, actionable information (for example, only those classes that match a certain pattern, or only those lines of a stack trace that contain certain information):

exception_log_detail (
  exception_log_detail_id,  -- Primary key of the table
  exception_log_id,         -- References the exception_log table above
  file_name,                -- The filename extracted from the stack trace
  class_name,               -- The class name extracted from the stack trace
  method_name,              -- The method name extracted from the stack trace
  line_number               -- The line number extracted from the stack trace
)

The exception_root_cause table is used to store details about the root cause if it exists, this is important for situations where the original thrown exception is wrapped into another exception type and the actual root cause gets masked in the process, therefore it is important to capture these details too.
The use case for the root cause is that if the message for the thrown exception is blank, we can use that of the root cause and typically the root cause should be included when reporting errors or submitting tickets to jira from the module.

exception_root_cause (
  exception_root_cause_id,    -- Primary key of the table
  exception_log_id,           -- Foreign Key to the actual exception that was thrown
  exception_class,            -- The specific type of exception that was thrown (eg. NullPointerException)
  message,                    -- The exception message ( eg. throwable.getMessage() )
)

The exception_root_cause_detail table hold extra information about the root cause

exception_root_cause _detail (
  exception_root_cause _detail_id,  -- Primary key of the table
  exception_root_cause_id,          -- References the exception_root_cause table above
  file_name,                        -- The filename extracted from the stack trace
  class_name,                       -- The class name extracted from the stack trace
  method_name,                      -- The method name extracted from the stack trace
  line_number                       -- The line number extracted from the stack trace
)
Utilization of these tables when an exception is thrown

In our default error page, prior to displaying any uncaught exception, we first record the exception details to these tables. Then, as a part of the default error page that is shown, we provide a reference to the exception_log id that is recorded, as well as a "More details" link which provides a summary of the exception based on what was recorded in the tables above. This often provides very useful in tracking down errors both in development mode and in production, long after end-users have overcome their initial error.

Reporting on these tables to proactively improve the system

Once this system is in place, phase 2 would be to build some basic query tools and summary reports that allow for proactive identification and resolution of the exceptions that users are experiencing. This will most likely involve:

  1. An interface that allows for the exception log to be queried by user, datetime, exception class, exception frequency, etc.
  2. An ability to view the specifics of a particular exception and to link it to a ticket and/or mark it as resolved when the underlying code problem has been fixed
  3. An ability to delete all exception logs either by date range or by exception class
  4. A handful of summary reports which are emailed to a configurable list of addresses on a scheduled basis that report on:
    1. Most common errors reported which are not yet resolved
    2. All errors reported during the last period (eg. last day, last week, etc depending on frequency of the report)

Design ideas

Logging the Root cause of an exception

It is always quite useful to know the root cause of the exception, so listing it out may be quite useful for debugging purposes.

Since we already use Apache Commons Lang, we can do this quite easily using the ExcptionUtil class (Not to be confused with the OpenMRS ExceptionUtils class. I would recommend using this utility class to not only log the root cause, but many other fields associated with the root cause.

Important methods in the ExceptionUtil class are,

static java.lang.Throwable

getRootCause(java.lang.Throwable throwable)
Introspects the Throwable to obtain the root cause.

static java.lang.String

getRootCauseMessage(java.lang.Throwable th)
Gets a short message summarising the root cause exception.

This util class contains a lot of helpful methods which deserve further exploration. We may even introduce a new table to capture a very compact stack trace based on these data.

Different logging details/levels for different Exception classes

What are our plans for allowing the system to be expandable to support the potential need to log different exception classes differently ?

For example, assume that we get an AuthenticationException. This is a very sensitive exception which is usually caused by a registered user mistyping his usename or password. But what if we want to log this more stringently to check if this was not a hacking attempt ?
In such an event we would need to say,

if(instance of AuthenticationException){
   LogToDataBaseDifferently(); //Code behaves differently
   generateAlerttoAdmin();
} else{
   LogToDatabaseNormally();    //Code behaves normally for all other exceptions
}

Ultimately, we (or our implementors) may want to introduce their own custom database logging fields.

To me (surangak), the ideal way to support this would be via Spring AOP. As an additional benefit, Spring AOP will also allow separation of concerns, and wont clutter our normal code with logging details.