By: Chloé Pou-Prom
CHARTextract is a rule-based information extraction tool that allows for quick and easy refinement of rules based on keyword matching. By combining physician knowledge and keyword-based pattern matching, CHARTextract allows us to extract patient attributes from unstructured free-text notes. We are releasing the CHARTextract tool and 20+ rulesets (the list keeps growing!) that you can use and modify to your heart’s content. We are also open-sourcing our codebase, so you can download it and customize it to fit your specific use cases.
Clinical Notes and Chart Abstraction
Clinical text data (such as discharge summaries, nurse notes, consult notes, and radiology reports) are rich in information. Being able to efficiently extract this information can lead to a better understanding of a patient population. For example, a clinic may be interested in extracting the following variables from their consult notes:
- What treatments are their patient population on?
- What are the success/failure rates of those treatments?
In the example dictated consult note below, we can see that the patient is currently being treated with Copaxone and is not exhibiting any adverse reactions to the treatment. Reading through each consult note and extracting these two variables (i.e., the treatment and the presence of adverse reactions to it) by hand can be time-consuming.
Example consult note graciously generated by Josh Murray
If a patient is reacting well to a drug, a physician may dictate something like “…is tolerating the treatment…”. In such cases, a keyword search can be useful. Of course, a sentence that contains the words “…is tolerating the treatment quite well…” bears more weight than something like “…seems to be tolerating the treatment…” or “…is probably tolerating the treatment…”.
Hmm… Interesting. There are patterns in the notes, and some patterns are more important than others. Let’s leverage this!
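To make the idea concrete, here is a minimal Python sketch, not taken from CHARTextract, that uses regular expressions to tell a confident mention of tolerance apart from a hedged one. The `MENTION` and `HEDGE` patterns and the `classify_sentence` function are hypothetical illustrations, not part of any released ruleset.

```python
import re

# Hypothetical patterns for illustration; they are not taken from CHARTextract's rulesets.
# A mention of the patient tolerating the treatment or medication.
MENTION = re.compile(r"\btolerat(?:es|ing)\s+(?:the\s+)?(?:treatment|medication)\b", re.IGNORECASE)
# Hedging words that weaken a mention.
HEDGE = re.compile(r"\b(?:seems?\s+to\s+be|probably|possibly|may\s+be)\b", re.IGNORECASE)


def classify_sentence(sentence: str) -> str:
    """Label a sentence as 'confident', 'hedged', or 'no match'."""
    if not MENTION.search(sentence):
        return "no match"
    if HEDGE.search(sentence):
        return "hedged"
    return "confident"


for s in [
    "He is tolerating the treatment quite well.",
    "She seems to be tolerating the treatment.",
    "He is probably tolerating the treatment.",
    "No changes to medication.",
]:
    print(f"{classify_sentence(s):>9}  {s}")
```

A plain keyword search would treat the first three sentences identically; checking for hedging words is one simple way to keep the distinction.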
CHARTextract: Automating chart abstraction by leveraging word patterns and clinical experts
Relying on the intuition that there exist patterns in clinical notes and that some patterns are more useful than others, we developed CHARTextract – a tool that allows us to write “rules” for chart abstraction. The tool’s backend relies on a weighted keyword search. More specifically, CHARTextract combines:
- regular expression matches (think of this as a flexible or smart keyword search), and
- weighting (this is where the intuition that some sentences are more important than others comes in).
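The combination of regular expressions and weights can be sketched as follows. This is a minimal illustration under assumed conventions, not CHARTextract's actual rule format or scoring: each hypothetical `Rule` pairs a regex with a label and a weight, every sentence that matches a rule adds that rule's weight to its label's score, and the highest-scoring label wins.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule format: a regex that votes for `label` with `weight`."""
    pattern: str
    label: str
    weight: float


def score_note(note: str, rules: list[Rule]) -> dict[str, float]:
    """Sum the weights of every rule that matches each sentence of the note."""
    compiled = [(re.compile(rule.pattern, re.IGNORECASE), rule) for rule in rules]
    scores: dict[str, float] = defaultdict(float)
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        for regex, rule in compiled:
            if regex.search(sentence):
                scores[rule.label] += rule.weight
    return dict(scores)


def predict(note: str, rules: list[Rule], default: str = "unknown") -> str:
    """Return the highest-scoring label, or `default` if no rule matched."""
    scores = score_note(note, rules)
    return max(scores, key=scores.get) if scores else default


# Hypothetical rules: the confident phrasing carries more weight than a bare mention.
RULES = [
    Rule(r"\btolerating\s+the\s+treatment\s+(?:quite\s+)?well\b", "no_reaction", 3.0),
    Rule(r"\btolerating\s+the\s+treatment\b", "no_reaction", 1.0),
    Rule(r"\b(?:rash|nausea|injection[- ]site\s+reaction)\b", "adverse_reaction", 2.0),
]

print(predict("She is tolerating the treatment quite well. Follow-up in six months.", RULES))
print(predict("He reports nausea after each injection. He is probably tolerating the treatment.", RULES))
```

Giving the more specific "quite well" pattern a higher weight is one way to encode the intuition that confident phrasing should count for more than a bare mention.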
CHARTextract helps clinicians and data scientists work together to write “rules” for information extraction. Here at LKS-CHART, we are fortunate to work with a great number of clinical collaborators. We make use of their expertise by incorporating them into our rule-building workflow as much as possible.
We start off with a small sample of labeled data (e.g., consult notes in which a chart abstractor has identified the treatment and the success/failure of the treatment). From this data, we can then build our rules. This is where the clinical knowledge of our collaborators comes into play.
After we write an initial set of rules, the tool displays mismatches (i.e., any discrepancy between a data point’s label and the tool’s prediction) and shows why each mismatch occurred.
The interface then lets you toggle back and forth between viewing the mismatched instances and viewing the rules, so we can modify the rules on the fly.
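A mismatch report of this kind can be sketched in a few lines of Python. The hypothetical `find_mismatches` function below compares each prediction against the chart abstractor's label and keeps the matched text as evidence for why the prediction was made. The rule format, the per-match scoring, and the output structure are assumptions for illustration, not CHARTextract's internals.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    """Hypothetical rule format: a regex that votes for `label` with `weight`."""
    pattern: str
    label: str
    weight: float


def explain(note: str, rules: list[Rule]) -> tuple[str, list[tuple[str, str]]]:
    """Predict a label for one note and record which rule matched which text."""
    scores: dict[str, float] = {}
    evidence: list[tuple[str, str]] = []
    for rule in rules:
        for match in re.finditer(rule.pattern, note, re.IGNORECASE):
            scores[rule.label] = scores.get(rule.label, 0.0) + rule.weight
            evidence.append((rule.pattern, match.group(0)))
    predicted = max(scores, key=scores.get) if scores else "unknown"
    return predicted, evidence


def find_mismatches(labeled: dict[str, tuple[str, str]], rules: list[Rule]) -> list[dict]:
    """Return every note whose prediction disagrees with its chart-abstractor label."""
    mismatches = []
    for note_id, (text, label) in labeled.items():
        predicted, evidence = explain(text, rules)
        if predicted != label:
            mismatches.append(
                {"id": note_id, "label": label, "predicted": predicted, "evidence": evidence}
            )
    return mismatches


# Hypothetical rules and a tiny hand-labeled sample.
RULES = [
    Rule(r"tolerating the treatment", "no_reaction", 1.0),
    Rule(r"\brash\b", "adverse_reaction", 2.0),
]
LABELED = {
    "note-1": ("Tolerating the treatment well.", "no_reaction"),
    "note-2": ("No rash. Tolerating the treatment.", "no_reaction"),
}

for m in find_mismatches(LABELED, RULES):
    print(m["id"], "labeled", m["label"], "but predicted", m["predicted"])
    for pattern, text in m["evidence"]:
        print(f"  {pattern!r} matched {text!r}")
```

Here the second note is flagged because the regex for "rash" also fires on the negated phrase "No rash". Seeing the matched text next to the rule makes that kind of error easy to spot and fix in the next round of rule edits.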
We have released the CHARTextract tool and existing rulesets for variable extraction.
We welcome contributions to CHARTextract, and there are many ways to contribute.
Physicians like to trust the model/software/tool you’re throwing at them.
We strongly recommend getting started with this tutorial to familiarize yourself with the tool.