Title:
|
EXTRACT CLINICAL MEASUREMENT VALUES USING A REGULAR EXPRESSION PATTERN DISCOVERY ALGORITHM VS SUPPORT VECTOR MACHINE |
Author(s):
|
Douglas Redd, Bryan Gibson, Maureen A. Murtaugh, Joseph Goulet and Qing Zeng-Treitler |
ISBN:
|
978-989-8533-77-7 |
Editors:
|
Mário Macedo and Piet Kommers |
Year:
|
2018 |
Edition:
|
Single |
Keywords:
|
Medical Informatics, Clinical Informatics, Natural Language Processing, Machine Learning, Regular Expressions |
Type:
|
Full Paper |
First Page:
|
29 |
Last Page:
|
36 |
Language:
|
English |
Paper Abstract:
|
Background: Clinical measurements are commonly embedded in free-text clinical notes. They can be extracted using natural language processing, but doing so can be resource intensive and the resulting extractors often generalize poorly. We demonstrate a new approach, regular expression discovery for extraction (REDEx), a supervised machine learning algorithm we developed that automatically generates regular expressions to extract measurements with reduced effort. Results: We compared this approach with a support vector machine (SVM) on the task of body weight extraction. A total of 968 weight values were annotated in 300 clinical notes and used to train the REDEx and SVM models. REDEx automatically generated 98 regular expressions. In 10-fold cross-validation, the REDEx model consistently outperformed the SVM model, with precision 0.99 vs. 0.85, recall 0.98 vs. 0.87, F1-score 0.99 vs. 0.86, and accuracy 0.98 vs. 0.82. |
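As an illustration of the kind of task the abstract describes, the following is a minimal Python sketch of regular-expression-based weight extraction scored with precision, recall, and F1. It is not the REDEx algorithm: the pattern `WEIGHT_PATTERN` is hand-written and hypothetical, whereas REDEx learns its patterns from annotated examples. The function names, the sample note, and the gold values are likewise assumptions made up for this example.

```python
import re

# Hypothetical hand-written pattern (an assumption for illustration only).
# REDEx generates its regular expressions automatically from annotations.
WEIGHT_PATTERN = re.compile(
    r"\b(?:weight|wt)\b[^\d\n]{0,15}?(\d{2,3}(?:\.\d+)?)\s*(?:lbs?|pounds|kg)\b",
    re.IGNORECASE,
)


def extract_weights(note):
    """Return the numeric weight strings matched in a clinical note."""
    return [m.group(1) for m in WEIGHT_PATTERN.finditer(note)]


def precision_recall_f1(predicted, gold):
    """Score extracted values against annotated values, compared as sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up sample note and gold annotations, not data from the paper.
note = "Wt: 182.5 lbs today. Patient reports weight of 80 kg last year. BP 120/80."
gold = ["182.5", "80", "75"]

predicted = extract_weights(note)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(predicted, precision, recall, f1)
```

Comparing values as sets keeps the scoring short, but a real evaluation would more likely match each extracted value to its annotated position in the note.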