Electronic health records (EHR) data is captured across many healthcare institutions, resulting in large amounts of diverse information that can be analyzed for diagnosis, prognosis, treatment, and prevention of disease.
One type of data captured by EHRs is clinical notes, which are unstructured data written in natural language. We can leverage Natural Language Processing (NLP) to build machine learning (ML) models that extract understanding from clinical notes and enable us to predict clinical outcomes.
Unplanned patient readmission is a significant contributor to unfavorable patient clinical outcomes and high healthcare costs.
ClinicalBERT is one ML model that is able to predict 30-day hospital readmission from clinical notes. Although its performance is good, it is limited in the length of the text sequence that can be fed to the model as input. Models using longer sequences have been shown to perform better on a range of ML tasks, including tasks on clinical text. In this project we evaluate the performance of a novel ML model called Longformer, which is able to learn from longer sequences than previous models. Performance is evaluated against baselines and previous state-of-the-art models in terms of area under the precision-recall curve (AUC-PR).
Our implementation of ClinicalBERT exceeds the Long short-term memory (LSTM) and Deep Averaging Network (DAN) baselines. Further work is needed for Longformer.
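To make the evaluation metric concrete, the following is a minimal sketch of how AUC-PR can be approximated for a binary readmission classifier. It is not the project's evaluation code. It uses the step-wise (average precision) estimate of the area under the precision-recall curve; the function name `average_precision` and the toy labels and scores are hypothetical, and tied scores are not treated specially.

```python
def average_precision(y_true, y_score):
    """Step-wise estimate of the area under the precision-recall curve.

    y_true: list of 0/1 labels (1 = readmitted within 30 days, by assumption).
    y_score: list of predicted scores, higher meaning more likely positive.
    """
    if len(y_true) != len(y_score):
        raise ValueError("y_true and y_score must have the same length")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank examples from highest to lowest predicted score.
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)

    true_pos = 0
    area = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_pos += 1
            # Each positive raises recall by 1/total_pos; weight the
            # precision at this rank by that recall step.
            area += (true_pos / rank) / total_pos
    return area


# Toy example with made-up labels and scores.
labels = [0, 1, 1, 0, 1]
scores = [0.1, 0.8, 0.35, 0.4, 0.9]
ap = average_precision(labels, scores)
```

AUC-PR is commonly preferred over accuracy when the positive class is rare, because it measures how well positives are ranked and is not inflated by the large number of true negatives.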