Deep Learning for Legal and Medical Text Analytics

Authors
Lam, Jason
Keyword
NLP, Deep Learning, Legal Analytics, Machine Learning
Abstract
Deep learning applied to text has shown wide success in detecting human emotions, analyzing social media feeds, classifying text, and creating compressed representations. While deep learning has been widely applied across numerous domains, researchers have only begun to apply these techniques to the field of law. Recently, researchers have shown success in using attention-based models to predict criminal charges from unstructured text. The legal field poses many challenges, including the small amount of legal data available in Canada, the verbosity of judgements, the legal jargon they use, and the subjectivity of outcomes. In the medical domain, we have seen breakthroughs in using machine learning to predict diseases from physicians' notes and to classify patients using ICD-9 codes. Unfortunately, many state-of-the-art systems require expensive hand-annotated labels that are often unobtainable. In this work, we investigated the prediction of reasonable notice and the identification of similar personal injury cases in the field of law, and the prediction of PTSD in the medical domain. High litigation costs currently create a barrier for the majority of Canadians. We believe that a deep learning approach to predicting reasonable notice can help people weigh the costs of their legal options against the potential outcomes. Similarly, in personal injury, identifying previous judgements with similar plaintiffs would help legal practitioners and laypeople anchor their expectations of damages. Lastly, intelligent systems in medicine can help medical professionals provide better patient care by predicting future complications. For our legal research, we used numerous attention-based and pretrained models, with human-written summaries of legal judgements as input. The out-of-the-box pretrained RoBERTa model performed best, with 69% accuracy.
We believe our work approaches the upper limit of achievable performance, given the subjectivity of judges' decisions and the variability in outcomes. RoBERTa, modified with two classification layers, yielded the best qualitative results for identifying similar cases in personal injury. Lastly, the need for manual validation in our medical research led us to demonstrate a proof of concept using a weakly supervised approach.