Using attention methods to predict judicial outcomes

🔬 Research Summary by Vithor Bertalan, a Computer Engineering PhD Student at Polytechnique Montréal.

[Original paper by Vithor Bertalan and Evandro Ruiz]

Overview: We have developed a model to classify judicial outcomes by analyzing textual features from the legal orders. After that step, we used the weights of one of our networks, a Hierarchical Attention Network, to detect the most important words used to absolve or convict defendants.

Introduction

What can we find in a text to convict or absolve defendants? Do words matter in a legal order?

Focusing on those main research questions, we used AI classifiers to predict judicial outcomes from legal orders. For this purpose, we developed a text crawler to extract data from public electronic legal systems. These texts formed a dataset of second-degree murder and active corruption cases. We applied different classifiers to predict judicial outcomes by analyzing textual features from the dataset. Our research showed that Regression Trees, Gated Recurrent Units (GRUs), and Hierarchical Attention Networks achieved higher metrics on different subsets.

Finally, to accomplish our main goal, we explored the attention weights of one of the algorithms used, the Hierarchical Attention Network, to find a sample of the most important words used to absolve or convict defendants. In this way, we identified the words that matter most in legal orders for both outcomes.

Key Insights

How we got our documents

For the research, we collected a corpus of judicial outcomes from eSAJ, the electronic system of the São Paulo Justice Court, Brazil. To restrict the documents captured, we selected a few previously defined judicial subjects, choosing only those with very well-defined outcomes: second-degree murder and active corruption. We implemented a web text crawler to capture the data from eSAJ and collected 2,467 cases: 1,681 homicide cases and 786 corruption cases. The crawler was used to gather documents from different periods.

Using professional guidance

We relied on the professional guidance of Brazilian lawyers to better understand the texts. The language adopted worldwide in the field of Law is notoriously obscure. Therefore, we decided that professional consulting was necessary to fully understand the outcome of each judicial case.

Transforming documents into numbers

After preprocessing the texts, we transformed our dataset into numerical vectors using Term Frequency–Inverse Document Frequency (TF-IDF) and Word Embeddings, so that we could compare the performance of the two representations.
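As a rough illustration of the TF-IDF idea, the sketch below weights each term by how often it appears in a document and how rare it is across documents. It uses only the Python standard library and one common variant of the formula (relative term frequency times the natural log of inverse document frequency). The function name `tfidf` and the toy documents are assumptions made for this example, not the code or data used in the study.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # Relative term frequency times log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, made-up token lists (not taken from the eSAJ corpus).
docs = [
    ["réu", "ofereceu", "dinheiro"],
    ["réu", "efetuou", "disparos"],
    ["réu", "absolvido", "júri"],
]
vectors = tfidf(docs)
```

In this toy corpus, "réu" (defendant) appears in every document, so its weight drops to zero, while terms that occur in a single document keep a positive weight.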

Applying Artificial Intelligence methods

We used several different methods to predict the outcomes: namely, Logistic Regression, Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Regression Trees, Naïve Bayes, Support Vector Machines (SVM), Multilayer Perceptrons (MLP), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTM), Gated Recurrent Units (GRU) and Hierarchical Attention Networks (HAN).

The results were split into non-neural network methods and neural network methods. For the homicides dataset, the best-performing non-neural method was SVM. For the corruption dataset, the best-performing non-neural method was Regression Trees. For both datasets, GRUs showed the best performance among the neural methods.

Analyzing the words and sentences

After the classification, we sought to order all the words in each dataset by their attention weights. Each word occurrence therefore receives a value ranging from 0 (where the word would have no importance in the classification of the document) to 1 (where the word would have maximum importance in the classification of the document).
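To make the 0-to-1 range concrete, the sketch below assumes the weights come from a softmax over per-word scores, which is how attention layers commonly keep weights between 0 and 1. The words and raw scores are invented for illustration, and the code is a simplified stand-in rather than the network trained in the study.

```python
import math


def softmax(scores):
    """Turn raw scores into weights in (0, 1) that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw relevance scores for one sentence (illustrative only).
words = ["réu", "efetuou", "disparos"]
scores = [0.1, 0.4, 2.0]

weights = softmax(scores)
# Order the words from most to least important.
ranked = sorted(zip(words, weights), key=lambda pair: pair[1], reverse=True)
```

Here `ranked` lists the word with the largest raw score first, and the weights of all words in the sentence sum to 1.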

It is helpful to mention that a word might have different attention weights in distinct sentences. As a short example, the sentence “The defendant robbed a bank” and the sentence “The defendant did not participate in the robbery because they were going to a blood bank” both contain the word bank, but in very different contexts. In the first sentence, the word would be a vital contributor to the condemnation, while in the second it would contribute to the absolution. Therefore, words with different attention weights appeared more than once in our final calculations.
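One way to picture this bookkeeping is to keep every occurrence of a word as its own entry instead of averaging per word. The sketch below does that with hypothetical (word, sentence id, weight) triples; the numbers are made up and do not come from our results.

```python
from collections import defaultdict

# Hypothetical (word, sentence id, attention weight) triples; values are made up.
occurrences = [
    ("bank", 0, 0.62),
    ("robbed", 0, 0.31),
    ("bank", 1, 0.08),
    ("blood", 1, 0.44),
]

# Keep every occurrence as its own entry, sorted by weight (highest first).
ranking = sorted(occurrences, key=lambda item: item[2], reverse=True)

# Group the weights by word to see how much they vary with context.
by_word = defaultdict(list)
for word, _sentence_id, weight in occurrences:
    by_word[word].append(weight)
```

In this toy data, "bank" shows up twice in `ranking`, once near the top and once at the bottom, which mirrors how the same word can matter a lot in one sentence and very little in another.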

Conclusion: Words do matter

To give a few examples of our findings: in the corruption dataset, verbs that indicate the giving of goods (like “ofereceu,” offered; “apresentou,” presented; “oferecendo,” offering) are signs of condemnation.

Words that show physical damage are preponderant for homicide condemnations: “infração” (infraction); “disparos” (gunshots); “golpes” (physical blows); “socos” (physical punches); “lesões” (lesions). On the other hand, the word “júri” (jury) is a fundamental word to absolve homicide defendants.

Some curious words are also present, such as “infância” (childhood) and “social” (social), which appear in the homicide absolution list. This suggests that some defendants can be absolved of homicide by appealing to social and emotional topics.

Between the lines

The first key finding – Machine learning can effectively process law texts

We demonstrated that algorithms can predict the outcome of judicial cases, given the text written in their court decisions. Accuracy exceeded 95% in most cases.

The second key finding – Regression Trees are good methods for Law texts

As other authors have also found, our research shows that Regression Trees are good and computationally efficient methods for processing law texts. The reason is yet to be found, but some authors suggest that Regression Trees can capture legal conceptions of Law, revealing patterns that other methods cannot emulate as effectively.

The third key finding – Words are key to absolving or convicting defendants

As our main research hypothesis predicted, words are significant in absolving or convicting defendants. Since our method is language-agnostic, it could be adapted to other languages to help legal workers understand which words have the most impact on achieving the desired outcome, or to help humanities researchers perform textual analysis to find the underlying characteristics of each situation.