AfricanLII Launches Citator Beta, Along with a New Automated Case Summarizer

In March 2019, AfricanLII announced that we would soon be launching a beta version of our new Citator application, which has been in development since last year. Today, we officially launch that beta version.

In our last post, we discussed the Citator in some detail. However, we did not discuss a different component of this application: our first automated case law summarizer.

The Importance of Case Summaries

Every day, hundreds of new judgments are written and published across the African continent. In the jurisdictions with Legal Information Institutes that are partnered with AfricanLII, judicial decisions constitute an authoritative source of law. For years, AfricanLII and its partner Legal Information Institutes have worked to make these decisions available to the public free of charge.

However, access alone is often not enough. For access to judicial law to be effective, these documents must also be easily searchable, and displayed in a way that lets you quickly find and assess the right judgments for your matter.

The sheer number of judgments that constitute a jurisdiction’s body of law means that results need to be quickly narrowed down to only the decisions that are most relevant to you. As far back as 1927, Justice Sir J W Wessels wrote, in a somewhat humorous appeal for the codification of South African law (Wessels J “Codification of law in South Africa” TUC Bulletin No Jus 1 (1927) 1):

“It would be a task for anyone to read a tenth of them [cases], and still live”.

This was in 1927...

The average case in our records is 3119 words long. Given a reading speed of 200 words per minute, this amounts to roughly 15 and a half minutes of reading time per case. At this rate, it would take years of uninterrupted reading to get through even just the cases that we currently have indexed. And our collection is growing at an ever-increasing rate.
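As a rough back-of-the-envelope sketch, the arithmetic behind this estimate can be written out as follows. The word count and reading speed are the figures quoted above; the collection size (`CASE_COUNT`) is a round illustrative assumption, not an exact count from our records.

```python
WORDS_PER_CASE = 3119     # average case length quoted above
WORDS_PER_MINUTE = 200    # reading speed quoted above
CASE_COUNT = 200_000      # assumption: round illustrative collection size


def reading_minutes(words: int, words_per_minute: int) -> float:
    """Minutes needed to read `words` words at the given speed."""
    return words / words_per_minute


minutes_per_case = reading_minutes(WORDS_PER_CASE, WORDS_PER_MINUTE)

# Total reading time if someone read around the clock, without breaks.
total_hours = minutes_per_case * CASE_COUNT / 60
years_nonstop = total_hours / (24 * 365)

print(f"{minutes_per_case:.1f} minutes per case")
print(f"{years_nonstop:.1f} years of nonstop reading")
```

Under these assumptions, the result is close to six years of reading without ever stopping, and considerably longer for anyone who only reads during working hours.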

Summaries provide a way to tackle this problem. With a summary, you can get a quick overview of a decision before you read it. In this way, summaries drastically increase the speed at which you can work through search results and eliminate cases that are not relevant to your matter. When displayed along with search results, they also spare you from opening a judgment until you have decided that its summary looks promising.

The Importance of Summarizing Every Document

There is clearly a limit to how many summaries any publisher can write “manually”. This implies that any publisher that relies entirely on manual summarization will have to “elect” which cases are worth summarising, and which ones aren’t. Undoubtedly, these editorial decisions can have an impact on which judgments end up having a greater influence on the further development of the law. As such, we believe that the impact of these decisions needs to be minimised as much as is practically feasible. Furthermore, as digital publishers, we have enough computing power and storage to publish all judgments, unrestricted by the space constraints of the paper format. This is why we have always published every judgment that is uploaded onto our website, regardless of whether or not we think that it is important, and also why we cite documents using the Medium Neutral Citation.

With this new project we will, for the first time, be able to provide a summary for every decision that we publish, whether it is classified as “reported” or not. We will continue to produce manual summaries for cases that we know are important, in the sense that they qualify under the criteria for reportability, and that need higher-quality overviews. But automated summarization now means that no case will fall through the cracks because it doesn’t have a summary at all.

How to Read and Use our Automated Summaries

Automatic summarization of documents is an area of artificial intelligence that has received a lot of research attention, but it is still regarded as unsolved in theory and challenging in practice. There are, furthermore, many differences between legal documents and documents from other domains, such as news articles or social media text. For instance, in a typical news article, the most important information usually occurs near the start of the document. With legal documents, this is not necessarily the case. We therefore developed a bespoke algorithm, custom-tuned to function better with case law by focusing on the logical structure of judgments.

We also chose to use so-called “extractive” techniques over “generative” techniques. In other words, our summarizer does not attempt to paraphrase a case, or to create flowing paragraphs, but rather extracts key phrases and key sentences as-is from the documents. We chose to use extractive techniques because generative algorithms are not yet mature enough to provide the sort of accuracy that is required in law. A small change in phrasing can often make a big difference to the meaning of a legal sentence. We prefer to leave any inferences up to the reader.

Our summarizer is built from two components: a key-phrase extractor and a key-sentence extractor.

Key phrases are phrases that occur often in a document. They are similar to keywords, but can also consist of two or more words. They are comparable to the flynotes provided by some publishers.
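To make the idea concrete, here is a minimal sketch of frequency-based key-phrase extraction. It is an illustration only and not our production extractor: the function names, the tiny stop-word list and the `max_words` and `top_n` parameters are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stop-word list, for illustration only.
STOP_WORDS = {"the", "of", "a", "an", "and", "to", "in", "that",
              "is", "was", "for", "by", "on"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def key_phrases(text, max_words=2, top_n=5):
    """Return the most frequent phrases of up to `max_words` words."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip phrases that start or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    # Keep only phrases that actually repeat in the document.
    return [phrase for phrase, count in counts.most_common(top_n) if count > 1]


sample = ("The lease agreement was cancelled. The tenant disputed the "
          "lease agreement. The court held that the lease agreement was valid.")
print(key_phrases(sample))
```

Note that this sketch counts phrases across sentence boundaries and does no stemming, so a more careful implementation would likely need to handle both.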

Key sentences are sentences that capture the main topics of the judgment. The way we choose these sentences is by looking for sentences that have similar semantic content to the document as a whole. We use a machine learning algorithm that is able to learn the semantics of strings of text within the context of the cases that we have in our collection. We have also written our algorithm to provide a good “spread” of sentences, so that we cover as many of the topics in the document as possible. Longer cases will have more key sentences and shorter cases will have fewer key sentences.
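The selection logic described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not our actual algorithm: it swaps the learned semantic representation for plain word-count vectors with cosine similarity, and the `ratio` and `diversity` parameters, like the function names, are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words vector; a stand-in for a learned semantic representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def key_sentences(text, ratio=0.3, diversity=0.5):
    """Pick sentences similar to the whole document, penalising overlap."""
    # Naive sentence splitter: breaks after '.', '!' or '?'.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    doc_vec = vectorize(text)
    vecs = [vectorize(s) for s in sentences]
    # Longer documents get more key sentences, shorter ones fewer.
    k = min(len(sentences), max(1, round(len(sentences) * ratio)))
    chosen = []
    while len(chosen) < k:
        best, best_score = None, -math.inf
        for i, vec in enumerate(vecs):
            if i in chosen:
                continue
            relevance = cosine(vec, doc_vec)
            # Similarity to already chosen sentences reduces the score,
            # which spreads the selection across different topics.
            redundancy = max((cosine(vec, vecs[j]) for j in chosen), default=0.0)
            score = relevance - diversity * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    # Return the chosen sentences in their original order, unmodified.
    return [sentences[i] for i in sorted(chosen)]
```

Because the sentences are returned exactly as they appear in the judgment, the sketch stays extractive. The naive splitter will stumble on abbreviations that are common in legal text, so a realistic version would need a more robust sentence tokenizer.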

Our Citator Application has indexed over 200 000 documents, producing more than 300 000 citations. Of these, more than 10 000 have been linked to cases in our database. We have "written" 168 000 summaries. We expect these numbers to grow significantly over time as we further improve the accuracy of our algorithms, and as more case law is uploaded onto our platform by our partner Legal Information Institutes.

Please Leave Feedback!

The beta launch of our Citator is intended to give you the opportunity to provide feedback on the ongoing development of the application. Many aspects of the application still require significant improvement. However, we would like to obtain feedback from our users as early as possible, to ensure that what we build meets your needs.

Once you have visited the website, please click on the “feedback” link in the navigation bar, and let us know what you think. Your feedback will be invaluable in ensuring that the Citator becomes the most effective and easy-to-use tool for finding case relationships in African jurisprudence.

Watch the short introductory video before you explore the Citator further.