Supervised Machine Learning Overview

Supervised Machine Learning refers to machine learning algorithms that let a user teach software by example. In e-discovery, document review workflows based on supervised learning are often referred to as predictive coding, and are one form of technology-assisted review (TAR). Supervised learning has applications far beyond e-discovery, however, including investigations, threat analysis, intelligence, and any other task where particular types of documents are sought.

Brainspace 6 supports two types of Supervised Machine Learning workflows, both of which can be customized to fit a range of applications. Both workflow approaches benefit from Brainspace's extensive linguistic processing of text, and the ability to use a range of analytics tools in selecting training data. Both can use any existing set of coded documents as training data.

Predictive Coding Overview

The Predictive Coding workflow approach supports classic (sometimes called TAR 1.0) e-discovery workflows where Brainspace is used in conjunction with a review platform (kCura's Relativity®). The user begins the workflow by manually coding, in Relativity®, a random sample (the Control Set) drawn from the dataset. The user then trains a predictive model to distinguish between two types of documents (e.g., Responsive vs. Not Responsive). Training proceeds iteratively: batches of documents are selected manually or automatically by Brainspace, and the user codes them in Relativity®. The Control Set is used to track effectiveness statistics (such as recall and precision) during training, to estimate the proportion of responsive documents in the collection, and to help the user set a cutoff score for assigning codes. The trained predictive model is then used to assign scores and automatically code all documents in the predictive coding collection.
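The role of the Control Set can be illustrated with a minimal Python sketch. It is not Brainspace's implementation: the function name `control_set_metrics`, the score range, and the "score at or above the cutoff means Responsive" rule are all assumptions made for illustration. Given model scores and human codes for the Control Set documents, it computes recall and precision at a chosen cutoff, plus the proportion of responsive documents in the sample.

```python
def control_set_metrics(scores, labels, cutoff):
    """Illustrative only: recall, precision, and richness on a control set.

    scores -- model scores for the control set documents (assumed 0.0 to 1.0)
    labels -- human codes, True for Responsive, False for Not Responsive
    cutoff -- documents scoring at or above this value are treated as
              predicted Responsive (an assumed convention)
    """
    pairs = list(zip(scores, labels))
    true_pos = sum(1 for s, y in pairs if s >= cutoff and y)
    false_pos = sum(1 for s, y in pairs if s >= cutoff and not y)
    false_neg = sum(1 for s, y in pairs if s < cutoff and y)

    responsive = true_pos + false_neg   # all responsive documents in the sample
    predicted = true_pos + false_pos    # all documents at or above the cutoff

    return {
        # Share of responsive documents that the cutoff captures.
        "recall": true_pos / responsive if responsive else 0.0,
        # Share of documents above the cutoff that are actually responsive.
        "precision": true_pos / predicted if predicted else 0.0,
        # Estimated proportion of responsive documents in the collection.
        "richness": responsive / len(pairs) if pairs else 0.0,
    }


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3, 0.1]
    labels = [True, False, True, False, False]
    print(control_set_metrics(scores, labels, cutoff=0.5))
```

Lowering the cutoff in a sketch like this generally raises recall and lowers precision, which is the trade-off the user weighs when choosing a cutoff score.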

Continuous Multimodal Learning (CMML) Overview

The second supervised learning workflow approach is our Continuous Multimodal Learning (CMML) workflow. The CMML workflow can be carried out entirely in Brainspace, and integrates supervised learning with Brainspace's tagging system. Predictive models may simultaneously be trained for as many binary classifications as desired. Training can be done using batches as in Predictive Coding, or in a more flexible fashion, by tagging documents anywhere they are viewed. Predictive models may be used to rank documents within Brainspace, and top-ranked documents may be selected for training. All the training data selection methods provided in the Predictive Coding workflow may also be used. Predictive scores may be exported to review platforms. If desired, a random sample may be drawn from unreviewed documents after a CMML review in order to estimate the fraction of target documents in the unreviewed material. The CMML approach can support workflows referred to as CAL (TM), TAR 2.0, and other e-discovery buzzwords, but goes beyond them in ease of use, effectiveness, and the ability to leverage Brainspace's wide range of analytics to find training data.
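The post-review random sample described above can be sketched in Python as follows. This is an illustrative calculation, not Brainspace's method: the function name `estimate_target_fraction`, its parameters, and the use of a normal-approximation (Wald) confidence interval are assumptions chosen for simplicity. It takes the human codes for a random sample of unreviewed documents and estimates the fraction, and projected count, of target documents remaining in the unreviewed material.

```python
import math


def estimate_target_fraction(sample_labels, unreviewed_count, z=1.96):
    """Illustrative only: estimate the fraction of target documents left
    in the unreviewed material from a coded random sample.

    sample_labels    -- True for each sampled document coded as a target
    unreviewed_count -- number of documents in the unreviewed material
    z                -- normal quantile; 1.96 gives roughly a 95% interval
    """
    n = len(sample_labels)
    if n == 0:
        raise ValueError("the sample must contain at least one document")

    fraction = sum(1 for label in sample_labels if label) / n

    # Wald interval: a simple approximation that is least reliable when
    # the sample is small or the fraction is close to 0 or 1.
    margin = z * math.sqrt(fraction * (1.0 - fraction) / n)

    return {
        "estimate": fraction,
        "low": max(0.0, fraction - margin),
        "high": min(1.0, fraction + margin),
        "projected_count": fraction * unreviewed_count,
    }


if __name__ == "__main__":
    # Hypothetical sample: 10 target documents found among 100 sampled.
    sample = [True] * 10 + [False] * 90
    print(estimate_target_fraction(sample, unreviewed_count=5000))
```

A larger sample narrows the interval; when very few or no target documents turn up in the sample, an exact binomial interval is usually a better choice than this approximation.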

So which workflow is best suited to your particular needs? See the Choosing the Predictive Coding or CMML Workflow section for a list of deciding factors.