Brainspace

FAQs: CMML or Predictive Coding Workflow?

Note

Due to prior partner commitments and industry changes, as of Release 6.7 clients can no longer create new Predictive Coding (PC) sessions. Existing Predictive Coding sessions are not disabled and can continue to be used in the near term. Clients are strongly encouraged to use CMML with ACS as a replacement for PC.

Debates over which supervised learning workflow is "best" are common, particularly in the e-discovery community, where enthusiasts attach ever-higher version numbers (TAR 2.0, TAR 3.0, Predictive Coding 4.0, ...) in an attempt to convey progress. In practice, different workflows are appropriate for different needs.

The following are some frequently asked questions that arise when choosing between the CMML and PC workflows in Brainspace:

Q: Will predictive coding be used to cull a dataset prior to a later review phase, or is the goal to find the documents of interest during iterative training and review?

A: Either type of workflow may be used for culling. When the goal is to find documents of interest as quickly as possible during iterative training and review, CMML is ideal.

Q: Are statistical estimates of the recall, precision, and other measures of effectiveness of the predictive model needed as part of the reporting on the predictive coding process?

A: If so, a PC workflow is more appropriate.
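
As a generic illustration of what such estimates measure (not a description of how Brainspace computes its reports), precision and recall can be estimated from a coded random sample such as a Control Set by counting how the model's predictions line up with the human coding. The function name and the counts below are hypothetical.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Point estimates of precision and recall from coded-sample counts.

    true_pos:  coded positive and predicted positive
    false_pos: coded negative but predicted positive
    false_neg: coded positive but predicted negative
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    # Guard against empty denominators rather than dividing by zero.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical counts: 80 positives in the sample, 60 of them scored above
# the cutoff, along with 15 negatives that also scored above the cutoff.
precision, recall = precision_recall(true_pos=60, false_pos=15, false_neg=20)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.80 recall=0.75
```

These are point estimates only; formal reporting normally also states a confidence interval, which depends on the sample size.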

Q: Is a statistical estimate needed of how many documents of interest were not found during training?

A: If so, a CMML workflow may be more appropriate.
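
One common way to produce this kind of estimate in general is to code a simple random sample of the documents that were never reviewed and extrapolate the rate of positives found there. The sketch below assumes that approach; the function name and figures are hypothetical, and it is not a statement of Brainspace's internal method.

```python
def estimate_missed(sample_size, positives_in_sample, unreviewed_total):
    """Extrapolate missed documents of interest from a random sample.

    Returns (rate, estimated_missed), where rate is the fraction of the
    sample coded positive and estimated_missed scales that rate up to the
    whole unreviewed population.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= positives_in_sample <= sample_size:
        raise ValueError("positives_in_sample must be between 0 and sample_size")
    rate = positives_in_sample / sample_size
    estimated_missed = positives_in_sample * unreviewed_total / sample_size
    return rate, estimated_missed


# Hypothetical figures: 4 positives found in a 500-document sample drawn
# from 120,000 unreviewed documents.
rate, missed = estimate_missed(500, 4, 120_000)
print(f"rate={rate:.3%} estimated missed={missed:.0f}")
# prints: rate=0.800% estimated missed=960
```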

Q: Should all reviewed documents also be used for training?

A: A CMML workflow does not distinguish between review and training, and so is most natural when this is desired.

Q: Is the total number of target documents, and/or the proportion of target documents, very low?

A: If so, using a Control Set may be impractical. While a dummy Control Set may be used with a PC workflow, a CMML workflow is often preferable in this case.
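
A rough calculation shows why low prevalence makes a randomly sampled Control Set impractical: the sample must be large enough to contain a useful number of target documents. The sketch below assumes a hypothetical goal of 100 target documents in the sample; that threshold and the function name are illustrative, not Brainspace requirements.

```python
import math


def control_set_size_needed(min_positives, prevalence):
    """Random-sample size expected to contain min_positives target documents.

    prevalence is the assumed proportion of target documents in the
    dataset, expressed as a fraction (for example 0.01 for 1%).
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return math.ceil(min_positives / prevalence)


# Hypothetical goal of 100 target documents at three prevalence levels.
for prevalence in (0.10, 0.01, 0.001):
    size = control_set_size_needed(100, prevalence)
    print(f"prevalence={prevalence:.1%} -> about {size:,} documents")
```

At 10% prevalence the sample is about 1,000 documents, but at 0.1% it grows to about 100,000, all of which would need to be coded by reviewers.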

Q: Is an iterative relevance feedback approach to training (training on top-ranked documents) desired?

A: This can be done in either workflow, but is easier in a CMML workflow.

Q: Do predictive models for several different topics need to be trained simultaneously?

A: This is only supported for CMML workflows.

Q: Is the topic of interest expected to evolve as documents are examined during training?

A: A CMML workflow makes it easier to change the coding of documents, remove documents from training, and flexibly pursue an evolving information need.

Q: Will the predictive model be used in combination with other analytics tools (for instance, to view top-scoring documents on the cluster wheel)?

A: This is much simpler in a CMML workflow.