Supervised Learning - CMML Workflow
Before creating a CMML classifier, you must first create a mutually exclusive tag in Reveal with 2 choices. Typically, the choices would be named Positive/Negative or Responsive/Non-Responsive. You can add additional choices to the tag for use in Reveal, for example “Further Review Required” or “Tech Issue”, but these will not be used in the CMML session.
Note
Starting with Brainspace 6.7, it is possible to add multiple choices, up to 5 positive, 5 negative and 5 neutral.
Each positive choice sends the same Yes classification, each negative choice sends the same No, and each neutral choice shows the document has been seen but not classified.
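The mapping described in this note can be pictured with a short sketch. It is illustrative only and does not call any Brainspace or Reveal API. The function names (validate_choices, classify_choice), the choice labels, and the return values are hypothetical, chosen only to mirror the Yes / No / seen-but-not-classified behavior and the limit of 5 choices per group.

```python
from typing import Optional, Sequence

MAX_PER_GROUP = 5  # limit from the note: up to 5 positive, 5 negative, 5 neutral


def validate_choices(positive: Sequence[str],
                     negative: Sequence[str],
                     neutral: Sequence[str]) -> None:
    """Check the per-group limit and that no choice sits in two groups."""
    groups = {"positive": positive, "negative": negative, "neutral": neutral}
    for name, choices in groups.items():
        if len(choices) > MAX_PER_GROUP:
            raise ValueError(
                f"too many {name} choices: {len(choices)} > {MAX_PER_GROUP}"
            )
    all_choices = list(positive) + list(negative) + list(neutral)
    if len(all_choices) != len(set(all_choices)):
        raise ValueError("a choice may belong to only one group")


def classify_choice(choice: str,
                    positive: Sequence[str],
                    negative: Sequence[str],
                    neutral: Sequence[str]) -> Optional[str]:
    """Return 'Yes', 'No', or None (document seen but not classified)."""
    if choice in positive:
        return "Yes"
    if choice in negative:
        return "No"
    if choice in neutral:
        return None
    raise KeyError(f"unknown choice: {choice!r}")


# Hypothetical choice names, for illustration only.
positive = ["Responsive", "Hot"]
negative = ["Non-Responsive"]
neutral = ["Tech Issue"]
validate_choices(positive, negative, neutral)
label = classify_choice("Hot", positive, negative, neutral)  # same Yes as "Responsive"
```

In this sketch every positive choice collapses to the same Yes value, which is the point of the note: adding more choices changes what reviewers see in Reveal, not what the classifier receives.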
There is no need to put the tag into a tag profile in Reveal; this is done automatically once the CMML session has been created. Follow the instructions earlier in this document for creating a connected tag.
Once the connected tag is created, click Supervised Learning -> New Classifier -> CMML.
Give the classifier a descriptive name. Under Assign Tag, select the Reveal connected tag choice to use as the positive and negative tag for Brainspace. In most cases, you won’t pre-review documents in Reveal using the same connected tag before creating the CMML session, but if documents are reviewed ahead of time, those documents will be used as initial seed documents.
If you wish to export scores back to Reveal automatically after each round, enable the option Export scores after each training round completes.
A CMML session can be run in manual or auto mode. In manual mode, you create a training round, tag the documents in Reveal, pull the tags from Reveal into Brainspace, and then export scores to Reveal (if desired). Auto mode streamlines the process by automatically supplying training round documents on a timed basis as needed. It will also export scores to Reveal after each round if that option is enabled.
To enable automatic training, click Enable automatic training. Enter the number of documents you wish to review for each round along with the method for selecting documents and how often to poll the Reveal API to update tagging progress.
Immediately after the CMML session is created, and while auto mode is enabled, messages will occasionally appear asking you to refresh the screen.
Each CMML session has a unique identifier that is also used within Reveal, which allows multiple sessions to exist at the same time.
When a CMML session is created in Brainspace, the following items are automatically created in Reveal:
A CMML review team with admin access by default.
A unique suggested training field.
A unique tag profile to which the connected tag is added.
A unique score field.
A unique field profile with admin and CMML review team access by default. The training field, score field, and tag field are added to this profile.
A main CMML root work folder under the Brainspace root folder.
A unique classifier folder under the main CMML root folder, with admin and CMML review team access by default.
A work folder under the classifier root folder containing all documents used for training.
A work folder under the classifier root folder containing the documents suggested for training that still need review.
At the bottom of the CMML session, you can view the number of documents in the current round along with the count of documents coded. In auto mode, the Reveal API is polled for tagging information, and after all documents in the training round are tagged, Brainspace closes the round, calculates scores, and creates a new training round.
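The auto-mode cycle described above (poll for tagging progress, then close the round once every document is coded) can be sketched as a simple loop. This is a conceptual illustration under stated assumptions, not Brainspace's implementation: run_auto_round, count_coded, close_round, and poll_seconds are hypothetical names, and the callbacks stand in for whatever Brainspace does internally when it polls the Reveal API and closes a round.

```python
import time
from typing import Callable


def run_auto_round(round_size: int,
                   count_coded: Callable[[], int],
                   close_round: Callable[[], None],
                   poll_seconds: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until all documents in the round are coded, then close the round.

    count_coded  -- hypothetical stand-in for polling tagging progress
    close_round  -- hypothetical stand-in for: close the round, calculate
                    scores, and create the next training round
    Returns the number of polls that were needed.
    """
    polls = 0
    while True:
        polls += 1
        if count_coded() >= round_size:
            close_round()
            return polls
        sleep(poll_seconds)  # wait for the next polling interval


# Simulated progress: 3, then 7, then all 10 documents coded.
progress = iter([3, 7, 10])
closed = []
polls_needed = run_auto_round(
    round_size=10,
    count_coded=lambda: next(progress),
    close_round=lambda: closed.append("round closed"),
    poll_seconds=0,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```

The polling interval here corresponds to the "how often to poll" setting chosen when automatic training is enabled; a shorter interval picks up completed rounds sooner at the cost of more API calls.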
To review the documents in Reveal, navigate to the suggested training folder under the classifier folder for the session and review the documents using the connected tag. You should also pick the field profile for the session. If Review is already open, you need to refresh your browser to reload the field and tag profiles. As always, you can create an assignment job if needed.
In manual mode, you can pull tags into Brainspace using the following icon at the top of the session area.
Once training is complete, you can manually export scores to Reveal using the following icon at the bottom of the session area.
Session example after closing the first training round.
Scores in Reveal range from 0.00 to 1.00 with 1.00 being most responsive.
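As a small illustration of how this score range might be used once scores are exported, the sketch below orders documents from most to least responsive. The document IDs, the score values, and the function name rank_by_score are made up for the example; it assumes only that each score is a number between 0.00 and 1.00, as stated above.

```python
from typing import Dict, List, Tuple


def rank_by_score(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (document ID, score) pairs, most responsive first."""
    for doc_id, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {doc_id} is outside 0.00-1.00: {score}")
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical exported scores, for illustration only.
example_scores = {"DOC-0001": 0.12, "DOC-0002": 0.97, "DOC-0003": 0.50}
ranked = rank_by_score(example_scores)
# ranked[0] is the document the classifier considers most responsive
```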
You can also create a control set if desired within a CMML session.
When a control set is created in Brainspace, the following items are automatically created in Reveal:
A work folder under the classifier root folder containing all control set documents that need to be reviewed.
A unique field indicating the document is a control set member.
A unique tag profile, tag set, and choices used to review the control set documents.
A unique field profile with admin and CMML review team access by default. The control set member field, score field (from the original session creation), and control set tag field are added to this profile.
If Review is already open, you need to refresh your browser to reload the field and tag profiles.
Note that the tag and field profiles for the control set review have CtrlSet in their names.
Be sure to select these profiles when reviewing control set documents.
You can use the following button to update the coded status in Brainspace, either while control set documents are still being coded or after all of them are coded.
Once all documents are coded, the control set is processed.
If necessary, you can add more documents to a control set after it is created using the Modify button.