brs_dup_type and the Candy Bar
In addition to the metadata fields discussed above, Brainspace produces an 8-valued metadata field called brs_dup_type. It combines information about how a document was treated by EDD and NDD, and it gives special treatment to documents that end up in the Excluded cluster of the cluster hierarchy.
Here are the 8 values of brs_dup_type, and what each of them tells us about a document:
brs_dup_type | Is doc in Excluded cluster? | Is doc clusterable? | What did EDD discover about the doc? | status: exact | What did NDD discover about the doc? | status: near | candy bar
---|---|---|---|---|---|---|---
unique | no | yes | Doc had no exact dupes. | unique | Doc had no near dupes. | unique | Originals |
exactorig | no | yes | Doc had exact dupes, and became pivot of its EDG. | pivot | Doc had no near dupes. | unique | Originals |
exactorignearorig | no | yes | Doc had exact dupes, and became pivot of its EDG. | pivot | Doc had near dupes, and became pivot of its NDG. | pivot | Originals |
nearorig | no | yes | Doc had no exact dupes. | unique | Doc had near dupes, and became pivot of its NDG. | pivot | Originals |
neardup | no | yes or no | Doc had no exact dupes. | unique | Doc had near dupes, and became a duplicate in its NDG. | duplicate | Near Duplicates |
exactorigneardup | no | yes or no | Doc had exact dupes, and became pivot of its EDG. | pivot | Doc had near dupes, and became a duplicate in its NDG. | duplicate | Near Duplicates |
exactdup | no | yes or no | Doc had exact dupes, and became a duplicate in an EDG. | duplicate | Doc was not input to NDD. | unique | Exact Duplicates
excluded | yes | yes or no | Varies. Doc may or may not have been input to EDD. | any | Varies. Doc may or may not have been input to NDD. | any | Not Analyzed
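
To make the last column concrete, here is a minimal Python sketch of how exported brs_dup_type values could be rolled up into the four candy bar segments. The mapping is copied from the table above. The names `CANDY_BAR_SEGMENT` and `candy_bar_counts` are hypothetical helpers for illustration, not part of any Brainspace API, and the sketch assumes you already have the brs_dup_type values as plain strings (for example, from a metadata export).

```python
from collections import Counter

# Mapping taken from the table above: brs_dup_type -> candy bar segment.
CANDY_BAR_SEGMENT = {
    "unique": "Originals",
    "exactorig": "Originals",
    "exactorignearorig": "Originals",
    "nearorig": "Originals",
    "neardup": "Near Duplicates",
    "exactorigneardup": "Near Duplicates",
    "exactdup": "Exact Duplicates",
    "excluded": "Not Analyzed",
}


def candy_bar_counts(dup_types):
    """Count documents per candy bar segment.

    `dup_types` is any iterable of brs_dup_type strings (hypothetical input,
    e.g. one value per exported document). An unrecognized value raises
    KeyError rather than being silently assigned to a segment.
    """
    counts = Counter()
    for value in dup_types:
        counts[CANDY_BAR_SEGMENT[value]] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = ["unique", "exactorig", "neardup", "exactdup", "excluded"]
    for segment, n in candy_bar_counts(sample).items():
        print(f"{segment}: {n}")
```

Note that a pivot of an exact duplicate group can still land in Near Duplicates (exactorigneardup), so the segment depends on the combined value, not on the EDD status alone.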