
Brainspace

Relationship Between NDGs and EDGs

The NDD algorithm is run after the SDD/EDD algorithm. It is run only on documents that have brs_exact_dup_status = pivot or unique.
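As a rough illustration of that input filter, the sketch below keeps only documents whose brs_exact_dup_status is pivot or unique. The field name and those two values come from the text above; the dictionary layout, the helper name select_ndd_inputs, and the status value "duplicate" are assumptions made for the example, not Brainspace's actual data model or API.

```python
# Statuses the text names as eligible for NDD.
NDD_ELIGIBLE_STATUSES = {"pivot", "unique"}


def select_ndd_inputs(documents):
    """Return the documents that would be passed on to NDD.

    `documents` is a list of dicts carrying a "brs_exact_dup_status" key
    (a hypothetical in-memory representation used only for this sketch).
    """
    return [
        doc for doc in documents
        if doc.get("brs_exact_dup_status") in NDD_ELIGIBLE_STATUSES
    ]


docs = [
    {"id": "A", "brs_exact_dup_status": "pivot"},
    {"id": "B", "brs_exact_dup_status": "duplicate"},  # assumed status value
    {"id": "C", "brs_exact_dup_status": "unique"},
]

ndd_inputs = select_ndd_inputs(docs)
print([doc["id"] for doc in ndd_inputs])
```

Document B is an exact duplicate of some pivot, so it is excluded here and can never appear in an NDG.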

The relationship between NDGs and EDGs is therefore different from the relationship between EDGs and SDGs:

  • The existence of an EDG containing particular documents does not mean there is any NDG containing any of those documents.

  • When an NDG does contain documents from one or more EDGs, it contains at most the pivots of those EDGs, never their duplicates.

Figure 4 shows the possible relationships between NDGs and EDGs, and the possible values of NDD and EDD status fields for documents input to NDD.

Figure 4 (Exact_And_Near_Duplicate_Figure_4.png):
  • A dataset with four EDGs and three NDGs. 

  • As shown, an NDG may contain zero, one, or several EDG pivots. If the NDG contains at least one EDG pivot, then the NDG pivot will be one of those EDG pivots.

  • Values for the exact duplicate and near duplicate status fields are shown for documents participating in NDD.
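The pivot rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function choose_ndg_pivot and the dictionary layout are hypothetical, and the text does not say how Brainspace chooses among several EDG pivots or picks a pivot when the NDG contains none, so the sketch simply takes the first candidate in input order.

```python
def choose_ndg_pivot(ndg_members):
    """Pick a pivot id for one NDG, preferring members that are EDG pivots.

    `ndg_members` is a list of dicts with "id" and "brs_exact_dup_status".
    Tie-breaking by input order is an assumption made for this sketch.
    """
    if not ndg_members:
        raise ValueError("an NDG must contain at least one document")

    # Only EDG pivots and unique documents are input to NDD.
    for doc in ndg_members:
        if doc["brs_exact_dup_status"] not in ("pivot", "unique"):
            raise ValueError(f"{doc['id']} should not have been input to NDD")

    edg_pivots = [d for d in ndg_members if d["brs_exact_dup_status"] == "pivot"]

    # If any EDG pivot is present, the NDG pivot must be one of them;
    # otherwise fall back to the unique documents in the group.
    candidates = edg_pivots or ndg_members
    return candidates[0]["id"]


ndg = [
    {"id": "U1", "brs_exact_dup_status": "unique"},
    {"id": "P1", "brs_exact_dup_status": "pivot"},
    {"id": "P2", "brs_exact_dup_status": "pivot"},
]

print(choose_ndg_pivot(ndg))
```

Here the group contains two EDG pivots and one unique document, so the selected NDG pivot is one of the EDG pivots rather than U1.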