Relationship Between NDGs and EDGs
The NDD algorithm is run after the SDD/EDD algorithm. It is run only on documents that have brs_exact_dup_status = pivot or unique.
The relationship between NDGs and EDGs therefore differs from the relationship between EDGs and SDGs:
The existence of an EDG containing particular documents does not mean there is any NDG containing any of those documents.
When an NDG does contain documents from one or more EDGs, it contains at most the pivots of those EDGs, never their duplicates.
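The eligibility rule behind these two points can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the dictionary layout, the function name ndd_input, and the status value "duplicate" are assumptions made for the example; only the field name brs_exact_dup_status and the values pivot and unique come from the text above.

```python
# Exact-duplicate statuses that make a document eligible for NDD,
# as stated in the text above.
NDD_ELIGIBLE_STATUSES = {"pivot", "unique"}


def ndd_input(documents):
    """Return the documents that would be passed on to NDD (hypothetical helper)."""
    return [
        doc for doc in documents
        if doc.get("brs_exact_dup_status") in NDD_ELIGIBLE_STATUSES
    ]


# Illustrative records; the "duplicate" value name is an assumption.
docs = [
    {"id": "A1", "brs_exact_dup_status": "pivot"},
    {"id": "A2", "brs_exact_dup_status": "duplicate"},
    {"id": "B1", "brs_exact_dup_status": "unique"},
]

eligible_ids = [doc["id"] for doc in ndd_input(docs)]
print(eligible_ids)
```

Because EDG duplicates are filtered out before NDD runs, they can never appear in an NDG, which is why an NDG holds at most the pivots of any EDGs it touches.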
Figure 4 shows the possible relationships between NDGs and EDGs, and the possible values of NDD and EDD status fields for documents input to NDD.
Figure 4. A dataset with four EDGs and three NDGs.
As shown, an NDG may contain zero, one, or several EDG pivots. If the NDG contains at least one EDG pivot, then the NDG pivot will be one of the EDG pivots.
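The pivot rule just described can be expressed as a small consistency check. The sketch below is hypothetical: the function name, the argument layout, and the requirement that the NDG pivot be a member of its own group are assumptions made for illustration, and it does not describe how the product actually selects a pivot.

```python
def ndg_pivot_is_consistent(ndg_member_ids, ndg_pivot_id, exact_status):
    """Check one NDG against the pivot rule described above.

    ndg_member_ids: ids of the documents in the NDG
    ndg_pivot_id:   id of the document chosen as the NDG pivot
    exact_status:   maps document id -> exact-duplicate status
    """
    # Assumption: the NDG pivot is itself a member of the NDG.
    if ndg_pivot_id not in ndg_member_ids:
        return False

    # EDG pivots that ended up in this NDG (there may be none).
    edg_pivots = {m for m in ndg_member_ids if exact_status[m] == "pivot"}

    # With no EDG pivot in the group, the rule places no constraint.
    if not edg_pivots:
        return True

    # Otherwise the NDG pivot must be one of the EDG pivots.
    return ndg_pivot_id in edg_pivots


exact_status = {
    "A1": "pivot", "B1": "unique", "C1": "pivot",
    "D1": "unique", "E1": "unique",
}

several_pivots_ok = ndg_pivot_is_consistent(["A1", "B1", "C1"], "A1", exact_status)
unique_chosen_over_pivot = ndg_pivot_is_consistent(["A1", "B1"], "B1", exact_status)
no_edg_pivots = ndg_pivot_is_consistent(["D1", "E1"], "D1", exact_status)
```

The three calls cover the cases in the paragraph above: an NDG with several EDG pivots, an NDG where a unique document was wrongly chosen over an EDG pivot, and an NDG with no EDG pivots at all.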
Values for the exact duplicate and near duplicate status fields are shown for documents participating in NDD.