Relationship Between SDGs and EDGs

A duplicate detection algorithm is run on all non-error documents to create the SDGs and EDGs. SDD and EDD differ only in which fields the algorithm uses.

Whenever an SDG is created, an EDG is created as well. If two documents are members of the same SDG, they are always members of the same EDG too. Because EDGs use looser duplication criteria, however, an EDG may contain documents from multiple SDGs, as well as documents that are not members of any SDG.
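The containment described above can be illustrated with a minimal sketch. It is not the product's algorithm: the field names, the exact-match comparison, and the choice to make the EDD fields a subset of the SDD fields are all assumptions made for illustration, as is the rule that a group needs at least two documents.

```python
from collections import defaultdict

# Hypothetical field sets (assumption): the EDD fields are a subset of the
# SDD fields, which is one simple way to make EDD the looser criterion.
SDD_FIELDS = ("title", "body", "author")
EDD_FIELDS = ("title", "body")


def group_by(docs, fields):
    """Group document IDs whose values match on `fields`; keep groups of 2+."""
    buckets = defaultdict(list)
    for doc_id, doc in docs.items():
        key = tuple(doc[f] for f in fields)
        buckets[key].append(doc_id)
    return [set(ids) for ids in buckets.values() if len(ids) > 1]


# Made-up sample documents.
docs = {
    "d1": {"title": "A", "body": "x", "author": "ann"},
    "d2": {"title": "A", "body": "x", "author": "ann"},
    "d3": {"title": "A", "body": "x", "author": "bob"},
    "d4": {"title": "A", "body": "x", "author": "bob"},
    "d5": {"title": "A", "body": "x", "author": "cy"},
    "d6": {"title": "B", "body": "y", "author": "dee"},
}

sdgs = group_by(docs, SDD_FIELDS)
edgs = group_by(docs, EDD_FIELDS)

# Each SDG sits entirely inside exactly one EDG.
for sdg in sdgs:
    assert sum(sdg <= edg for edg in edgs) == 1
```

In this sample the single EDG holds two SDGs (d1/d2 and d3/d4) plus d5, which belongs to no SDG, while d6 is in no group at all.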

If an EDG contains documents from one or more SDGs, then the pivot of the EDG will be the pivot of one of those SDGs (rather than an SDG duplicate or a unique document). Figures 1 to 3 show examples of relationships among SDGs and EDGs.
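The pivot constraint can be sketched as follows. The function name `choose_edg_pivot` is hypothetical, and so are the tie-breaking rule (smallest document ID) and the fallback for an EDG with no SDG members; the text above only states that the EDG pivot must be one of the SDG pivots when any are present.

```python
def choose_edg_pivot(edg_members, sdg_pivots):
    """Pick a pivot for one EDG.

    edg_members: set of document IDs in the EDG.
    sdg_pivots:  set of document IDs that are the pivot of some SDG.
    """
    # Only SDG pivots that are inside this EDG are eligible.
    candidates = edg_members & sdg_pivots
    # Assumption: if the EDG contains no SDG pivots, any member may be chosen.
    pool = candidates if candidates else edg_members
    # Assumption: ties are broken by the smallest document ID.
    return min(pool)


# Made-up example: an EDG containing two SDGs whose pivots are d2 and d3.
edg = {"d1", "d2", "d3", "d4", "d5"}
sdg_pivots = {"d2", "d3"}
pivot = choose_edg_pivot(edg, sdg_pivots)
```

Here `pivot` is one of the SDG pivots, never an SDG duplicate such as d1 or a document outside any SDG such as d5.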