Answered

Deterministic or probabilistic matching

Does xDM support deterministic or probabilistic matching?


Best Answer

Semarchy xDM leverages both deterministic and probabilistic matching. 

Deterministic matching means "matching using rules." That can cause some confusion, because all xDM matching is based on rules. However, xDM rules can use probabilistic algorithms for fuzzy-matched entities. A rule may include exact matching (name1 = name2), fuzzy matching (phonetic(name1) = phonetic(name2)), or both.
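To make the difference between the two kinds of condition concrete, here is a minimal Python sketch. It is not xDM code or SemQL: the `soundex`, `exact_match`, and `phonetic_match` functions are hypothetical helpers written for illustration, with a simplified American Soundex standing in for whatever phonetic function a real rule would call.

```python
# Illustrative only: a simplified American Soundex, not xDM's implementation.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character phonetic key such as 'S530'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so repeated codes across a vowel count
    return (result + "000")[:4]


def exact_match(name1, name2):
    # Deterministic condition: name1 = name2
    return name1 == name2


def phonetic_match(name1, name2):
    # Fuzzy condition: phonetic(name1) = phonetic(name2)
    return soundex(name1) == soundex(name2)
```

With these helpers, "Smith" and "Smyth" fail the exact condition but pass the phonetic one, which is the kind of pair a fuzzy rule is meant to catch.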

Fuzzy Matched Entities use a Matcher to automatically detect duplicates using fuzzy matching algorithms such as:

• Metaphone & Double Metaphone
• Soundex
• Edit Distance & Edit Distance Similarity
• Jaro-Winkler & Jaro-Winkler Similarity
• N-grams
• Levenshtein
• etc.
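As a rough idea of what the distance-based algorithms in this list compute, here is a small Python sketch of Levenshtein edit distance and a similarity derived from it. The function names and the normalization (one minus distance divided by the longer length) are assumptions for illustration; xDM's own functions may define similarity differently.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A rule could then compare the similarity to a cutoff of your choosing, for example treating two names as close when `edit_similarity(name1, name2)` is above 0.8.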

Multiple Match Rules in a matcher allow you to define any number of conditions required for considering two records a match. Each condition has its own Matching Score. This score represents the percentage of confidence you put in a match that occurs based on the rule. Records are considered matched when the aggregate of all conditions exceeds the thresholds you define.
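The sketch below shows one way to picture rules, scores, and a threshold in Python. Everything in it is hypothetical: the `MatchRule` class, the two sample rules, their scores, and in particular the choice to aggregate by taking the highest score among the rules that fire. The answer above does not say how xDM aggregates, so treat that part as an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Record = dict  # a record is modeled as a plain dict for this sketch


@dataclass
class MatchRule:
    name: str
    condition: Callable[[Record, Record], bool]
    score: int  # confidence in percent when the condition holds


def pair_score(a: Record, b: Record, rules: List[MatchRule]) -> Optional[int]:
    """Assumed aggregation: the highest score among satisfied rules."""
    scores = [rule.score for rule in rules if rule.condition(a, b)]
    return max(scores) if scores else None


def is_match(a: Record, b: Record, rules: List[MatchRule], threshold: int) -> bool:
    """A pair matches when its aggregate score exceeds the threshold."""
    score = pair_score(a, b, rules)
    return score is not None and score > threshold


# Hypothetical rules: an exact rule on email and a looser rule on names.
rules = [
    MatchRule("same email",
              lambda a, b: a["email"].lower() == b["email"].lower(), 95),
    MatchRule("same last name and first initial",
              lambda a, b: a["last"].lower() == b["last"].lower()
              and a["first"][:1].lower() == b["first"][:1].lower(), 70),
]
```

With a threshold of 80, two records sharing an email would match at 95, while two records that only share a last name and first initial would score 70 and stay unmatched.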

Matching Groups are created using matching transitivity. Matching Transitivity means:

If A matches B and B matches C, then A, B, and C are in the same matching group. Each matching group has a Confidence Score expressing the level of confidence across the group of matching records. This score is the average of the individual match scores in the group.
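For readers who want to see transitivity spelled out, here is a minimal Python sketch that groups records with a union-find structure and averages the pairwise match scores per group, as described above. The function name `build_match_groups` and the input shape (a list of record IDs plus `(id, id, score)` tuples) are assumptions for illustration, not xDM's internals.

```python
def build_match_groups(record_ids, scored_matches):
    """Group records by transitive matches.

    scored_matches is a list of (id_a, id_b, score) tuples.
    Returns a sorted list of (member_ids, confidence) tuples, where
    confidence is the average match score in the group, or None for
    a record that matched nothing.
    """
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # A match between a and b merges their groups, so A-B and B-C
    # put A, B, and C in one group even without a direct A-C match.
    for a, b, _ in scored_matches:
        parent[find(a)] = find(b)

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), {"members": [], "scores": []})["members"].append(r)
    for a, _, score in scored_matches:
        groups[find(a)]["scores"].append(score)

    result = []
    for group in groups.values():
        scores = group["scores"]
        confidence = sum(scores) / len(scores) if scores else None
        result.append((sorted(group["members"]), confidence))
    return sorted(result, key=lambda item: item[0])
```

For example, with records A, B, C, D and matches A-B at 90 and B-C at 70, the sketch puts A, B, and C in one group with a confidence of 80.0 and leaves D on its own.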

