This article addresses performance issues encountered in Semarchy xDM during the 'Find and Match Duplicate' step within integration jobs, where matching rules cause extended delays in processing. This guide provides solutions to optimize match rules and improve processing times.
Issue
Customers may experience delays in their integration jobs, particularly during the 'Find and Match Duplicate' step, where even small batches of records (1-5) may take over an hour to process. This issue may arise from inefficient use of functions within match conditions and the absence of optimized indexing.
Solution
Improving match rules performance involves optimizing binning and matching conditions by removing functions, creating enriched calculated attributes, and adding indexes to the database. This allows for more efficient matching and indexing strategies.
Detailed Steps for Resolution
Optimize Binning Conditions
- Avoid Functions in Binning Conditions: Functions within binning conditions can slow down performance significantly.
- Solution:
- Create a calculated attribute that captures your current binning condition expression without functions.
- Use this new calculated attribute as the binning field in the match rule instead of the original calculation.
Optimize Matching Conditions
- Avoid Functions Inside Match Rule Conditions: Functions like
trim
orupper
should not be used directly in match rule conditions, as they can degrade performance. - Solution:
- Create or update attributes with the necessary transformations (e.g., trimmed and/or uppercase versions) and store them as calculated attributes.
- Use these newly created attributes in your match rule conditions instead of function calls within the rules.
- Avoid Functions Inside Match Rule Conditions: Functions like
Update Existing Records with New Enrichers
- Objective: To avoid re-importing all records, perform a manual update on the existing records directly in the MI table.
- Solution:
- Extract the SQL code for the enrichers from your task definitions.
- Execute this SQL on the relevant
MI_
table to update existing records so they match the newly created attributes.
- Note: This approach ensures that the next load will enrich incoming records and compare them against updated attributes in the MI table.
Add Indexes for Matching Attributes
- Adding indexes can enhance the efficiency of matching tasks significantly, especially on attributes frequently used in match rules.
- Suggested Indexes:
CREATE INDEX usr_1 ON mi_m_health_prfssnl_hprf (hprf_full_gbl_nm_trim, b_classname, b_pubid, b_sourceid);