Matching performance is a complex issue. Many causes can lead to unsatisfactory integration job processing times, from poorly optimized match rules that perform too many fuzzy-matching comparisons to logic bugs that mean your integration jobs simply never finish.


Assuming your data volumes are large, you have adequate hardware allocated to xDM (memory and storage for the application server and database), and you are confident there is nothing more to be gained by tweaking match rules, here are steps you can try to speed up your integration jobs.


These steps are very technical and advanced. They require help from team members with DBA privileges (to add indexes and modify queries in the database) and system administrator privileges (to modify settings on the application server).

Use Case

Here is an example of how an integration job might seem stuck, and the kind of situation the steps in this article can help with.

Our integration batch is stuck processing the "Compute Transitive Closure for System Dups" step. We tried restarting the database and the Tomcat app server, re-indexing the tables for the model, and restarting the batch, with no success.

This indicates that the low-hanging fruit has already been tried and hasn't fixed the slow performance.

Database activity is at 5%, read IOPS are around 3K, and the burst balance is at 100%. We're running our database on RDS PostgreSQL, 1.2 TB in size, with plenty of free space.

There is adequate capacity in the physical hardware. What's next?

Troubleshooting
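
Before working through the steps, it can help to confirm what the stuck step is actually executing in the database. A minimal sketch using PostgreSQL's standard pg_stat_activity view (nothing xDM-specific):

-- Active sessions, what they are waiting on, and the statement being run
select pid, state, wait_event_type, wait_event,
       now() - query_start as running_for,
       left(query, 120) as query
  from pg_stat_activity
 where state <> 'idle'
 order by query_start;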

Step 1: Make sure there aren't giant clusters. 

If the transitive closure step takes longer than usual, chances are that matching detected too many matches. Did you add a new match rule recently?

This step verifies that you really don't have badly performing match rules, and that you didn't load an incorrect data set that is creating huge golden clusters.

Here is a sample query to run:

-- Count how many records share each matched attribute value
-- (large bins mean many comparisons and potentially giant clusters)
select STD_PRIMARY_PHONE, count(*) as bin_size
  from MI_CONTACT
 group by STD_PRIMARY_PHONE
 order by 2 desc;

You will need to modify the above query to use the MI table for the entity whose match rule is causing trouble, and replace STD_PRIMARY_PHONE with the attribute(s) your match rule compares.
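
If the matching phase has already populated match groups, a similar check on the match group column can reveal oversized clusters directly. A quick sketch, assuming the same Contact example (B_MATCHGRP is the match group column that the transitive closure updates, as referenced in Step 3):

-- Largest match groups computed so far (adjust the MI table name for your entity)
select B_MATCHGRP, count(*) as group_size
  from MI_CONTACT
 where B_MATCHGRP is not null
 group by B_MATCHGRP
 order by 2 desc
 limit 20;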

If you're sure it's not a badly performing match rule, proceed to the next step.

Step 2: Add a new Java option to the app server.

Instead of the default cursor-based implementation of the transitive closure, use a more efficient, set-based mode that can be enabled with a Java option.

To enable this UPDATE_WHILE mode, add a new Java option to CATALINA_OPTS as follows. JVM options are typically set in the setenv.sh file (on a preconfigured instance) or in the file that defines the memory settings (on a typical EC2 instance). If you are using the Semarchy-supported cloud VMs on Azure or AWS, those settings are in the tomcat9 configuration file, found at /etc/default/tomcat9.

Here is the Java option to set:

CATALINA_OPTS="$CATALINA_OPTS -Dmdm.integjob.defaultTransitiveClosureMode=UPDATE_WHILE"

Using the new transitive closure mode requires restarting the application server (at least the active node, if you have a high-availability architecture) and redeploying the model (test this in a DEV environment first) to re-generate the job definitions.

Step 3: DBA tasks to make the database perform efficiently.

  • Update statistics on your MI tables so the optimizer has accurate information on the cardinalities and index selectivity of the main columns used in the transitive closure (the default histogram size is 100, which is far too small).
  • Decrease the fill factor on the MI_{entity} table to allow HOT updates: a faster kind of update, only possible when there is room for the new record version in the same physical page.
  • Decrease the fill factor on the I3MI_{entity} index to speed up updates on B_MATCHGRP.
  • Run a vacuum analyze on the table to reorganize it, apply the new fill factor, and gather fresh statistics with the new histogram size.
  • Enhance statistics and run a vacuum analyze on DU_{entity}.

How do you complete the steps above?

Assuming the entity name is Account, the SQL to perform the above steps is:

-- SET STATISTICS TARGET on MI_ACCOUNT
alter table mi_account alter column b_batchid set statistics 10000;
-- 10000 is the maximum value for the histogram size
alter table mi_account alter column b_matchgrp set statistics 10000;
alter table mi_account alter column b_sourceid set statistics 10000;
alter table mi_account alter column b_xgrp set statistics 10000;
analyze mi_account;
-- for b_pubid, the default value (100) is sufficient in most cases
alter table mi_account alter column account_id set statistics 10000; -- use the golden_id physical name
alter table mi_account alter column b_confirmedsdpk set statistics 10000;
-- vacuum is required here to reorganize the table and take the new fill factor into account
alter table mi_account set (fillfactor = 80);
vacuum analyze mi_account;

-- update the fill factor on the index as well (B_MATCHGRP is updated by the transitive closure)
alter index i3mi_account set (fillfactor = 80);
reindex index i3mi_account;

-- SET STATISTICS TARGET on DU_ACCOUNT
alter table du_account alter column b_sourceid set statistics 10000;
alter table du_account alter column b_sourceid2 set statistics 10000;
analyze du_account;

The enhanced statistics (the larger histogram size) make the statistics-gathering steps take longer, but they help the optimizer pick the best execution plan.
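
To confirm the changes were applied, you can query the standard PostgreSQL catalogs. A minimal sketch, assuming the same Account example (adjust the table names for your entity):

-- Per-column statistics targets on MI_ACCOUNT (-1 means the column still uses the default)
select attname, attstattarget
  from pg_attribute
 where attrelid = 'mi_account'::regclass
   and attnum > 0
   and not attisdropped
 order by attnum;

-- Fill factor stored in the table and index options
select relname, reloptions
  from pg_class
 where relname in ('mi_account', 'i3mi_account');

-- Last vacuum/analyze times and the share of HOT updates on MI_ACCOUNT
select relname, last_vacuum, last_analyze, n_tup_upd, n_tup_hot_upd
  from pg_stat_user_tables
 where relname = 'mi_account';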

NOTE: Please check that the job you're using to process these incremental loads has the job parameter PARAM_ANALYZE_STATS set to 0.