This article covers a problem in which the data location accumulates a backlog of Scheduled jobs that never run, while seemingly random integration jobs complete successfully but out of order.
To clarify, this is an example of how the data location logs might look.
Notice there are 2 Scheduled integration jobs in the queue, alongside many completed integration jobs with the status Done. There is no way to predict whether the next integration job will stay Scheduled or process successfully and show as Done.
The situation above is caused by having more than 1 active/running Batch Poller.
The batch poller is responsible for checking for newly loaded records and queuing them for the Execution Engine to run in an integration job. When more than 1 batch poller is running, the batch pollers cannot coordinate with each other. They submit jobs to the Execution Engine with the Scheduled status, but these jobs never run. This lack of coordination also explains the random behavior: some integration jobs complete successfully, but you can never predict which ones.
This error behavior is a major problem because the scheduled integration jobs never run to process your loaded data. When the integration jobs stay in the Scheduled status, data you loaded doesn't have a chance to become golden records.
The problem compounds as Scheduled jobs pile up into a backlog. The order in which data is processed also matters if you have data that depends on previous integration jobs.
Why does this happen?
These are the most common reasons for multiple batch pollers:
- User error during installation of xDM (for example, installing 2 active nodes)
- Errant application server behavior (for example, Tomcat did not fully shut down the last time the server was rebooted, or a phantom runaway Tomcat process is still running)
- Unexpected user error (for example, someone forgot that they installed xDM on their laptop, and it is connecting to the same Semarchy repository and data location)
You must find and terminate one of the batch pollers that is running. The end result must be that you only have 1 active batch poller.
The solution is simple to state, but finding the other active batch poller can be difficult. It comes down to following the troubleshooting steps listed.
The most common cause of multiple batch pollers is installing more than 1 active node. If you are configuring a High-Availability Architecture, check that you have only 1 active node. If more than 1 active node of xDM is running, keep 1 active node and change the other active nodes into passive nodes.
If you don't have High-Availability configured, here are the other steps to try for finding the other active batch poller:
- Run the Task Manager (on Windows) or top (on Linux). How many Java processes do you see running? If you shut down Tomcat, does the number of Java processes change? Is there still another Java process? If more than one Java process is running, kill the Java processes and reboot.
- A different and less direct way to investigate is to look at the sessions connected to your database. Which sessions have active connections? How many sessions do you have? Shut down Tomcat. How many active sessions do you have now?
- If it's still not working, shut down Tomcat/the application server. Reboot the server.
- If the above steps still didn't work, cancel/kill all integration jobs that haven't completed to start from a clean slate. Stop the batch poller. Change the semarchy_repository database/schema password, so that any unknown instance still using the old password can no longer connect. Update the semarchy.xml to use the new repository password.
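The Java process check from the first step can be scripted on Linux. The following is a minimal sketch, not part of xDM: it assumes the ps command is available, and the function names find_java_processes and list_java_processes are hypothetical helpers introduced here for illustration.

```python
import subprocess


def find_java_processes(ps_output: str) -> list[tuple[int, str]]:
    """Return (pid, command line) for each process whose executable name starts with 'java'."""
    matches = []
    for line in ps_output.splitlines()[1:]:  # skip the header row printed by ps
        parts = line.split(None, 2)  # pid, executable name, full command line
        if len(parts) < 2:
            continue
        pid, comm = parts[0], parts[1]
        args = parts[2] if len(parts) > 2 else ""
        if comm.lower().startswith("java"):
            matches.append((int(pid), args))
    return matches


def list_java_processes() -> list[tuple[int, str]]:
    """Linux only: ask ps for the pid, executable name, and full command line."""
    result = subprocess.run(
        ["ps", "-eo", "pid,comm,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_java_processes(result.stdout)
```

Calling list_java_processes() once while Tomcat is running and again after shutting it down gives the two counts to compare. Any Java process that remains after the shutdown is a candidate for the extra batch poller; its command line shows which installation it belongs to.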
NOTE: After trying each step listed above (do them one-by-one), load data and observe the behavior of the integration jobs.
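The database session check can be sketched in a similar way. This example assumes the repository is hosted on PostgreSQL, where the pg_stat_activity view lists sessions; other databases expose sessions through different views. The query placeholder style, the helper names, and the idea of comparing against a list of expected hosts are assumptions made for illustration. The functions only summarize rows that you fetch with your own database client.

```python
from collections import Counter

# Assumption: PostgreSQL. Run this with your own client, passing the
# repository user name (for example, semarchy_repository) as the parameter.
SESSIONS_QUERY = """
SELECT client_addr, application_name, state
FROM pg_stat_activity
WHERE usename = %s
"""


def sessions_by_host(rows):
    """Count sessions per client address.

    Each row is (client_addr, application_name, state). A NULL client
    address (a non-TCP connection) is reported as "local".
    """
    counts = Counter()
    for client_addr, _application_name, _state in rows:
        host = str(client_addr) if client_addr is not None else "local"
        counts[host] += 1
    return counts


def unexpected_hosts(rows, expected_hosts):
    """Return the hosts that hold sessions but are not in expected_hosts."""
    return sorted(set(sessions_by_host(rows)) - set(expected_hosts))
```

If unexpected_hosts still returns an address after Tomcat is shut down on the known server, that machine is likely running another xDM instance against the same repository, such as a forgotten laptop installation.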
If you still cannot resolve the problem, contact Semarchy Support and include the results from the above troubleshooting steps.