Question

Tomcat server hangs after asynchronous processing starts

This is for a fresh installation of Pega 8.2.2

During deployment of the WAR file on Tomcat (i.e. during server startup), I see the following log messages and the Tomcat server gets stuck at this point:

2019-07-11 02:16:51,101 [groundProcessing]:14] [ STANDARD] [ ] [ ] (.DataFlowDiagnosticsFileLogger) INFO - {"type":".ProcessingThreadLifecycleMessage","senderNodeId":"e1ace989de0e382529169b7cd9308ff7","timestamp":1562825810673,"runId":"pyBatchIndexClassesProcessor","threadName":"DataFlow-Service-PickingupRun-pyBatchIndexClassesProcessor:36, Access group: [PRPC:AsyncProcessor]","event":"PICKED_UP_PARTITIONS","partitions":["11","7","1","10"]}
2019-07-11 02:16:51,109 [groundProcessing]:14] [ STANDARD] [ ] [ ] (.DataFlowDiagnosticsFileLogger) INFO - {"type":".RunStatusTransitionMessage","senderNodeId":"e1ace989de0e382529169b7cd9308ff7","timestamp":1562825810628,"runId":"pyBatchIndexClassesProcessor","previousStatus":"RESUMING","newStatus":"IN_PROGRESS","originator":"MultiplePartitionExecution","reason":"Run is in progress"}
2019-07-11 02:16:51,121 [groundProcessing]:14] [ STANDARD] [ ] [ ] (.DataFlowDiagnosticsFileLogger) INFO - {"type":".ProcessingThreadLifecycleMessage","senderNodeId":"e1ace989de0e382529169b7cd9308ff7","timestamp":1562825810750,"runId":"pyBatchIndexClassesProcessor","threadName":"DataFlow-Service-PickingupRun-pyBatchIndexClassesProcessor:35, Access group: [PRPC:AsyncProcessor]","event":"PICKED_UP_PARTITIONS","partitions":["19","9","12","3"]}

Even after 24 hours there is no progress from this point, and the application is also inaccessible.
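
To narrow down where it is stuck, a thread dump of the Tomcat JVM taken at this point (jstack <pid>, or kill -3 on the process) shows which thread startup is blocked on. Below is a minimal sketch of collecting the same information programmatically; the class name is illustrative, and it only sees the JVM it runs in, so it would have to be triggered from inside the web application (for example via a diagnostic servlet or JMX).

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

// Illustrative helper: prints the stack of every live thread in the current JVM,
// plus any detected deadlocks. Same information as jstack, but callable from
// inside the hung process.
public class StartupThreadDump {

    public static void dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true -> include locked monitors and ownable synchronizers,
        // which shows which lock a blocked thread is waiting on
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info);
        }
        long[] deadlocked = threads.findDeadlockedThreads();
        if (deadlocked != null) {
            System.out.println("Deadlocked thread ids: " + Arrays.toString(deadlocked));
        }
    }

    public static void main(String[] args) {
        dump();
    }
}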


Comments


July 11, 2019 - 6:34am

These are background-processing INFO logs; normally these appear after the engine has already started up. Can you attach the complete startup log for review?

Pega
July 11, 2019 - 9:24am

Also, please provide your Tomcat context.xml.

July 25, 2019 - 9:01am

I am getting the same issue with 8.2.1. I am attaching the PegaRULES log.

The effect of this issue is that I keep getting a "Save failed - Unable to create topic configuration" error while saving rules, and the overall system is slow.

The stack trace is:

2019-07-25 08:51:30,576 [CHEDULER_THREAD_POOL] [  STANDARD] [                    ] [                    ] (ueueProcessorFailedRunsManager) ERROR   - Unable to re-start failed queue processor pyBatchIndexProcessor. Caught exception: FAILED to activate run, marking run as failed
com.pega.dsm.dnode.api.dataflow.service.DataFlowActivationException: FAILED to activate run, marking run as failed
	at com.pega.dsm.dnode.impl.dataflow.service.DataFlowServiceImpl.activate(DataFlowServiceImpl.java:388) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.impl.dataflow.service.DataFlowServiceImpl.resubmitFailed(DataFlowServiceImpl.java:517) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.impl.dataflow.service.DataFlowServiceProxy.resubmitFailed(DataFlowServiceProxy.java:131) ~[d-node-8.2.1-225.jar:?]
	...
Caused by: com.pega.dsm.dnode.impl.dataset.kafka.KafkaRuntimeConfigurationException: Unable to create or update topic PYBATCHINDEXPROCESSOR
	at com.pega.dsm.kafka.impl.KafkaInstanceImpl.validateTopic(KafkaInstanceImpl.java:168) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.kafka.api.KafkaInstanceCache$KafkaInstanceProxy.validateTopicsOnInstance(KafkaInstanceCache.java:133) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.kafka.api.KafkaInstanceCache$KafkaInstanceProxy.getProducer(KafkaInstanceCache.java:107) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.impl.dataset.kafka.features.KafkaPartitioningFeature$1.emit(KafkaPartitioningFeature.java:124) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:338) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.impl.stream.DataObservableImpl.subscribe(DataObservableImpl.ja
	...
Caused by: com.pega.pegarules.pub.PRRuntimeException: Unable to create topic configuration
	at com.pega.dsm.kafka.impl.KafkaSchema.createTopic(KafkaSchema.java:112) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.kafka.impl.KafkaSchema.access$200(KafkaSchema.java:32) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.kafka.impl.KafkaSchema$1.execute(KafkaSchema.java:56) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.util.OperationWithLock$1.execute(OperationWithLock.java:64) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.util.OperationWithLock$LockingOperation.couldAcquireLock(OperationWithLock.java:163) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.util.OperationWithLock$LockingOperation.performLockOperation(OperationWithLock.java:130) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.util.OperationWithLock$LockingOperation.access$200(OperationWithLock.java:75) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.dnode.util.OperationWithLock.doWithLock(OperationWithLock.java:72) ~[d-node-8.2.1-225.jar:?]
	...
Caused by: java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.
	at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45) ~[kafka-clients-0.11.0.3.jar:?]
	at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32) ~[kafka-clients-0.11.0.3.jar:?]
	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89) ~[kafka-clients-0.11.0.3.jar:?]
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:213) ~[kafka-clients-0.11.0.3.jar:?]
	at com.pega.dsm.kafka.impl.KafkaSchema.createTopic(KafkaSchema.java:106) ~[d-node-8.2.1-225.jar:?]
	at com.pega.dsm.kafka.impl.KafkaSchema.access$200(KafkaSchema.java:32) ~[d-node-8.2.1-225.jar:?]
	...
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment.

2019-07-25 08:51:30,597 [kgroundProcessing]:3] [  STANDARD] [                    ] [                    ] (.DataFlowDiagnosticsFileLogger) INFO    - {"type":".PostExecutionMessage","senderNodeId":"262d0e093042c02fbcf2a9891dbbf011","timestamp":1564059090531,"runId":"pyBatchIndexProcessor","status":"FAILED","currentCluster":[{"address":"10.200.110.43","uuid":"c56bf24e-10e8-494d-99e5-6d9abdb0ed6d","prpcId":"262d0e093042c02fbcf2a9891dbbf011","state":"NORMAL","serverMode":true,"externalNode":false,"inetAddress":"10.200.110.43"}],"serviceInstance":"BackgroundProcessing"}

 
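The innermost failure above (org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment) is the Kafka admin client giving up because no broker answered, which is why topic creation for PYBATCHINDEXPROCESSOR and, in turn, the queue processor run fail. A quick way to confirm whether the broker behind the Stream service is reachable at all is a small AdminClient check using the kafka-clients jar that is already on the classpath. This is only a sketch: the bootstrap address is a placeholder (the host is taken from the log above; the broker port depends on your Stream service configuration).

import java.util.Collection;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.Node;

// Standalone reachability check against the broker the Stream service should expose.
public class KafkaReachabilityCheck {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap address - replace with your Stream node host:port
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "10.200.110.43:9092");
        props.put(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, "10000");

        AdminClient admin = AdminClient.create(props);
        try {
            // Fails with the same TimeoutException seen in the stack trace
            // if no broker responds within the timeout
            Collection<Node> nodes = admin.describeCluster().nodes().get(10, TimeUnit.SECONDS);
            System.out.println("Reachable brokers: " + nodes);
            System.out.println("Existing topics: " + admin.listTopics().names().get(10, TimeUnit.SECONDS));
        } finally {
            admin.close();
        }
    }
}

If this times out in the same way, the problem is broker availability or connectivity (Stream node not started, port blocked, wrong advertised listener) rather than anything in the rule being saved.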

August 6, 2019 - 3:20pm

Curious as to whether a solution was found?

October 7, 2019 - 5:58am

After upgrading to 8.2.3, we are also seeing similar log messages. More surprisingly, we are not using Kafka.