Question

Sequential restart of WebLogic managed servers

In our project, Pega 7.1.7 is deployed on a WebLogic cluster with 9 managed servers (nodes).

Is a sequential restart recommended, or can the servers be restarted in parallel without any issues?

Past observation: In a few instances, node-level data pages, which load at startup of each server, got corrupted. We assumed this might have been caused by parallel restarts, so we have been doing sequential restarts since then. We have not observed any issues since switching to sequential restarts of the nodes.

Comments

Pega
July 17, 2019 - 10:26am

Hi,

It is always good to do a node by node restart.

Thanks,
Abhinav

July 18, 2019 - 8:37am
Response to Abhinav7

Thanks, Abhinav, for your response. Is there a specific reason why this is recommended, and does it have anything to do with the population of node-level data pages at startup of each node?

Pega
July 18, 2019 - 9:14am
Response to DiscoverS

Hi,

Pega uses Hazelcast, which is a clustering technology. Whenever the servers restart, they form a cluster: one main node forms the cluster and the other nodes try to join it. This helps not only with performance but also with Elasticsearch. You can check the startup logs to get a better understanding of how the cluster forms.

Thanks,
Abhinav

July 26, 2019 - 5:22am
Response to Abhinav7

Thanks, Abhinav, for your insight. Can you please tell us whether there would be any issue with node-level data pages loading if we perform the following steps:

1. Remove all nodes from the load balancer (LB) before the restart.

2. Restart all nodes in parallel.

3. Put all nodes back on the LB.

We are asking because our project demands a highly available environment, and there is a plan to scale horizontally from 9 to 13 nodes in the coming days. A sequential restart will take a lot of time, so we wanted to check the impact of a parallel restart on the node-level data pages that load at startup.
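The three steps above could be scripted roughly as follows. This is only a sketch of the ordering, not a statement about how data pages behave: `disable_in_lb`, `restart`, `is_healthy` and `enable_in_lb` are hypothetical hooks that the caller would wire to the real load balancer and WebLogic tooling.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_restart(nodes, disable_in_lb, restart, is_healthy, enable_in_lb):
    """Drain every node, restart them concurrently, re-enable the healthy ones.

    All four callables are hypothetical hooks supplied by the caller.
    Returns the list of nodes that failed the health check.
    """
    # Step 1: take every node out of the load balancer first.
    for node in nodes:
        disable_in_lb(node)

    # Step 2: restart all nodes at the same time, one worker per node.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        list(pool.map(restart, nodes))  # list() waits and re-raises errors

    # Step 3: put back only the nodes that report healthy.
    failed = []
    for node in nodes:
        if is_healthy(node):
            enable_in_lb(node)
        else:
            failed.append(node)
    return failed
```

Nodes returned in `failed` stay out of the LB, so a node whose startup went wrong does not receive user traffic until someone has looked at it.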

One more question, regarding a scenario where one of the nodes has a stuck thread or memory issue. Could we follow this procedure?

  1. Take the problematic node off the LB, keeping the rest of the nodes active on the LB so that they remain available to serve user requests.

  2. Restart the application server on the problematic node (which includes refreshing node-level data pages on that node).

  3. Put the node back on the LB.

What we would like to understand is whether Step 2 in the process above would corrupt the node-level cache on the other active nodes in the LB.
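For a single problematic node, the same procedure can be sketched with a health poll before the node goes back on the LB. Again, the hook functions and the timeout values are assumptions for illustration, and the sketch says nothing about the cache on the other nodes; it only fixes the order of the steps.

```python
import time


def restart_single_node(node, disable_in_lb, restart, is_healthy, enable_in_lb,
                        timeout_s=600.0, poll_s=5.0):
    """Drain one node, restart it, and re-enable it once it reports healthy.

    The four callables are hypothetical hooks supplied by the caller.
    Returns True if the node was re-enabled, False if it never became
    healthy within timeout_s (in which case it stays off the LB).
    """
    disable_in_lb(node)   # Step 1: other nodes keep serving requests
    restart(node)         # Step 2: restart only the problematic node

    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_healthy(node):
            enable_in_lb(node)  # Step 3: back on the LB
            return True
        time.sleep(poll_s)
    return False
```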

 

Thanks,

Aamir

August 19, 2019 - 10:04am
Response to Abhinav7

Hi Abhinav,

I would like to know whether Hazelcast in Pega has anything to do with node-level data pages. To restate the original query: it was a general observation that, right after parallel restarts, node-level data pages were getting corrupted at startup of the server node, and the risk seemingly went down when the nodes were restarted sequentially. We understand that Pega uses Hazelcast for search, but we are not aware of any direct impact of Hazelcast on node-level data pages. Hence, the question of how data pages behave after a parallel restart still needs to be clarified. If the data page refresh at node startup also depends on the Hazelcast cluster, then we can understand the suggestion of a sequential restart. Kindly confirm.

 

Regards

Anand

Pega
August 20, 2019 - 12:51pm
Response to AnandS24

Hi Anand,

I don't think node-level data pages are related to Hazelcast. Hazelcast is a clustering technology, whereas, as far as I know, a node-level data page is a normal data page that is accessible only to the requestors of a particular node.

Did you check the startup logs from the parallel restart that corrupted the data pages? Please compare the startup logs of both restarts. You will find a difference.
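One possible way to do that comparison is to strip the timestamps and diff the two logs. The sketch below uses Python's standard `difflib`; the timestamp pattern is an assumption about the log layout and would need adjusting to the actual file format.

```python
import difflib
import re

# Assumed timestamp prefix such as "2019-07-18 09:14:03,123 ".
# Adjust this pattern to match the real log layout.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[,.]\d{3}\s*")


def normalize(lines):
    """Drop trailing newlines and the leading timestamp from each line."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_startup_logs(sequential_lines, parallel_lines):
    """Return unified-diff lines between two startup logs, ignoring timestamps."""
    return list(difflib.unified_diff(
        normalize(sequential_lines),
        normalize(parallel_lines),
        fromfile="sequential",
        tofile="parallel",
        lineterm="",
    ))
```

Each argument is a list of lines, for example from `open(path).readlines()`. Lines prefixed with `-` appear only in the sequential-restart log, and lines prefixed with `+` only in the parallel-restart log.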
 
Thanks,
Abhinav