Discussion

Ask the Expert - Pega Predictive Diagnostic Cloud with Andy Werden [2019 Edition]

Join Andy Werden (@WERDA) in this Ask the Expert session (23rd-27th September) on Pega Predictive Diagnostic Cloud.

Meet Andy Werden: Andy is a seasoned subject matter expert on PRPC performance, operations and scalability. Prior to joining the product management team in 2017, he managed and led the technical consulting practice for the Americas. In his spare time he is a really slow runner, a bad biker, a seriously sad swimmer and a terrible triathlete. Andy is an exiled New Yorker who lives as a pseudo-southerner in Chapel Hill NC, where he pretends to passionately follow college basketball (Go Tarheels!).

Message from Andy Werden: Happy to have the opportunity to help you use PDC to keep tabs on system performance and quality and drive your continuous improvement governance process. I look forward to getting feedback on how PDC is working for you and how we can continue to make the PDC service better.

Ask the Expert Rules

  • Follow the Pega Support Community's Community Rules of Engagement
  • This is not a Live Chat - Andy will reply to your questions over the course of the week (23-27 September)
  • Questions should be clearly and succinctly expressed
  • Questions should be of interest to many others in the audience
  • Have fun!

Comments

September 19, 2019 - 4:09pm

Thank you, Pega Support, for these expert sessions. My questions are about PDC.

1. Can you share a document or diagram of the PDC architecture?

2. When data is sent to PDC, what data retention policy does PDC offer? Does it have an archival strategy?

3. Do the PDC SOAP calls use OAuth or any other access token for authorization when we send data to PDC?

Much Appreciated!

 

 

Pega
September 20, 2019 - 1:42am
Response to KavithaEthirajalu

2. PDC retains alerts, exceptions and data derived from alerts & exceptions for 14 days. PDC retains node health, query stats, table stats, usage, index definitions and Elasticsearch status for 30 days. Action cases are retained for 95 days after resolution, unless they are Resolved-Ignore, which are retained forever. There is no archiving in PDC other than implicit archiving from AWS RDS snapshots; by design and intent, the goal is to identify and resolve current and recent issues. I could see a use case for keeping usage / aggregate performance and case snapshots for a much longer period of time; I would like to know what you would like to do with older data.
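To make the retention windows concrete, here is a minimal Python sketch that works out when a record would age out. The category names and the `purge_date` helper are made up for illustration and are not part of PDC; only the day counts come from the reply above.

```python
from datetime import date, timedelta
from typing import Optional

# Retention windows in days, as quoted in the reply above.
# The dictionary keys are hypothetical labels, not PDC identifiers.
RETENTION_DAYS = {
    "alerts_and_exceptions": 14,
    "health_and_stats": 30,       # node health, query/table stats, usage, index definitions, search status
    "resolved_action_case": 95,   # counted from the resolution date
    "resolved_ignore_case": None, # retained forever
}

def purge_date(category: str, reference: date) -> Optional[date]:
    """Return the date the retention window ends, or None if kept forever."""
    days = RETENTION_DAYS[category]
    return None if days is None else reference + timedelta(days=days)

print(purge_date("alerts_and_exceptions", date(2019, 9, 20)))  # 2019-10-04
print(purge_date("resolved_ignore_case", date(2019, 9, 20)))   # None
```

If you need history beyond these windows, this is the kind of arithmetic that tells you how often you would have to export data yourself.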

3. SOAP/REST messages sent to PDC use HTTPS with TLS 1.2. Since communication is one-way - PDC receives but neither calls nor reveals - we have minimized latency by using an "open trust" model. We are working on a design to enhance messages to 'self-authenticate' to prevent nuisance attacks / noise.

1. PDC is currently deployed with standard PegaCloud architecture - three pools of auto-scaled application servers (batch/search/bix, stream, and web user/service), a Postgres RDS database server and an ALB load balancer. Browser and message traffic comes in through the load balancer over HTTPS. Inbound data is stored in real time; case processing and analysis runs through agents. Will find a picture to post; let me know if there are specific questions to answer.

Pega
September 20, 2019 - 1:45am
Response to WERDA

sketch attached

September 20, 2019 - 10:48am
Response to WERDA

Much appreciated!

September 21, 2019 - 3:23am

September 23, 2019 - 8:21am

Thanks for organizing a session like this.

I have a question about evaluating the system performance from the data accumulated in PDC.

As the development team identifies and resolves alerts, what metrics can we monitor in the system, or what reports would visualize the improvement achieved through alert resolution?

Sorry for being vague in my question, but we have started the journey of assimilating PDC into our day-to-day development work and would appreciate any feedback on this question, or best practices to follow.

Thanks,

Shuvadeep

Pega
September 24, 2019 - 5:56am

As a broad statement, our goals are always to improve system response and reduce the exception counts. The System Assessment and Usage Viewer give good information on performance. I personally use the improvement plan to identify and prioritize exceptions that need investigation. At present there is no good OOTB report that shows total exception count by day, but it was easy to go to custom reports and knock out a nice chart; will need to add that in an upcoming build.

So - the goals are simple:

- reduce average interaction response time
- reduce the exception count
- increase number of users
- increase interactions (use of system) 
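As a rough illustration of the "exception count by day" idea mentioned above, the following Python sketch aggregates exception records by calendar day. It runs outside PDC and assumes you already have the events as (timestamp, exception class) pairs; the sample records and the `exceptions_per_day` helper are hypothetical, and this is not how PDC custom reports are built.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample data: (ISO timestamp, exception class).
records = [
    ("2019-09-22T10:15:00", "NullPointerException"),
    ("2019-09-22T18:40:12", "DatabaseException"),
    ("2019-09-23T07:05:30", "NullPointerException"),
]

def exceptions_per_day(rows):
    """Count exceptions per calendar day, returned in date order."""
    counts = Counter(datetime.fromisoformat(ts).date().isoformat() for ts, _ in rows)
    return dict(sorted(counts.items()))

print(exceptions_per_day(records))
```

A downward trend in these daily totals, alongside a lower average interaction response time, is the kind of signal the goals above are pointing at.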

 

 

 

September 24, 2019 - 1:46pm

Hello! 

I have several questions on the PDC. They are:

1. Why are the operators by System set to 0? I have added at least 1 user for several systems, but it displays 0. See PDC OperatorsBySystem.pdf for an example.

2. On the Set System URLs tab, do the URLs need to be set? Since communication is one-way, I don't see why the URLs need to be set.

3. There are default users set up that I did not add, such as lewcn, nowam1, tyrka, tyrkm. Can I delete them?

Thanks!

Fred.

 

Pega
September 25, 2019 - 1:26am
Response to fredgoldbach

3. The 'default' users you see are members of the PDC team. Feel free to delete them. Will look into source later - they should not be there.

2. You don't need to set the URLs. That screen is left over from the AES on-premise code that became PDC. I don't believe that the URLs are actually used in AES or PDC.

1. The operator by system report looks like a bug. It's a bit complicated - and we changed the model from implicit denial to explicit denial without rethinking the report. By default a PDC user can access all monitored systems unless they are explicitly blocked. Are you using the option to block users from specific systems?

Thanks for 3 bugs to fix

September 24, 2019 - 2:27pm

Hi Andy, 

When you click on Enterprise, the list of monitored applications is displayed. Then, you click on a system and you can see the Health, DB Query Stats, Database Usage tab, etc.

On the Health tab for the system details, the system hash value is displayed. It is hard to relate back to the specific system that is passing the data.  Is there a way that the Node Short Description can be displayed instead of the hash value? 

Thanks!

Fred.

 

 

Pega
September 25, 2019 - 1:53am
Response to fredgoldbach

PDC displays the nodeID. With Pega 7.3+ you can set the nodeID using the JVM startup argument -Didentification.nodeid. If the nodeID is not specified, it is generated using a hash of host name, system name and fully qualified temp path. We switched from using node short description to nodeID because customers have generally changed deployment models to dynamic clouds and virtual servers, and there might be a different host or node on each and every restart. As node description is sent to PDC at startup only, it's a lot of work to go in and edit data-admin-nodes to set the description and then restart all the time, and it's easier to use automation scripts to pass a meaningful identifier.
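As a sketch of what such an automation script might do, the Python snippet below builds the JVM argument from an environment label and the host name. The `node_id_jvm_arg` helper, the naming scheme and the character clean-up are assumptions made for illustration; only the -Didentification.nodeid argument itself comes from the reply above, so check your Pega version's documentation for any limits on nodeID values.

```python
import re
import socket

def node_id_jvm_arg(environment: str, host: str) -> str:
    """Build a JVM startup argument that sets a readable nodeID.

    Assumption: we restrict the value to lowercase letters, digits,
    '-' and '_' to keep it safe on a command line. Pega may allow more.
    """
    raw = f"{environment}-{host}".lower()
    safe = re.sub(r"[^a-z0-9_-]+", "-", raw).strip("-")
    return f"-Didentification.nodeid={safe}"

# Example: append the result to whatever variable your app server uses for JVM options.
print(node_id_jvm_arg("prod", socket.gethostname()))
```

A startup script could append this string to the JVM options so that each restart, even on a new virtual host, reports a recognizable identifier to PDC.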

Would setting nodeID with meaningful value be a viable option for you? 

Certainly technically possible for us to add an option to flip the display from nodeID to the legacy short description field; just need a better sense of whether that is really needed. 

September 25, 2019 - 11:03am
Response to WERDA

Thank you!! I was able to set the nodeID on my Pega 8.3 sandbox and it worked. The node name and node ID were both updated, so there is no hash value. I also looked at the Search node and it reflected the node ID value. This is convenient because you know where the search index lives. No more translating the hash value to the node ID. Setting the node ID on the JVM will save configuration time when configuring an environment.

Now, I do have some legacy systems (i.e. 6.x and 7.1.8), which have multiple JVMs. I guess we can translate the hash value until they're upgraded.

September 26, 2019 - 11:00am

Hi Andy, 

I have a couple more questions on PDC. 

1. Are there any plans to integrate with HP Service Manager? 

2. When looking at the system details, there is a place to add the SMA URL. I'm assuming that is a leftover from AES. Is that correct?

Thanks!

 

Pega
September 26, 2019 - 1:56pm
Response to fredgoldbach

No plans to integrate with HP Service Manager at present. This is the first time I've heard of that product actually. We currently have an 'on demand' interface to create a 'case' from a PDC action case - click the 'share' button to push data to create a story, bug or incident in Jira, Agile Studio, ServiceNow or PagerDuty. We will have new notification options in November - create PagerDuty, ServiceNow and Jira events instead of, or in addition to, emails and SMS messages.

 

 

September 26, 2019 - 4:02pm

Hi Andy, 

I have added a Pega 7.1.8 system to the PDC. I have applied HFIX-26497 and restarted the system after the hotfix was applied. I have 2 questions.

1. The Production level is set to 0.  Is there something else that I need to do for the production level to show? 

2. The 718 system I have added has 2 jvms which are horizontally clustered. Only one JVM shows in the Enterprise listing.  Is there something that I can check/modify to see why only one JVM is listed? 

Thanks!

September 27, 2019 - 9:28am
Response to fredgoldbach

OK. After some trial and error, I was able to get both JVMs to display in the PDC. I had an entry in the DSS that needed to be removed - it was the nodeid:SOAPAppender and the value was true. I removed the DSS, restarted the JVM and now it shows in the PDC.

The Production Level is still showing as 0. 

Thanks!

Pega
September 28, 2019 - 4:40pm
Response to fredgoldbach

Production level is sent to PDC as part of system startup message. See what happens after you restart one of the nodes.

September 27, 2019 - 1:44am

Hi Andy, 

Thanks for answering my previous query. I have a few more questions around a PDC case and its lifecycle.

1. I have observed that the priority field of an alert case is calculated based on the number of events and the frequency of those events. If the number of events is high, the priority of that case is also high. But I have also seen that, for some alert cases, even if the number of alert occurrences is 0 over a period of time (2 or more days), the associated priority of the case is not reduced. Can you explain the reason behind it?

2. Some alert cases are resolved automatically with status Resolved-NotSeen if the alert is not seen for a number of days. Can you tell me what that duration is, and whether it is chosen dynamically on a case-by-case basis?

3. What are the possible statuses a PDC case can have, and when are those statuses assigned, up to resolution? What are the different resolution statuses? Simply put, a case life cycle diagram of an alert case would help, along with the status at specific stages or steps.

 

Thanks

Shuvadeep 

Pega
September 28, 2019 - 5:02pm
Response to ShuvadeepD

Thanks for your interest. Priority of a case is calculated using weekly impact - a factor of # of occurrences and duration (for time based alerts) ranked over all cases that have had impact this week. The same # of occurrences / time elapsed of a given case will prioritize differently depending on frequency and impact of other cases.
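The relative nature of this ranking can be sketched in a few lines of Python. This is not PDC's actual formula: the `Case` fields and the "occurrences times average duration" impact score are assumptions chosen only to show how the same case can land at a different rank depending on what else happened that week.

```python
from dataclasses import dataclass

@dataclass
class Case:
    case_id: str
    occurrences: int         # events seen this week
    avg_duration_sec: float  # average elapsed time per event (time-based alerts)

def weekly_impact(case: Case) -> float:
    # Assumed impact score: total time spent in the alerting operation this week.
    return case.occurrences * case.avg_duration_sec

def rank_by_impact(cases):
    """Return {case_id: rank}, where rank 1 is the highest weekly impact."""
    ordered = sorted(cases, key=weekly_impact, reverse=True)
    return {c.case_id: pos for pos, c in enumerate(ordered, start=1)}

case_a = Case("A", occurrences=100, avg_duration_sec=2.0)
quiet_week = [case_a, Case("B", 50, 1.0)]
busy_week = [case_a, Case("C", 1000, 3.0), Case("D", 500, 1.0)]

print(rank_by_impact(quiet_week)["A"])  # 1
print(rank_by_impact(busy_week)["A"])   # 3
```

Case A has identical numbers in both weeks, yet it ranks first in the quiet week and third in the busy one.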

Cases go to status Resolved-NotSeen if no events are seen for 12 days. Cases re-open if the issue re-occurs within 95 days of resolution.

Case lifecycle in PDC is pretty crude - New --> Open --> Resolved, Resolved-NotSeen, Resolved-Ignore or Open Elevated KPI
It needs work, but we found that most people take the cases and work them in a different system, with Agile Studio, ServiceNow & Jira being the most popular. So instead of developing an elaborate lifecycle in PDC, we 'share' the case to the project management system and basically take the view that the "work" status is the property of the developers' work management system, while the "impact" status is in PDC - it's either happening (open), fixed (resolved) or ignored (resolved-ignore). It's not uncommon for an issue to be declared 'resolved' in Jira but not be in production yet, so it's still Open in PDC until the problem is really fixed.
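The two timing rules above (auto-resolve after 12 quiet days, re-open on recurrence within 95 days) can be written down as a small Python sketch. The function names and the exact boundary handling (whether day 12 or day 95 itself counts) are assumptions; only the day counts and status names come from the reply.

```python
from datetime import date, timedelta

NOT_SEEN_AFTER = timedelta(days=12)  # quiet period before auto-resolution
REOPEN_WINDOW = timedelta(days=95)   # recurrence window after resolution

def status_after_quiet_period(last_event: date, today: date) -> str:
    """Status of an open case given the date of its most recent event."""
    # Assumption: the case resolves once 12 full days have passed without events.
    return "Resolved-NotSeen" if today - last_event >= NOT_SEEN_AFTER else "Open"

def reopens(resolved_on: date, recurred_on: date) -> bool:
    """True if a recurrence falls inside the 95-day re-open window."""
    # Assumption: a recurrence on day 95 itself still re-opens the case.
    return recurred_on - resolved_on <= REOPEN_WINDOW

last_seen = date(2019, 9, 1)
print(status_after_quiet_period(last_seen, last_seen + timedelta(days=11)))  # Open
print(status_after_quiet_period(last_seen, last_seen + timedelta(days=12)))  # Resolved-NotSeen
```

Note that the 95-day re-open window matches the 95-day retention of resolved action cases mentioned earlier in the thread.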

 

 

September 27, 2019 - 7:42am

Hi Andy,

Thank you for the support. I can't find instructions on how to install PDC on a legacy system (7.1.9), which hotfix to apply, or whether it is fully compatible with the latest PDC version.

Thank you

Best Regards,

Andrea

Pega
September 28, 2019 - 4:54pm
Response to AndreaG86

Technically, you don't install PDC. PDC is a service. You configure your systems to send data to the PDC service. PDC integration instructions have been the same starting with 7.1.7.

1- log into PDC .

Every customer should have at least one PDC tenant, regardless of whether you are using Pega on PegaCloud, on premises or on a private cloud.
The easiest way to access PDC is to use our default integration with the Pega Community.

1a. Go to https://community.pega.com/support
1b. Scroll to the navigation bar with the "submit/manage support requests" and "pega diagnostic cloud" links.
1c. Verify that you are an authorized support contact: click on support requests. If you can see and create support tickets, by default you will be authorized to use PDC. Contact your support administrator if you need support ticket access.
1d. Click on the PDC link. If you have more than one PDC tenant (one on 'external' for on-premises systems, a separate PDC tenant for each PegaCloud contract) you will get a menu; if you have one PDC tenant you navigate directly. By default, a PDC operator ID will be created on the fly.

2- On the PDC Welcome page, click the link to copy your URL

3- In your system, navigate to System / Settings / PDC and paste and save the SOAPServlet integration URL.

If your system is in a data center, you will probably need to engage your middleware team to configure your system with proxy server access to forward messages to PDC via the public internet. Proxy server setup varies by app server and network configuration, and is no different for PDC than for any internet-based service, so we leave the proxy bit up to your company's SMEs.

We enhanced the PegaAESRemote ruleset to send PDC more information about system configuration, health and state. The PegaAESRemote rules are available via hotfix for 7.2.1 to 7.4 and are part of the Pega 8 service packs. We last changed those rules in June 2019 but are planning some additional enhancements in December. We plan to offer enhanced rules for Pega 8 independent of service packs by uploading them as components on Pega Exchange. At present there is no plan to back-port the work to 7.1.9.

Mod
September 30, 2019 - 3:50am

Thank you for the great discussions! Thank you to Andy for being a great expert!!

Please continue asking your questions on PDC by creating new posts.

Lochana | Community Moderator | Pegasystems Inc.