How to Build a Real-Time Data Streaming Pipeline for Pega Applications

We have a requirement to build a real-time data streaming pipeline using CDC (change data capture) and Kafka for our application built on the Pega platform. The pipeline will feed an analytical database, a data warehouse, a data lake, etc. The question is how to extract the data out of the pzPVStream blob in this use case. Where and how do we add a BIX extract to the pipeline? How do we make the streaming real time with BIX extracts? Are there other ways to extract the pzPVStream blob?

***Edited by Moderator Marissa to update platform capability tags***



October 16, 2019 - 2:36pm

Extracting data from the blob is done using the OOTB pr_read_from_stream() UDF (a user-defined function installed in the database). These functions are inefficient and tend to cause performance issues when used extensively.
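For reference, a call to that UDF usually looks like an ordinary SELECT over the work table, with one UDF call per unexposed property. Below is a minimal Python sketch that builds such a query; the schema name (pegadata), table name (MYORG_WORK), and property names are hypothetical placeholders, and the exact UDF signature can vary by database platform, so treat this as an illustration rather than a drop-in query:

```python
# Sketch: build a SQL SELECT that decodes unexposed properties from the
# pzPVStream blob using Pega's pr_read_from_stream UDF. All table and
# property names here are made up for illustration.

def blob_extract_query(table: str, properties: list[str]) -> str:
    """Return a SELECT that reads each property out of the blob column."""
    cols = ",\n       ".join(
        f"pegadata.pr_read_from_stream('{p}', pzInsKey, pzPVStream)"
        f" AS \"{p.lstrip('.')}\""
        for p in properties
    )
    return f"SELECT pzInsKey,\n       {cols}\nFROM {table}"

# Example: pull two hypothetical unexposed properties from a work table.
query = blob_extract_query("pegadata.MYORG_WORK", [".OrderTotal", ".CustomerId"])
print(query)
```

Because every row invokes the UDF once per property, queries like this get expensive quickly, which is the performance concern mentioned above.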

Why would you be extracting data from the blob? Is that how the data is being delivered in your data stream?

October 16, 2019 - 8:09pm

Thanks for your reply. Only a few fields are exposed as columns in our transactional work table; most fields are not exposed and remain inside the pzPVStream blob. The destinations of the pipeline, such as the analytical database, data warehouse, and data lake, want every field in the work table as a column, not in blob format.