Question

How to use Hive to retrieve data from Hadoop Distributed File System (HDFS)

We have business data stored in Parquet files on the Hadoop Distributed File System (HDFS). Our Data Solutions team has asked us to use Hive (HiveQL) to retrieve the data. Does PRPC support using HiveQL to retrieve data from HDFS? If so, could you please point us to the relevant technical or help documentation? If not, what approach does Pega recommend? We are using Pega Platform 8.2.3.

Correct Answer
September 6, 2019 - 11:00am

We do not support HiveQL. However, if your data is stored as Parquet files in HDFS, you can use an HDFS data set to map and access the data. For more information, see the links below:

https://community.pega.com/knowledgebase/release-note/hdfs-data-sets-support-parquet-files

https://community1.pega.com/sites/pdn.pega.com/files/help_v81/procomhelpmain.htm#rule-/rule-decision-/rule-decision-dataset/dataset-hdfs-creating-tsk.htm

Comments

September 6, 2019 - 9:02am

I have raised SR-D43876 for the same question.

September 6, 2019 - 10:27am
Response to DavidL07

I have forwarded the question to the SMEs and am waiting for a response.
