Question

How to extract the content from an email attachment using the OCR component?

Hi,

We have a requirement to extract the content from an email attachment, which is a PDF, and to create a case based on that content. Do we need to set up any additional configuration in the Email channel?

Comments

December 13, 2019 - 2:50am

Hi Santosh, the Email channel has a configuration option to enable attachment analysis. Attachments can be analyzed with or without OCR.

With OCR - You need to install the ABBYY core processor on the application server. More info here - https://community.pega.com/knowledgebase/articles/conversational-channels/installing-pega-ocr-component

Without OCR - When OCR is not installed, attachments (PDF, DOC, XLS, etc.) are analyzed using Java libraries, which do a pretty decent job of text extraction. The extracted text is then passed to NLP for intelligent routing. More info here - https://community.pega.com/knowledgebase/release-note/support-extracting-data-file-attachments-during-email-triage
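
To make the two paths concrete, here is a rough Python sketch of the flow described above. It is only an illustration, not how Pega implements it: the names library_extract, route_by_keywords and triage_attachment are hypothetical, the library-based extractor is replaced by a plain UTF-8 decode, and the NLP step is replaced by a simple keyword lookup.

```python
from typing import Callable, Optional


def library_extract(data: bytes) -> str:
    """Stand-in for library-based text extraction (assumption: plain UTF-8 text)."""
    return data.decode("utf-8", errors="ignore")


def route_by_keywords(text: str) -> str:
    """Stand-in for the NLP routing step (assumption: simple keyword matching)."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "Billing"
    if "complaint" in lowered:
        return "Complaints"
    return "General"


def triage_attachment(
    data: bytes,
    ocr: Optional[Callable[[bytes], str]] = None,
) -> str:
    """Extract text with OCR when an OCR function is supplied, otherwise
    fall back to library extraction, then route on the extracted text."""
    extractor = ocr if ocr is not None else library_extract
    text = extractor(data)
    return route_by_keywords(text)


if __name__ == "__main__":
    # No OCR available: the text is read directly from the attachment bytes.
    print(triage_attachment(b"Invoice 123 for your order"))
    # OCR available: a hypothetical OCR callable supplies the text instead.
    print(triage_attachment(b"<scanned image>", ocr=lambda _: "Customer complaint"))
```

The point of the sketch is that the routing step is the same in both cases; only the source of the extracted text changes depending on whether OCR is installed.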

Attachment analysis for the Email channel in depth -

https://community.pega.com/sites/default/files/help_v74/procomhelpmain.htm#mcp/tasks/mcp-enabling-analysis-attached-file-during-email-triage-tsk.htm

(You can find a similar document for every version of Pega Platform.)