How to convert Microsoft Word (.docx) to HTML


We have a requirement to read a temp file stored as an MS Word document (.docx) and convert it into HTML so that the rich text is preserved while editing in CKEditor. I have looked into various ways to accomplish this. From what I can tell, Pega has similar functionality for uploading a Word document template for correspondence rules, but the function seems to be private and I can't look into the code.

There are two main Java libraries commonly used for .docx-to-HTML conversion: Docx4j and Apache POI. Part of the Apache POI library ships with Pega, but we are missing XWPFConverter, which is needed for the .docx-to-HTML methods. Docx4j has a method for .docx-to-HTML conversion, but it appears to produce a blank HTML file, because the method for loading a Word document depends on a File input and we have to use PRFile for the Pega temp folder.

Is there any way we can accomplish this .docx-to-HTML conversion without having to import a new Java library?
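
One library-free direction that might be worth exploring: a .docx file is a ZIP package whose main body normally lives in word/document.xml, so the JDK's own ZIP and XML classes can read it. Below is a minimal sketch under those assumptions, not a full converter. It only handles paragraphs and bold/italic runs, and it ignores tables, lists, images, text boxes, and styles. The class name DocxToHtmlSketch and its methods are hypothetical names made up for illustration. It reads from a plain InputStream, on the assumption that a stream can be obtained for the PRFile temp file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: JDK-only, handles paragraphs plus bold/italic runs.
class DocxToHtmlSketch {
    // Main WordprocessingML namespace used by word/document.xml.
    static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Converts the body of a .docx stream into a simple HTML fragment.
    static String toHtml(InputStream docx) throws Exception {
        byte[] xml = readEntry(docx, "word/document.xml");
        if (xml == null) {
            throw new IOException("word/document.xml not found in package");
        }
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        // Refuse DOCTYPE declarations to avoid XXE on untrusted uploads.
        f.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder html = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            html.append("<p>");
            NodeList runs =
                ((Element) paragraphs.item(i)).getElementsByTagNameNS(W, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                appendRun(html, (Element) runs.item(j));
            }
            html.append("</p>\n");
        }
        return html.toString();
    }

    // Emits one run (w:r), wrapping its text in <b>/<i> when the run says so.
    static void appendRun(StringBuilder html, Element run) {
        boolean bold = isOn(run, "b");
        boolean italic = isOn(run, "i");
        if (bold) html.append("<b>");
        if (italic) html.append("<i>");
        NodeList texts = run.getElementsByTagNameNS(W, "t");
        for (int k = 0; k < texts.getLength(); k++) {
            html.append(escape(texts.item(k).getTextContent()));
        }
        if (italic) html.append("</i>");
        if (bold) html.append("</b>");
    }

    // True when the toggle element exists and is not explicitly switched off.
    static boolean isOn(Element run, String name) {
        NodeList n = run.getElementsByTagNameNS(W, name);
        if (n.getLength() == 0) return false;
        String val = ((Element) n.item(0)).getAttributeNS(W, "val");
        return !(val.equals("false") || val.equals("0") || val.equals("off"));
    }

    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Scans the ZIP stream and returns the bytes of the named entry, or null.
    static byte[] readEntry(InputStream in, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry e;
            while ((e = zip.getNextEntry()) != null) {
                if (e.getName().equals(name)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zip.read(buf)) != -1) out.write(buf, 0, n);
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}
```

Calling DocxToHtmlSketch.toHtml(stream) returns an HTML fragment as a String, which could then be handed to CKEditor. Anything beyond simple inline formatting would need considerably more work, which is where a dedicated library earns its keep.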



September 23, 2019 - 6:01am

I could not find a way to do this without importing a Java library. Is there a specific reason you don't want to import one?