How to convert Microsoft Word (.docx) to HTML
We have a requirement to read a temp file stored as an MS Word document (.docx) and convert it into HTML so that the rich text is preserved while editing in CKEditor. I have looked into various ways to accomplish this. From what I can tell, Pega has similar functionality when uploading a Word document template for correspondence rules, but the function seems to be private and I can't look into the code.
There are two main Java libraries commonly used for docx-to-HTML conversion: Docx4j and Apache POI. Part of the Apache POI library ships with Pega, but we are missing the XWPFConverter needed for the docx-to-HTML methods. Docx4j has a docx-to-HTML method, but it appears to produce a blank HTML document, because the method for loading a Word document depends on a File input and we have to use PRFile for the Pega temp folder.
Is there any way we can accomplish this docx-to-HTML conversion without having to import a new Java library?
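To show the kind of library-free approach I have in mind, here is a rough sketch using only JDK classes. It relies on the fact that a .docx file is a ZIP archive whose main body is stored in word/document.xml, so it can be read from any InputStream rather than a File. The class name DocxToHtmlSketch and its method names are made up for illustration, and it only handles paragraphs, bold, italic, and line breaks. Tables, lists, images, styles, and text boxes are ignored, so it is not a replacement for a real converter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: converts the paragraphs of a .docx stream to very basic HTML.
public class DocxToHtmlSketch {
    // WordprocessingML main namespace used by word/document.xml.
    private static final String W =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static String convert(InputStream docx) throws Exception {
        byte[] xml = readDocumentXml(docx);
        if (xml == null) {
            throw new IOException("word/document.xml not found in archive");
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations, since the input may be untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder html = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element p = (Element) paragraphs.item(i);
            html.append("<p>");
            // Includes runs nested in hyperlinks; paragraphs in table cells come out flat.
            NodeList runs = p.getElementsByTagNameNS(W, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                appendRun((Element) runs.item(j), html);
            }
            html.append("</p>");
        }
        return html.toString();
    }

    // Scans the ZIP entries and returns the raw bytes of word/document.xml, or null.
    private static byte[] readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }

    // Emits one run (w:r), wrapping its text in <strong>/<em> when w:b / w:i are set.
    private static void appendRun(Element run, StringBuilder html) {
        Element rPr = child(run, "rPr");
        boolean bold = isOn(rPr, "b");
        boolean italic = isOn(rPr, "i");
        StringBuilder text = new StringBuilder();
        for (Node n = run.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element) || !W.equals(n.getNamespaceURI())) continue;
            if ("t".equals(n.getLocalName())) text.append(escape(n.getTextContent()));
            else if ("br".equals(n.getLocalName())) text.append("<br/>");
        }
        if (text.length() == 0) return;
        if (bold) html.append("<strong>");
        if (italic) html.append("<em>");
        html.append(text);
        if (italic) html.append("</em>");
        if (bold) html.append("</strong>");
    }

    // Returns the first direct child element with the given local name, or null.
    private static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W.equals(n.getNamespaceURI())
                    && name.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // A toggle such as <w:b/> is on unless its w:val is "0", "false" or "off".
    private static boolean isOn(Element rPr, String name) {
        if (rPr == null) return false;
        Element toggle = child(rPr, name);
        if (toggle == null) return false;
        String val = toggle.getAttributeNS(W, "val");
        return !(val.equals("0") || val.equals("false") || val.equals("off"));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

In Pega the InputStream would presumably come from a PRInputStream wrapping the PRFile for the temp file, but that part is an assumption I have not verified. The same stream idea may also apply to Docx4j, since as far as I know WordprocessingMLPackage.load has an overload that accepts an InputStream instead of a File.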