As a Python developer, you may need to add a feature to your application that allows you to convert PDF files to MHTML (Web archive format) or HtmlFixed (HTML format with absolutely positioned elements). Aspose.Total for Python via .NET API can help you automate this process. It is a comprehensive package of various APIs that can handle different file formats.
Aspose.Words for Python via .NET API, which is part of the Aspose.Total for Python via .NET package, can be used to add the PDF to MHTML conversion feature. If the PDF file is simple, it can be done with just two lines of code. You can load the PDF file and call the save method with the appropriate file path and the SaveFormat enumeration as MHTML or HTML_FIXED. However, if you need to restore the document model as close to the original as possible, you will need to save some extra information within the resultant document, known as round-trip information.
How to Convert PDF to MHTML in Python
- Load source PDF file using Document class
- Create the instance of HtmlSaveOptions
- Set the export_roundtrip_information as True
- Specify the SaveFormat as MHTML
- Call the save method, specifying the output file path and SaveFormat as parameters. Your PDF file is then converted to MHTML at the specified path
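The steps above can be sketched roughly as follows. This is a minimal sketch, not an official sample: the function names `mhtml_path_for` and `convert_pdf_to_mhtml` are hypothetical helpers introduced for illustration, and the save format is passed through the `HtmlSaveOptions` object rather than as a separate argument. The `aspose.words` import is done inside the function so the path helper can be used even where the package is not installed.

```python
from pathlib import Path


def mhtml_path_for(pdf_path):
    """Hypothetical helper: same folder and file name, .mhtml extension."""
    return Path(pdf_path).with_suffix(".mhtml")


def convert_pdf_to_mhtml(pdf_path, out_path=None):
    """Convert one PDF to MHTML and keep round-trip information."""
    # Imported lazily so the module loads without Aspose.Words installed.
    import aspose.words as aw

    out_path = Path(out_path) if out_path else mhtml_path_for(pdf_path)

    # Load the source PDF file using the Document class.
    doc = aw.Document(str(pdf_path))

    # Create HtmlSaveOptions with MHTML as the save format,
    # then switch on round-trip information.
    options = aw.saving.HtmlSaveOptions(aw.SaveFormat.MHTML)
    options.export_roundtrip_information = True

    # Save to the output path; the options object carries the format.
    doc.save(str(out_path), options)
    return out_path
```

Calling `convert_pdf_to_mhtml("Input.pdf")` would write `Input.mhtml` next to the source file, assuming the package is installed and the PDF exists.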
Conversion Requirements
- For PDF to MHTML or HtmlFixed format conversion, Python 3.5 or later is required
- Reference the API within the project directly from PyPI (Aspose.Words)
- Or use the following pip command
pip install aspose.words
- Moreover, a Microsoft Windows or Linux based OS is required (see more for Words); for Linux, check the additional requirements for gcc and libpython and follow the step-by-step installation instructions
Save PDF To MHTML in Python - Simple
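For a simple PDF, the conversion comes down to loading the file and saving it with the target format. The sketch below wraps those two calls in a hypothetical function, `pdf_to_mhtml_simple`, and adds an early check for a missing input file; the function name, the `html_fixed` flag, and the file check are assumptions for illustration, not part of the library.

```python
import os


def pdf_to_mhtml_simple(pdf_path, out_path, html_fixed=False):
    """Load a PDF and save it as MHTML (or HtmlFixed when html_fixed=True)."""
    # Fail early with a clear error before touching the library.
    if not os.path.isfile(pdf_path):
        raise FileNotFoundError(pdf_path)

    # Imported lazily so the check above works without Aspose.Words installed.
    import aspose.words as aw

    # The two essential lines: load the PDF, then save in the target format.
    doc = aw.Document(pdf_path)
    save_format = aw.SaveFormat.HTML_FIXED if html_fixed else aw.SaveFormat.MHTML
    doc.save(out_path, save_format)
```

A call such as `pdf_to_mhtml_simple("Input.pdf", "Output.mhtml")` would then produce the web archive, assuming `Input.pdf` exists in the working directory.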
PDF To MHTML Conversion in Python
Key Use Cases
- Web Archive Creation: Convert PDF files into MHTML for browser-based storage and viewing.
- Portable Document Publishing: Share document content in a self-contained web-friendly format.
- Content Preservation: Retain visual and textual information in an archive suited to web workflows.
- System Interoperability: Use MHTML output where document exchange must align with browser-compatible standards.
Automation Scenarios
- Automated Web Conversion Pipelines: Python scripts can turn PDFs into MHTML files for digital publishing systems.
- Archival Distribution Workflows: Converted outputs can be delivered to repositories that manage web archive content.
- Batch Document Publishing: Large sets of PDFs can be transformed into portable web files without manual intervention.
- Dynamic Content Exporting: Systems can generate MHTML versions of documents on demand for sharing or review.
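As an illustration of the batch scenario, the following sketch converts every PDF in one folder into an MHTML file in another. The helper names `plan_batch` and `convert_batch` and the folder layout are hypothetical; only the `Document` loading and `save` calls come from Aspose.Words, and error handling for individual files is left out for brevity.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir):
    """Pair each PDF in input_dir with an .mhtml target in output_dir."""
    out = Path(output_dir)
    pdfs = sorted(Path(input_dir).glob("*.pdf"))
    return [(pdf, out / (pdf.stem + ".mhtml")) for pdf in pdfs]


def convert_batch(input_dir, output_dir):
    """Convert all planned PDFs and return the list of written files."""
    # Imported lazily so planning works without Aspose.Words installed.
    import aspose.words as aw

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for pdf, target in plan_batch(input_dir, output_dir):
        doc = aw.Document(str(pdf))
        doc.save(str(target), aw.SaveFormat.MHTML)
        written.append(target)
    return written
```

Separating the planning step from the conversion makes it possible to preview which files would be produced before running the conversion itself.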