Python API to Process HTML Files

Create, edit, extract data, merge and convert HTML pages to PDF, DOCX, XPS, Images and other formats.

Aspose.HTML for Python via .NET is an advanced API for HTML processing that supports a wide range of management and manipulation tasks in cross-platform applications. The API can create, modify, convert, and render HTML documents, and extract data from them, without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and image file formats. Aspose.HTML for Python via .NET can be used to build 32-bit or 64-bit Python applications. Moreover, the HTML Document Object Model is integrated out-of-the-box with embedded formats and specifications such as CSS, HTML Canvas, SVG, XPath, and JavaScript, which extends manipulation functionality and improves rendering quality. Use the Aspose.HTML for Python via .NET API to develop high-level, platform-independent software in Python!

Advanced Python API Features

Wide range of conversions between formats

Create HTML from scratch

Load HTML from a file, stream or URL

Add, replace or remove nodes

Extract data from HTML documents

Load EPUB and MHTML file formats

Convert HTML to other file formats

Render multiple documents at once

Implement Markdown to HTML converter

Apply header and footer during HTML to PDF conversion

Navigate HTML using XPath Query or CSS Selector

Extract Data from the Web

Merge HTML, MHTML, EPUB, and MD files

Convert HTML to PDF

Python API Features in Documentation

Aspose.HTML for Python via .NET is a class library for working with real-world HTML. You can see the full list of Aspose.HTML features in our documentation. Using the Python Aspose.HTML library in your project allows you to do a wide range of tasks with HTML-based documents.

Convert HTML in Python

The Aspose.HTML for Python via .NET API is a solution for parsing and processing HTML documents in Python. With a few lines of code, you can convert HTML, MHTML, EPUB, Markdown, and SVG to other popular formats within your Python applications, which makes it a practical choice for document conversion tasks.

Convert HTML to PDF – Python code example



from aspose.html import *
from aspose.html.converters import *
from aspose.html.saving import *

# Load an HTML document to be converted
document = HTMLDocument("document.html")

# Create an instance of the PdfSaveOptions class
options = PdfSaveOptions()

# Convert HTML to PDF
Converter.convert_html(document, options, "output.pdf")

Aspose.HTML provides free online converters for converting HTML-based documents to PDF, XPS, DOCX, JPG, PNG, BMP, TIFF, GIF, and other formats.

Navigate HTML

Aspose.HTML for Python via .NET provides a comprehensive API for effectively navigating and manipulating HTML documents within your Python applications. It allows you to seamlessly parse and traverse HTML content, providing detailed inspection and editing of HTML elements.

Navigate HTML – Python code example



from aspose.html import *

# Prepare HTML code
html_code = "<span>Hello</span> <span>World!</span>"

# Initialize a document from the prepared code
with HTMLDocument(html_code, ".") as document:
    # Get the reference to the first child (first <span>) of the body
    element = document.body.first_child
    print(element.text_content)  # output: Hello

    # Get the reference to the whitespace between html elements
    element = element.next_sibling
    print(element.text_content)  # output: ' '

    # Get the reference to the second <span> element
    element = element.next_sibling
    print(element.text_content)  # output: World!

Data Extraction

Aspose.HTML for Python via .NET is based on the W3C specifications and supports XPath and CSS Selector queries. With these queries, you can quickly inspect the content of any HTML document and build your own data extraction solution.

Data Extraction – Python code example



from aspose.html import *

# Create an HTMLDocument instance from a web address
# (HTMLDocument is available directly via the wildcard import above)
document = HTMLDocument("https://www.wikipedia.org/")

# Query all h2 elements
elements = document.query_selector_all("h2")

# Check if any h2 elements are found
if elements.length > 0:
    # Get the first h2 element
    first_heading = elements[0]
    # Get the text content of the h2 element
    content = first_heading.text_content.strip() if first_heading.text_content else ""
    # Print the text of the first h2 element
    print("Text of the first heading:")
    print(content)
else:
    print("No h2 elements found on the page")
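The example above uses a CSS selector. To illustrate what an XPath-style query does, the following library-independent sketch uses Python's standard `xml.etree.ElementTree` module rather than Aspose.HTML. That module supports only a limited subset of XPath and requires well-formed markup, so it is a stand-in for the concept, not the library's API. The sample markup and the `extract_texts` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup used only for this illustration
SAMPLE = """<html><body>
<h2>First heading</h2>
<p class="note">Alpha</p>
<h2>Second heading</h2>
<p>Beta</p>
</body></html>"""


def extract_texts(markup, path):
    """Return the stripped text of every element matching an XPath-subset path."""
    root = ET.fromstring(markup)
    return [(el.text or "").strip() for el in root.findall(path)]


# Select all <h2> elements anywhere below the root
headings = extract_texts(SAMPLE, ".//h2")

# Select only <p> elements whose class attribute equals "note"
notes = extract_texts(SAMPLE, ".//p[@class='note']")

print(headings)
print(notes)
```

The path `.//h2` matches elements at any depth, while the predicate `[@class='note']` filters by attribute value. Both are forms that a full XPath engine would also accept.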

Aspose.HTML for Python via .NET makes navigating and manipulating HTML documents simple and efficient, providing a versatile solution for developers who need to work with HTML content programmatically.

Support and Learning Resources

Why Aspose.HTML for Python via .NET?
Customers List
Success Stories


Aspose.HTML offers individual HTML processing APIs for other popular development environments as listed below: