Aspose.HTML for Java is a powerful HTML manipulation API that enables developers to create, edit, parse, and convert HTML documents within Java applications. The API allows you to add, delete, and replace nodes, extract CSS styling, and navigate through documents using XPath, CSS selectors, or DOM methods. It supports loading EPUB and MHTML formats and provides JavaScript DOM manipulation capabilities.
With Aspose.HTML for Java, you can convert HTML documents to PDF, XPS, DOCX, and raster image formats (JPEG, PNG, BMP, GIF, TIFF) without requiring any external software or dependencies. The API also provides PDF encryption and customizable page setup options.

Advanced Java HTML Processing API Features

 

API Features in Documentation

You can see the full list of Aspose.HTML features in our documentation. Using the Aspose.HTML for Java library in your project, you can perform the following tasks:


Convert HTML to PDF and XPS Format

The API renders HTML to a variety of popular formats, including PDF, XPS, DOCX, MHTML, Markdown, and image formats. For the resulting fixed-layout formats, developers can configure PageSetup options such as the page numbers to be rendered, the page size, or the JPEG compression applied to embedded images.

Render HTML as fixed-layout formats – Java


// Load HTML document from file ('dir' is assumed to hold the path to your data directory)
HTMLDocument htmdoc = new HTMLDocument(dir + "template.html");

// Render HTML to PDF & XPS
HtmlRenderer renderer = new HtmlRenderer();

renderer.render(new PdfDevice(new PdfRenderingOptions(), dir + "output.pdf"), htmdoc);
renderer.render(new XpsDevice(new XpsRenderingOptions(), dir + "output.xps"), htmdoc);

Conversion to Raster Images

Aspose.HTML for Java features a high-fidelity rendering engine that converts HTML pages to the most commonly used raster image formats including TIFF, GIF, BMP, PNG, and JPEG without requiring any additional software or tools.

Convert HTML to PNG using Aspose.HTML for Java


// Initialize an HTML document from a file
HTMLDocument document = new HTMLDocument("document.html");

// Initialize ImageSaveOptions
ImageSaveOptions options = new ImageSaveOptions(ImageFormat.Png);

// Convert HTML to PNG
Converter.convertHTML(document, options, "document-output.png");

You can also try the online HTML Converter.

Aspose.HTML for Java can also convert HTML, XHTML, MHTML, Markdown, EPUB, or SVG into many other file formats.


Manipulating EPUB and MHTML files

The library is capable of loading EPUB and MHTML files to perform various operations including the conversion to fixed-layout and raster image formats.

Convert MHTML to PDF using Aspose.HTML for Java


// Open an existing MHTML file for reading; try-with-resources closes the stream afterwards
try (java.io.FileInputStream fileInputStream = new java.io.FileInputStream("sample.mht")) {

    // Create an instance of the PdfSaveOptions class
    PdfSaveOptions options = new PdfSaveOptions();

    // Call the convertMHTML() method to convert MHTML to PDF
    Converter.convertMHTML(fileInputStream, options, "sample-output.pdf");
}

HTML Nodes Navigation

The API supports navigation through HTML documents using XPath, CSS selectors, or DOM methods. You can easily insert, extract, remove, or replace nodes in the document tree.

Extract all anchor nodes from HTML document


// Instance creation of HTMLDocument and loading HTML from URL
HTMLDocument dct = new HTMLDocument("https://www.aspose.com");

// Get all anchor type nodes
NodeList nodelist = dct.getDocumentElement().querySelectorAll("a");

// Display anchor text & href values for all nodes
for (Node node : nodelist){

    HTMLAnchorElement anchor = (HTMLAnchorElement)node;
    System.out.println("Text: " + node.getTextContent() + " Href: " + anchor.getHref());
}
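
The insert, remove, and replace operations mentioned above can be illustrated with a small sketch. To keep it runnable without any extra libraries, it uses only the JDK's built-in W3C DOM and XPath classes rather than Aspose.HTML, and it assumes well-formed XHTML input without a DOCTYPE. The class name `NodeEditSketch` and its helper methods are hypothetical names chosen for this illustration; the same ideas (select nodes, then call `removeChild`, `replaceChild`, or `appendChild` on the parent) are what a DOM-style API generally exposes, but check the Aspose.HTML documentation for its exact signatures.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: JDK-only illustration, not part of Aspose.HTML.
class NodeEditSketch {

    // Parse a well-formed XHTML string into a W3C DOM document.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse input", e);
        }
    }

    // Select nodes with an XPath expression; the result is a snapshot, not a live list.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Bad XPath: " + expression, e);
        }
    }

    // Remove every <script>, replace every <b> with <strong>, append a <p> to <body>.
    static Document edit(String xhtml) {
        Document doc = parse(xhtml);

        // Remove: detach each <script> element from its parent.
        NodeList scripts = select(doc, "//script");
        for (int i = 0; i < scripts.getLength(); i++) {
            Node script = scripts.item(i);
            script.getParentNode().removeChild(script);
        }

        // Replace: swap each <b> for a <strong> that keeps the same children.
        NodeList bolds = select(doc, "//b");
        for (int i = 0; i < bolds.getLength(); i++) {
            Node old = bolds.item(i);
            Element strong = doc.createElement("strong");
            while (old.getFirstChild() != null) {
                strong.appendChild(old.getFirstChild()); // appendChild moves the node
            }
            old.getParentNode().replaceChild(strong, old);
        }

        // Insert: append a new paragraph at the end of <body>.
        Element body = (Element) select(doc, "/html/body").item(0);
        Element paragraph = doc.createElement("p");
        paragraph.setTextContent("Added via DOM");
        body.appendChild(paragraph);

        return doc;
    }

    public static void main(String[] args) {
        Document doc = edit("<html><body><p>Hello <b>bold</b></p>"
                + "<script>alert(1)</script></body></html>");
        System.out.println("script elements left: " + select(doc, "//script").getLength());
        System.out.println("strong elements: " + select(doc, "//strong").getLength());
    }
}
```

Because the XPath result is a snapshot, nodes can be removed or replaced while looping over it by index; a live node list would need more care.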

Configure Sandbox

The HTML API lets you configure a document sandbox that affects how HTML documents are processed; for example, CSS styles can depend on the screen size in some cases.

Disable scripts for HTML to PDF conversion using Java


// Prepare HTML code that contains an inline script and save it to a file
String code = "<span>Hello, World!!</span>\n" +
        "<script>document.write('Have a nice day!');</script>\n";

try (java.io.FileWriter fileWriter = new java.io.FileWriter("sandboxing.html")) {
    fileWriter.write(code);
}

// Create an instance of the Configuration class
Configuration configuration = new Configuration();

// Mark 'scripts' as an untrusted resource
configuration.setSecurity(com.aspose.html.Sandbox.Scripts);

// Initialize an HTML document with specified configuration
HTMLDocument document = new HTMLDocument("sandboxing.html", configuration);

// Convert HTML to PDF
Converter.convertHTML(document, new PdfSaveOptions(), "sandboxing_out.pdf");

Frequently Asked Questions

1. What is Aspose.HTML for Java?

Aspose.HTML for Java is a class library that enables developers to manipulate and convert HTML documents within their Java applications without requiring external tools or software.

2. What problem does Aspose.HTML for Java actually solve at the API level?

Aspose.HTML for Java provides a programmable HTML processing engine that lets you load, parse, modify, render, and convert HTML documents without relying on a browser runtime. You interact directly with a structured DOM, rendering pipeline, and conversion layer via Java APIs, making the behavior deterministic and suitable for backend systems.

3. How is HTML parsed internally, and does it follow modern standards?

The parser is aligned with WHATWG and W3C specifications, meaning it handles malformed markup, implicit tags, and encoding rules according to modern web standards. The resulting document is exposed as a fully navigable DOM, which is important for tasks such as transformation, validation, and rendering.

4. Is it possible to extract structured data from HTML using the API?

Yes. Since the document is represented as a full DOM, you can query it using selectors or traversal APIs and extract specific elements, attributes, or text nodes. This is particularly useful when HTML serves as a data container rather than just a visual document.
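
As a rough illustration of treating HTML as a data container, the sketch below turns a simple table into a list of key/value rows by walking the DOM. It is written against the JDK's built-in W3C DOM parser only, not Aspose.HTML, and assumes well-formed XHTML with a single table whose header cells are `th` elements. `TableExtractSketch` and `extractRows` are hypothetical names used for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class: JDK-only illustration, not part of Aspose.HTML.
class TableExtractSketch {

    // Return one map per data row, keyed by the text of the <th> header cells.
    static List<Map<String, String>> extractRows(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // Collect the column names from the header cells.
            NodeList headerCells = doc.getElementsByTagName("th");
            List<String> headers = new ArrayList<>();
            for (int i = 0; i < headerCells.getLength(); i++) {
                headers.add(headerCells.item(i).getTextContent().trim());
            }

            // Walk every row and map each <td> to its column name.
            List<Map<String, String>> rows = new ArrayList<>();
            NodeList tableRows = doc.getElementsByTagName("tr");
            for (int r = 0; r < tableRows.getLength(); r++) {
                NodeList cells = ((Element) tableRows.item(r)).getElementsByTagName("td");
                if (cells.getLength() == 0) {
                    continue; // header row: it has <th> cells only
                }
                Map<String, String> row = new LinkedHashMap<>();
                for (int c = 0; c < cells.getLength() && c < headers.size(); c++) {
                    row.put(headers.get(c), cells.item(c).getTextContent().trim());
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalStateException("Could not extract table rows", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><table>"
                + "<tr><th>Name</th><th>Price</th></tr>"
                + "<tr><td>Widget</td><td>9.99</td></tr>"
                + "<tr><td>Gadget</td><td>4.50</td></tr>"
                + "</table></body></html>";
        for (Map<String, String> row : extractRows(xhtml)) {
            System.out.println(row);
        }
    }
}
```

A `LinkedHashMap` is used so that each row keeps the column order of the source table, which makes the output easier to compare with the original markup.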

5. Can the API work with formats other than HTML?

Yes, but not all formats are handled in the same way. Aspose.HTML for Java can load and process formats like XHTML, MHTML, SVG, EPUB, and Markdown, but they are not all treated as interchangeable HTML documents. Some of them require specific loading methods or are supported only for certain operations, such as conversion rather than full editing.
In practice, HTML remains the primary working format, while other formats are typically used as input sources or conversion targets within the same processing pipeline.




  

Support and Learning Resources

  
  

Aspose.HTML also offers individual HTML processing APIs for other popular development environments.