C# HTML Parser – Aspose.HTML for .NET API to Parse HTML Files

Create, edit, extract data, merge and convert HTML pages to PDF, DOCX, XPS, Images and other formats.

Aspose.HTML for .NET is an advanced HTML processing API for a wide range of management and manipulation tasks within cross-platform applications. The API is designed to create, modify, extract data from, convert and render HTML documents without any external software. It also supports popular file formats such as EPUB, MHTML, XML, SVG and Markdown, and renders to PDF, DOCX, XPS and image file formats. Aspose.HTML for .NET is written entirely in C# and can be used to build any type of 32-bit or 64-bit .NET application, including ASP.NET, WCF, WinForms and .NET Core. Moreover, the HTML Document Object Model is integrated out of the box with embedded formats and specifications such as CSS, HTML Canvas, SVG, XPath and JavaScript, which extend the manipulation functionality and rendering quality.

Advanced .NET HTML API Features

Linux support without dependency on System.Drawing.Common

Check Website Accessibility

Create HTML from scratch

Load existing HTML from a file, stream or URL

Add, replace or remove nodes

Edit MHTML

Edit Markdown

Generate HTML Code

Implement templates using template merger

Wide range of conversions between formats

Implement Markdown to HTML converter

Convert HTML to PDF

Convert HTML to Image file formats

Merge HTML, MHTML, EPUB, and MD files

Render HTML Canvas 2D to PDF

Navigate HTML using XPath Query or CSS Selector

Extract data from HTML documents

Extract Data from the Web

Pixel Calculator

Managing images in HTML

Create HTML Table

Render multiple documents at once

Apply header and footer during HTML to PDF conversion

Implement W3C specifications

API Features in Documentation

Aspose.HTML for .NET is a class library for working with real-world HTML. You can see the full list of Aspose.HTML features in our documentation. Using the Aspose.HTML C# library in your project allows you to perform the following tasks:

Create or load HTML-based documents from a file, URL, string, or stream.
Convert documents between popular formats.
Create custom message handlers to do a specific task.
Navigate HTML documents using XPath Query or CSS Selector.
Edit HTML files by inserting new nodes, removing, or editing the content of existing nodes.
Render documents with high quality, and more.
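
As a rough illustration of the first task in the list, the sketch below loads a document from an in-memory string and reads a value back through the DOM. It assumes the `HTMLDocument(string content, string baseUri)` constructor overload; the markup and variable names are invented for the example.

```csharp
using System;
using Aspose.Html;

// Build a document from an HTML string; "." is used as the base URI (assumption for this sketch)
using var document = new HTMLDocument("<h1>Title</h1><p>First paragraph</p>", ".");

// Look up the heading with a CSS selector and read its text
var heading = document.QuerySelector("h1");
Console.WriteLine(heading.TextContent);
```

The same constructor family also accepts a file path or URL, as the later examples on this page show.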

Convert HTML to PDF, Image and Other Formats in C#

With just a few lines of C# code, the API lets you implement HTML to PDF, HTML to Image or any other conversion in your .NET applications. The conversion process is simple and reliable, making the Aspose.HTML for .NET API a perfect choice.

Convert HTML to PDF – C#



using Aspose.Html;
using Aspose.Html.Saving;
using Aspose.Html.Converters;
...

    // Load an HTML file to be converted
    using var document = new HTMLDocument("input.html");

    // Create an instance of the PdfSaveOptions class
    var pdfSaveOptions = new PdfSaveOptions();

    // Convert HTML to PDF
    Converter.ConvertHTML(document, pdfSaveOptions, "output.pdf");

You can try the online HTML Converter.

You can also convert HTML, XHTML, MHTML, Markdown, EPUB, or SVG into many other file formats.

Merge HTML, MHTML, EPUB and MD files

The Aspose.HTML for .NET API makes merging files easier for developers: load the files using the HTMLDocument class, create an instance of a renderer and the required output device, then use the Render() method to merge all the HTML documents.
Moreover, you can merge files in real time! Combine HTML, MHTML, Markdown and EPUB into PDF, XPS, DOCX, TIFF and many other file formats:

HTML Merger

MHTML Merger

EPUB Merger

MD Merger
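
The merging steps described above can be sketched as follows. This is a minimal example, assuming the `HtmlRenderer` class from `Aspose.Html.Rendering` and the `PdfDevice` class from `Aspose.Html.Rendering.Pdf`; the two in-memory documents and the output file name `merged.pdf` are placeholders.

```csharp
using Aspose.Html;
using Aspose.Html.Rendering;
using Aspose.Html.Rendering.Pdf;

// Two small documents to merge (illustrative content)
using var first = new HTMLDocument("<h1>Part one</h1>", ".");
using var second = new HTMLDocument("<h1>Part two</h1>", ".");

// Create the renderer and the output device, then render both documents into one PDF
using (var renderer = new HtmlRenderer())
using (var device = new PdfDevice("merged.pdf"))
{
    renderer.Render(device, first, second);
}
```

Each source document is rendered in the order it is passed, so the output starts with the content of `first`.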

Editing HTML Files

Aspose.HTML for .NET allows you to create and edit HTML documents using a Document Object Model (DOM). The DOM is a programming interface for HTML documents that represents the document as a tree of nodes, where each node represents part of the document. Through this interface, the Aspose.HTML for .NET API lets you change a document's structure, style and content: you can insert new nodes, remove nodes, or edit the content of existing ones.
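
A minimal sketch of these DOM operations, assuming the usual W3C-style members exposed in PascalCase (`CreateElement`, `AppendChild`, `RemoveChild`, `GetElementById`); the element ids and text are invented for the example.

```csharp
using System;
using Aspose.Html;

using var document = new HTMLDocument(
    "<p id='intro'>Old text</p><p id='obsolete'>Remove me</p>", ".");

// Insert a new node at the end of the body
var added = document.CreateElement("p");
added.TextContent = "Added paragraph";
document.Body.AppendChild(added);

// Edit the content of an existing node
var intro = document.GetElementById("intro");
intro.TextContent = "Updated text";

// Remove a node that is no longer needed
var obsolete = document.GetElementById("obsolete");
document.Body.RemoveChild(obsolete);

// Print the resulting body markup
Console.WriteLine(document.Body.InnerHTML);
```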

The .NET HTML API helps developers read, modify, navigate and edit (X)HTML documents. Some of the editing functions that the Aspose.HTML for .NET API can perform are the following:

navigate HTML documents using various methods, such as element traversal, document traversal, XPath queries and CSS selector queries,
remove and replace HTML nodes,
extract and edit CSS from HTML,
configure a document sandbox, and more.
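
Two of the navigation methods from the list, XPath queries and CSS selector queries, might look like this in practice. The sketch assumes the `Evaluate` method and the `XPathResultType` enumeration from `Aspose.Html.Dom.XPath`; the markup and class names are illustrative.

```csharp
using System;
using Aspose.Html;
using Aspose.Html.Dom;
using Aspose.Html.Dom.XPath;

var html = "<div class='happy'><span>Hello</span></div>" +
           "<div class='sad'><span>Bye</span></div>";
using var document = new HTMLDocument(html, ".");

// XPath query: every <span> inside an element whose class is "happy"
var result = document.Evaluate("//*[@class='happy']//span", document, null, XPathResultType.Any, null);
var xpathMatches = 0;
for (Node node; (node = result.IterateNext()) != null;)
{
    Console.WriteLine("[XPath]: " + node.TextContent);
    xpathMatches++;
}

// CSS selector query for the same elements
var spans = document.QuerySelectorAll(".happy span");
Console.WriteLine("[CSS]: " + spans.Length + " match(es)");
```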

You can easily edit documents, generate HTML code and scrape data from the Web online or programmatically using the following tools:

Edit Documents

HTML Generators

HTML Navigation

Markdown Support

Markdown is a markup language with a plain-text formatting syntax. Markdown is often used as a format for documentation and readme files since it allows writing in an easy-to-read and easy-to-write style. Aspose.HTML provides a powerful and flexible Markdown Converter that can convert in both directions: from Markdown to HTML and from HTML to Markdown. Moreover, the converter API has a set of predefined rules, so you can convert HTML to Markdown using the authentic Markdown syntax, the GitLab Flavored Markdown modification, or even configure the rules for your needs.

Convert HTML to Markdown – C#



using Aspose.Html;
using Aspose.Html.Saving;
...

    // Load an HTML file
    using var document = new HTMLDocument("document.html");

    // Convert HTML to Markdown using a set of features supported by GitLab Flavored Markdown
    document.Save("output.md", MarkdownSaveOptions.Git);
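
Configuring the conversion rules yourself might look like the sketch below. It assumes that `MarkdownSaveOptions` exposes a `Features` property taking `MarkdownFeatures` flags (here `Link` and `AutomaticParagraph`); the input markup and the output file name are placeholders.

```csharp
using System.IO;
using Aspose.Html;
using Aspose.Html.Saving;

using var document = new HTMLDocument(
    "<p>See the <a href='https://example.com'>example</a> page.</p>", ".");

// Restrict the conversion to links and paragraphs (assumed flag names)
var options = new MarkdownSaveOptions
{
    Features = MarkdownFeatures.Link | MarkdownFeatures.AutomaticParagraph
};

// Save the document as Markdown using the custom rule set
document.Save("links-only.md", options);
```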

The reverse conversion is just as simple! Using the Aspose.HTML class library in your C# application, you can convert Markdown into an HTML file with just one line of code!

Convert Markdown to HTML – C#



using Aspose.Html.Converters;
...

    // Convert Markdown to HTML
    Converter.ConvertMarkdown("document.md", "output.html");

Try the online Markdown Converter! You can convert Markdown to PDF, XPS, DOCX, JPG, PNG, BMP, TIFF, GIF and MHTML. Upload and transform your documents and get results in a few seconds. You don't need any additional software.

The Aspose.HTML for .NET library provides a Markdown parsing API for the C# platform. You can create, edit, save, merge and convert MD files to other file formats, and add links, lists, code blocks, images and other elements to Markdown files.

Electronic Books and Web Archives

Aspose.HTML for .NET is capable of loading EPUB and MHTML files to perform various operations including the conversion to fixed-layout and raster image formats.

Convert EPUB to PDF – C#



using System.IO;
using Aspose.Html.Converters;
using Aspose.Html.Saving;
...

    // Open an existing EPUB file for reading
    using var stream = File.OpenRead("input.epub");

    // Create an instance of PdfSaveOptions
    var options = new PdfSaveOptions();

    // Call the ConvertEPUB method to convert EPUB to PDF
    Converter.ConvertEPUB(stream, options, "output.pdf");

Convert MHTML to PDF – C#



using System.IO;
using Aspose.Html.Converters;
using Aspose.Html.Saving;
...

    // Open an existing MHTML file for reading
    using var stream = File.OpenRead("input.mht");

    // Create an instance of PdfSaveOptions
    var options = new PdfSaveOptions();

    // Call the ConvertMHTML method to convert MHTML to PDF
    Converter.ConvertMHTML(stream, options, "output.pdf");

You can try the online MHTML Converter and the online EPUB Converter. Our browser-based conversion tools work on all platforms, including Windows, Linux, macOS, Android and iOS. The converters are compatible with PCs, smartphones and tablets.

Data Extraction

Web scraping, also known as web harvesting, web data extraction or web crawling, is a technique for extracting data from a website. Aspose.HTML doesn't provide a Data Extraction module out of the box. However, because the Aspose.HTML API is based entirely on W3C specifications and supports XPath and CSS Selector queries, you can easily inspect the content of any HTML document and build your own Data Extraction solution.

Simple Web Data Extraction – C#



using System.Linq;
using Aspose.Html;
...

    // Create an instance of the HTML document with a website as a parameter
    using var document = new HTMLDocument("https://en.wikipedia.org/wiki/Aspose_API");

    // Get all anchor-elements
    var elements = document.QuerySelectorAll("a");

    // Dump the anchor-element data to the console
    foreach (var anchor in elements.Cast<HTMLAnchorElement>())
    {
        System.Console.WriteLine("[Href]: " + anchor.Href);
        System.Console.WriteLine("[Content]: " + anchor.TextContent);
    }

Aspose.HTML offers free online Data Extraction Apps for getting data from websites. The Apps are safe, work on any platform and do not require any software installation. Data Extraction can be used for extracting images, getting keywords from a webpage, and so on. The Apps are simple and clear to use, yet powerful and reliable.