How to Extract Tables from HTML

Extracting tables from HTML is useful for applications such as web scraping and content analysis. Aspose.HTML for .NET is a library that simplifies this process by giving developers tools to navigate HTML documents and gather information from them. Let’s explore how to extract tables from HTML documents.

First, make sure you have Aspose.HTML for .NET installed in your project. Open the NuGet package manager, search for Aspose.HTML, and install it. Alternatively, run the following command from the Package Manager Console:


Install Aspose.HTML for .NET

Install-Package Aspose.HTML



Extract HTML Tables using C#

The Aspose.HTML for .NET API provides tools to analyze and collect information from HTML documents, so you can extract HTML tables with a few lines of C# code. The following example finds all the <table> elements in an HTML document, creates a separate HTML file for each table, and saves the files in the output directory. Each output HTML file contains exactly one table from the source HTML document.


C# code to extract tables from HTML

using Aspose.Html;
using System.IO;
...

    // Prepare a path to a source HTML file
    // (DataDir and OutputDir are assumed to be defined in the elided code)
    string documentPath = Path.Combine(DataDir, "tables.html");

    // Create an instance of an HTML document
    using (var document = new HTMLDocument(documentPath))
    {
        // Collect all <table> elements in the document
        var tables = document.GetElementsByTagName("table");
        var i = 0;
        foreach (var table in tables)
        {
            // Build the output path for the current table
            var newFileName = "table" + i + ".htm";
            var newFilePath = Path.Combine(OutputDir, newFileName);

            // Create a new HTML document from the table markup and save it
            using (var newDoc = new HTMLDocument(table.OuterHTML, newFilePath))
            {
                newDoc.Save(newFilePath);
            }
            i++;
        }
    }



Steps to Extract Tables from HTML

  1. Use the HTMLDocument() constructor to initialize an HTML document. Pass the path of the source HTML file as a parameter to the constructor.
  2. Use the GetElementsByTagName("table") method to collect all <table> elements. The method returns a collection of the HTML document’s <table> elements in document order.
  3. Start a loop to iterate over each table element:
    • Create a new file name for the HTML table file.
    • Use the HTMLDocument(content, baseUri) constructor to create a new HTML document, passing the OuterHTML property of the table element as the content and the output file path as the base URI.
    • Save the newly created HTML document to the output directory using the Save() method.
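
If you need the cell values rather than whole tables, the same collection of <table> elements can be walked row by row. The following is a minimal sketch, not the official Aspose sample: the inline markup, the "." base URI, and the class name TableTextSketch are assumptions made for illustration, and it relies on the QuerySelectorAll() method and the TextContent property of the Aspose.HTML DOM.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Html;
using Aspose.Html.Dom;

class TableTextSketch
{
    static void Main()
    {
        // Hypothetical inline markup, used only for illustration
        const string html =
            "<table>" +
            "<tr><th>Name</th><th>Qty</th></tr>" +
            "<tr><td>Apples</td><td>3</td></tr>" +
            "</table>";

        // "." is an assumed base URI; nothing is resolved against it here
        using (var document = new HTMLDocument(html, "."))
        {
            foreach (var table in document.GetElementsByTagName("table"))
            {
                // Caveat: the "tr" selector also matches rows of nested tables
                foreach (var rowNode in table.QuerySelectorAll("tr"))
                {
                    var row = (Element)rowNode;
                    var cells = new List<string>();

                    // Collect the text of header and data cells in this row
                    foreach (var cell in row.QuerySelectorAll("th, td"))
                    {
                        cells.Add(cell.TextContent.Trim());
                    }

                    Console.WriteLine(string.Join(" | ", cells));
                }
            }
        }
    }
}
```

Each row is written to the console as one line of cell text separated by " | ". For documents with nested tables, you would likely need a narrower selector or an explicit walk over direct child rows.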

To learn more about the Aspose.HTML API, please visit our documentation guide. Aspose.HTML for .NET is an HTML parsing library that allows you to create, edit, and convert HTML, XHTML, MD, EPUB, and MHTML files. The Data Extraction documentation section describes how to automatically inspect, collect, and extract data from web pages using Aspose.HTML for .NET. The articles in that section show how to navigate an HTML document and inspect its elements in detail, save a website or file from a URL, extract different types of images from websites, and more.



HTML Table Generator – Online App

Aspose.HTML offers the HTML Table Generator, an online application for creating tables with customizable options. It’s free and easy to use: fill in the required options, and the HTML Table Generator automatically creates the HTML table code. The tool is designed to help you get the HTML table you need and put it online as quickly as possible.