How to Extract Tables from a Website

Extracting tables from HTML is important for applications such as web scraping and content analysis. Aspose.HTML for .NET is a library that simplifies this process by giving developers tools for navigating HTML documents and gathering information from them. Let’s explore how to extract tables from a website.

First, make sure you have Aspose.HTML for .NET installed in your project. Installation is simple: open the NuGet package manager, search for Aspose.HTML, and install it. You can also run the following command in the Package Manager Console:


Install Aspose.HTML for .NET

Install-Package Aspose.HTML



Extract Tables from a Website Using C#

You can extract tables from a website with a few lines of C# code. The following example shows how to find all the <table> elements in an HTML document, create a separate HTML file for each table, and save the files in the output directory.


C# code to extract tables from a website

using Aspose.Html;
using System;
using System.IO;
using System.Linq;
...

    // OutputDir is assumed to be a string, defined elsewhere, that holds the path to the output folder

    // Open the document you want to extract tables from
    using var document = new HTMLDocument("https://docs.aspose.com/html/net/edit-html-document/");

    // Collect all <table> elements in the document
    var tables = document.GetElementsByTagName("table");

    if (tables.Any())
    {
        var i = 0;
        foreach (var table in tables)
        {
            // Build the output path for this table, e.g. table0.htm
            var newFileName = "table" + i + ".htm";
            var newFilePath = Path.Combine(OutputDir, newFileName);

            // Create a new HTML document from the table markup and save it
            using var newDoc = new HTMLDocument(table.OuterHTML, newFilePath);
            newDoc.Save(newFilePath);
            i++;
        }
    }
    else
    {
        // Handle the case where no tables are found
        Console.WriteLine("No tables found in the document.");
    }



Steps to Extract Tables from a Website

This C# example shows how to extract tables from a website and save each table as a separate HTML file. It handles both cases: the document contains tables, or it contains none.

  1. Use the HTMLDocument() constructor to initialize an HTML document. The constructor takes the URL of the web page from which you want to extract tables as a parameter.
  2. Use the GetElementsByTagName("table") method to retrieve all <table> elements from the HTML document. Store the resulting collection in the tables variable.
  3. Use the Any() method to check whether the tables collection contains any elements. If tables are found, then for each table:
    • Create a new file name for the HTML table file.
    • Use the HTMLDocument(content, baseUri) constructor to create a new HTML document from the OuterHTML property of the table element, passing the new file path as the base URI.
    • Save the newly created HTML document to the output directory using the Save() method.
  4. If the document does not contain tables, print a message to the console indicating that no tables were found.
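
To try the steps above without a network connection, the same flow can be sketched as a complete console program. This is a minimal sketch under stated assumptions, not the library's official sample: the inline html string, the ExtractTablesSketch class name, and the extracted-tables folder under the system temp directory are all hypothetical, and the document is loaded from a string through the HTMLDocument(content, baseUri) constructor instead of a URL. It also assumes C# 8 or later for the using declarations.

```csharp
using System;
using System.IO;
using System.Linq;
using Aspose.Html;

class ExtractTablesSketch
{
    static void Main()
    {
        // Hypothetical inline markup standing in for a downloaded page
        const string html =
            "<html><body>" +
            "<table><tr><td>First</td></tr></table>" +
            "<p>Some text</p>" +
            "<table><tr><td>Second</td></tr></table>" +
            "</body></html>";

        // Hypothetical output folder; created if it does not exist yet
        var outputDir = Path.Combine(Path.GetTempPath(), "extracted-tables");
        Directory.CreateDirectory(outputDir);

        // Step 1: initialize the document (here from a string, with "." as the base URI)
        using var document = new HTMLDocument(html, ".");

        // Step 2: retrieve all <table> elements
        var tables = document.GetElementsByTagName("table");

        // Step 3: save each table as its own HTML file
        if (tables.Any())
        {
            var i = 0;
            foreach (var table in tables)
            {
                var path = Path.Combine(outputDir, "table" + i + ".htm");
                using var tableDoc = new HTMLDocument(table.OuterHTML, path);
                tableDoc.Save(path);
                Console.WriteLine("Saved " + path);
                i++;
            }
        }
        else
        {
            // Step 4: report that no tables were found
            Console.WriteLine("No tables found in the document.");
        }
    }
}
```

Replacing the html string and the "." base URI with a single URL argument, as in step 1, should switch the sketch back to reading from a live web page.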

To learn more about the Aspose.HTML API, please visit our documentation guide. Aspose.HTML for .NET is an HTML parsing library that allows you to create, edit, and convert HTML, XHTML, MD, EPUB, and MHTML files. The Data Extraction documentation section describes how to automatically inspect, collect, and extract data from web pages using Aspose.HTML for .NET. In the articles in this section, you’ll learn how to navigate an HTML document and inspect its elements in detail, save a website or file from a URL, extract different types of images from websites, and more.



HTML Table Generator – Online App

Aspose.HTML offers the HTML Table Generator, an online application for creating tables with customizable options. It’s free and easy to use. Just fill in the required options and get the result. The HTML Table Generator automatically creates the HTML table code, so you can get the table you need and put it online as quickly as possible.