Extract Tables from PDF using C#

Extract tables from a PDF document. Use Aspose.PDF for .NET to modify PDF files programmatically.


How to Extract Tables from a PDF Document Using a .NET Library

To extract tables, we’ll use the Aspose.PDF for .NET API, a feature-rich and easy-to-use document manipulation API for the .NET platform. Open the NuGet package manager, search for Aspose.PDF, and install it. You can also run the following command from the Package Manager Console.

Package Manager Console

PM> Install-Package Aspose.PDF

Extract Tables from PDF using C#

You need Aspose.PDF for .NET to try the code in your environment.

1. Import the necessary libraries
2. Load the PDF document
3. Initialize the TableAbsorber and iterate over pages
4. Extract table content
5. Save extracted data (optional)

Extract Tables from PDF - C#

using System;
using System.Text;

// Load the PDF document.
var pdfDocument = new Aspose.Pdf.Document("sample.pdf");
foreach (var page in pdfDocument.Pages)
{
    // Find the tables on the current page.
    var absorber = new Aspose.Pdf.Text.TableAbsorber();
    absorber.Visit(page);
    foreach (var table in absorber.TableList)
    {
        foreach (var row in table.RowList)
        {
            foreach (var cell in row.CellList)
            {
                // A cell can hold several text fragments; print each one on its own line.
                foreach (var fragment in cell.TextFragments)
                {
                    // Join the fragment's segments into a single string.
                    var txt = new StringBuilder();
                    foreach (var seg in fragment.Segments)
                    {
                        txt.Append(seg.Text);
                    }
                    Console.WriteLine(txt.ToString());
                }
            }
        }
    }
}
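
Save Extracted Tables to CSV - C#

The sample above prints the cell text to the console and leaves out the optional save step. The sketch below shows one way that step could look: it walks the tables the same way and writes each row as one line of a CSV file. The output name tables.csv, the Escape helper, and the choices to join a cell's fragments with a space and to separate tables with a blank line are all assumptions made for illustration. They are not part of Aspose.PDF.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TableCsvExport
{
    // Hypothetical helper: quote a field when it contains a comma,
    // a double quote, or a line break, and double any embedded quotes.
    static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    public static void Main()
    {
        var pdfDocument = new Aspose.Pdf.Document("sample.pdf");
        var lines = new List<string>();

        foreach (var page in pdfDocument.Pages)
        {
            var absorber = new Aspose.Pdf.Text.TableAbsorber();
            absorber.Visit(page);

            foreach (var table in absorber.TableList)
            {
                foreach (var row in table.RowList)
                {
                    var cells = new List<string>();
                    foreach (var cell in row.CellList)
                    {
                        // Assumption: fragments within one cell are joined with a space.
                        var text = new StringBuilder();
                        foreach (var fragment in cell.TextFragments)
                        {
                            if (text.Length > 0)
                            {
                                text.Append(' ');
                            }
                            foreach (var seg in fragment.Segments)
                            {
                                text.Append(seg.Text);
                            }
                        }
                        cells.Add(Escape(text.ToString()));
                    }
                    // One CSV line per table row.
                    lines.Add(string.Join(",", cells));
                }
                // Assumption: a blank line separates consecutive tables.
                lines.Add(string.Empty);
            }
        }

        // "tables.csv" is an illustrative output path.
        File.WriteAllLines("tables.csv", lines);
    }
}
```

Writing every table into a single file keeps the sketch short. If each table needs its own file, the list of lines could be flushed to a separate path inside the table loop.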