Extract Tables from PDF using C++

Extract tables from a PDF document. Use Aspose.PDF for C++ to modify PDF files programmatically.

How to Extract Tables from a PDF Document Using a C++ Library

To extract tables from a PDF, we’ll use the Aspose.PDF for C++ API, a feature-rich and easy-to-use document manipulation API for the C++ platform. Open the NuGet package manager, search for Aspose.PDF, and install it. You can also run the following command from the Package Manager Console.

Package Manager Console

PM > Install-Package Aspose.PDF.Cpp

Extract Tables from PDF using C++


You need Aspose.PDF for C++ to try the code in your environment.

  1. Import the necessary libraries
  2. Load the PDF document
  3. Initialize the TableAbsorber and iterate over pages
  4. Extract table content
  5. Save extracted data (optional)

Extract Tables from PDF - C++


// Assumes using-directives for the System and Aspose::Pdf namespaces are in scope,
// and that _dataDir is defined elsewhere as the folder containing the input PDF.
auto document = MakeObject<Document>(_dataDir + u"the_worlds_cities_in_2018_data_booklet 7.pdf");

// Scan each page for tables.
for (auto page : document->get_Pages())
{
    auto absorber = MakeObject<Aspose::Pdf::Text::TableAbsorber>();
    absorber->Visit(page);

    for (auto table : absorber->get_TableList())
    {
        for (auto row : table->get_RowList())
        {
            for (auto cell : row->get_CellList())
            {
                // A cell can hold several text fragments; print each on its own line.
                for (auto fragment : cell->get_TextFragments())
                {
                    // Join the fragment's segments into a single string.
                    String txt;
                    for (auto seg : fragment->get_Segments())
                    {
                        txt += seg->get_Text();
                    }
                    Console::WriteLine(txt);
                }
            }
        }
    }
}
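
The sample above prints each cell's text to the console, while step 5 (saving the extracted data) is left open. One possible approach, sketched here with only the C++ standard library, is to collect the cell strings row by row and write them out as CSV. The names `Row`, `Table`, `EscapeCsvField`, `WriteTableAsCsv`, and `TableToCsv` are hypothetical helpers for illustration and are not part of Aspose.PDF for C++. The sketch also assumes the library's `String` values have already been converted to `std::string` before these helpers are called.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical containers (not Aspose types): one string per cell,
// one inner vector per table row.
using Row = std::vector<std::string>;
using Table = std::vector<Row>;

// Quote a field in the usual CSV way: wrap it in double quotes when it
// contains a comma, a double quote, or a line break, and double any
// embedded double quotes.
std::string EscapeCsvField(const std::string& field)
{
    if (field.find_first_of(",\"\r\n") == std::string::npos)
    {
        return field;
    }

    std::string quoted = "\"";
    for (char c : field)
    {
        if (c == '"')
        {
            quoted += '"';
        }
        quoted += c;
    }
    quoted += '"';
    return quoted;
}

// Write one CSV line per row to any output stream (file, string, console).
void WriteTableAsCsv(const Table& table, std::ostream& out)
{
    for (const Row& row : table)
    {
        for (std::size_t i = 0; i < row.size(); ++i)
        {
            if (i > 0)
            {
                out << ',';
            }
            out << EscapeCsvField(row[i]);
        }
        out << '\n';
    }
}

// Convenience wrapper that returns the CSV text as a string.
std::string TableToCsv(const Table& table)
{
    std::ostringstream out;
    WriteTableAsCsv(table, out);
    return out.str();
}
```

To use it with the extraction loops, one would append each cell's text to a `Row` inside the cell loop, push the finished `Row` into a `Table` at the end of each row, and then pass the `Table` together with a `std::ofstream` to `WriteTableAsCsv`. Writing to a generic `std::ostream` keeps the helper independent of where the data ends up.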