Extract Tables from PDF via C++

Extract tables from PDF documents. Use Aspose.PDF for C++ to modify PDF files programmatically.

How to Extract Tables from a PDF Document Using a C++ Library

To extract tables from a PDF, we'll use the Aspose.PDF for C++ API, a feature-rich and easy-to-use document manipulation API for the C++ platform. Open the NuGet package manager, search for Aspose.PDF, and install the package. You may also use the following command from the Package Manager Console.

Package Manager Console

PM > Install-Package Aspose.PDF.Cpp

Extract Tables from PDF via C++


You need Aspose.PDF for C++ to try the code in your environment.

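  1. Load the PDF with an instance of Document.
  2. For each page, create a TableAbsorber object to find tables.
  3. Visit the page with the absorber.
  4. Iterate over the tables found on the page, then over the rows and cells of each table.
  5. Read the text fragments of each cell and print their text.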

Extract Tables from PDF - C++


// Assumes `using namespace System;`, `using namespace Aspose::Pdf;` and
// `using namespace Aspose::Pdf::Text;`, and that _dataDir holds the input folder path.
auto document = MakeObject<Document>(_dataDir + u"the_worlds_cities_in_2018_data_booklet 7.pdf");
for (auto page : document->get_Pages())
{
    // Find the tables on the current page.
    auto absorber = MakeObject<Aspose::Pdf::Text::TableAbsorber>();
    absorber->Visit(page);
    for (auto table : absorber->get_TableList())
    {
        for (auto row : table->get_RowList())
        {
            for (auto cell : row->get_CellList())
            {
                // A cell can hold several text fragments, each made of segments.
                auto textFragmentCollection = cell->get_TextFragments();
                for (auto fragment : textFragmentCollection)
                {
                    // Concatenate the segments to rebuild the fragment's text.
                    String txt;
                    for (auto seg : fragment->get_Segments())
                    {
                        txt += seg->get_Text();
                    }
                    Console::WriteLine(txt);
                }
            }
        }
    }
}
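
The sample above prints every text fragment on its own line, so the row and column structure of the table is lost in the output. If you want to keep that structure, one option is to collect the text of each cell into rows and serialize them as CSV. The following is a minimal sketch that uses only the C++ standard library. The names Row, EscapeCsvField and TableToCsv are hypothetical helpers introduced for this illustration and are not part of Aspose.PDF for C++. Converting the library's String values to std::string is left out here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical alias: one table row holding the text of each cell.
using Row = std::vector<std::string>;

// Quote a CSV field when it contains a comma, a double quote, or a line break,
// and double any embedded double quotes (the usual CSV convention).
std::string EscapeCsvField(const std::string& field)
{
    if (field.find_first_of(",\"\r\n") == std::string::npos)
    {
        return field;
    }
    std::string quoted = "\"";
    for (char c : field)
    {
        if (c == '"')
        {
            quoted += '"';
        }
        quoted += c;
    }
    quoted += '"';
    return quoted;
}

// Join the cells of each row with commas and end every row with a newline.
std::string TableToCsv(const std::vector<Row>& rows)
{
    std::string csv;
    for (const Row& row : rows)
    {
        for (std::size_t i = 0; i < row.size(); ++i)
        {
            if (i > 0)
            {
                csv += ',';
            }
            csv += EscapeCsvField(row[i]);
        }
        csv += '\n';
    }
    return csv;
}
```

To use it with the extraction loop, you would append the text of each cell to a Row inside the cell loop, append that Row to a std::vector<Row> at the end of the row loop, and call TableToCsv once per table.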