Extract Tables from PDF via Java

Extract tables from PDF documents. Use Aspose.PDF for Java to read and modify PDF files programmatically.

How to Extract Tables from a PDF Document Using a Java Library

To extract tables, we'll use the Aspose.PDF for Java API, a feature-rich, powerful, and easy-to-use PDF processing and conversion API for the Java platform. You can get its latest version from the Aspose Maven repository and install it in your Maven-based project by adding the following configuration to pom.xml.

Repository

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://releases.aspose.com/java/repo/</url>
</repository>

Dependency

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>version of aspose-pdf API</version>
</dependency>

Extract Tables from PDF via Java


You need Aspose.PDF for Java to try the code in your environment.

  1. Load the PDF with an instance of Document.
  2. For each page, create a TableAbsorber object and visit the page with it.
  3. Get the tables found on the page with getTableList().
  4. Iterate over the rows and cells of each table.
  5. Read the text fragments of each cell and print their text.

Extract Tables from PDF - Java


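    // Requires the Aspose.PDF for Java classes: import com.aspose.pdf.*;
    // _dataDir is a placeholder for the folder that contains the input PDF.
    Document pdfDocument = new Document(_dataDir + "the_worlds_cities_in_2018_data_booklet 7.pdf");
    for (Page page : pdfDocument.getPages())
    {
        // Use a fresh absorber for every page so it holds only that page's tables.
        TableAbsorber absorber = new TableAbsorber();
        absorber.visit(page);
        for (AbsorbedTable table : absorber.getTableList())
        {
            for (AbsorbedRow row : table.getRowList())
            {
                for (AbsorbedCell cell : row.getCellList())
                {
                    TextFragmentCollection textFragmentCollection = cell.getTextFragments();
                    for (TextFragment fragment : textFragmentCollection)
                    {
                        // Join the fragment's segments into a single string.
                        StringBuilder txt = new StringBuilder();
                        for (TextSegment seg : fragment.getSegments())
                        {
                            txt.append(seg.getText());
                        }
                        System.out.println(txt);
                    }
                }
            }
        }
    }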
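
The loop above prints every text fragment on its own line, so the row and column layout of the table is lost in the output. One possible way to keep it is to collect the text of each cell into a list per row and then write the rows as CSV. The sketch below shows only that second step and uses nothing but the Java standard library. The class TableCsv and its methods are hypothetical helpers, not part of Aspose.PDF, and the sample rows in main are made-up placeholders rather than data from the PDF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Aspose.PDF): turns rows of cell text into CSV.
final class TableCsv {

    // Trims one cell value and quotes it in the RFC 4180 style when it
    // contains a comma, a double quote, or a line break.
    static String escapeCell(String text) {
        String value = text == null ? "" : text.trim();
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        // A double quote inside a quoted field is written as two double quotes.
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped cells of one table row with commas.
    static String toCsvLine(List<String> cells) {
        List<String> escaped = new ArrayList<>();
        for (String cell : cells) {
            escaped.add(escapeCell(cell));
        }
        return String.join(",", escaped);
    }

    // Builds the CSV text for a whole table, one line per row.
    // Lines end with "\n" here; RFC 4180 itself specifies CRLF.
    static String toCsv(List<List<String>> rows) {
        StringBuilder csv = new StringBuilder();
        for (List<String> row : rows) {
            csv.append(toCsvLine(row)).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // Made-up placeholder rows standing in for text collected per cell.
        List<List<String>> rows = List.of(
                List.of("Name", "Value"),
                List.of("City A", "1,000"));
        System.out.print(toCsv(rows));
    }
}
```

To combine this with the extraction loop, you could build one List<String> per AbsorbedRow by concatenating the fragment text of each cell, add it to a list of rows for the current table, and pass that list to toCsv.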