Extract Attachments from PDF via Java

Extract attachments from a PDF document. Use Aspose.PDF for Java to modify PDF files programmatically.

How to Extract Attachments Using Java Library

To extract attachments, we’ll use the Aspose.PDF for Java API, a feature-rich and easy-to-use PDF processing API for the Java platform. You can get the latest version from the Aspose Maven repository and add it to your Maven-based project by placing the following configuration in pom.xml.

Repository

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://releases.aspose.com/java/repo/</url>
</repository>

Dependency

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>version of aspose-pdf API</version>
</dependency>

Extract Attachments from PDF in Java


You need Aspose.PDF for Java to try the code in your environment.

  1. Get the embedded files collection from the document.
  2. Get the count of embedded files.
  3. Loop through the collection to reach every attachment.
  4. Check whether the attachment has a parameters object before reading its values.
  5. Get the attachment contents and write them to a file or stream.
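
Step 5 comes down to copying an input stream to disk. The following is a minimal sketch of that step using only the standard library; `AttachmentSaver` and its `save` method are hypothetical names chosen for illustration and are not part of Aspose.PDF for Java. It assumes you already hold the attachment contents as an `InputStream` and want an existing file with the same name to be overwritten rather than appended to.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration; not part of Aspose.PDF for Java.
final class AttachmentSaver {

    // Copies the attachment stream to targetDir/fileName and returns the written path.
    // The input stream is closed when the copy finishes or fails.
    static Path save(InputStream contents, Path targetDir, String fileName) throws IOException {
        // Create the output folder (and any missing parents) if needed
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(fileName);
        try (InputStream in = contents) {
            // Overwrite an existing file instead of appending to it
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream returned by an embedded file
        InputStream fake = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        Path dir = Files.createTempDirectory("attachments");
        Path written = save(fake, dir, "note.txt");
        System.out.println(Files.size(written)); // prints 5
    }
}
```

In a real extraction loop you would call `save` once per attachment, passing the stream and name obtained from each embedded file.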

Extract Attachment from PDF document


    import com.aspose.pdf.Document;
    import com.aspose.pdf.FileSpecification;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    // Open the document (_dataDir is the folder that holds input.pdf)
    Document pdfDocument = new Document(_dataDir + "input.pdf");
    // Get a particular embedded file (here, the item at index 1)
    FileSpecification fileSpecification = pdfDocument.getEmbeddedFiles().get_Item(1);
    // Print the file properties; println avoids treating '%' in a name as a format specifier
    System.out.println("Name: " + fileSpecification.getName());
    System.out.println("Description: " + fileSpecification.getDescription());
    System.out.println("Mime Type: " + fileSpecification.getMIMEType());
    // Get the attachment from the PDF file
    File file = new File(fileSpecification.getName());
    // Create the parent folders only if the attachment name includes a path
    File parent = file.getParentFile();
    if (parent != null) {
        parent.mkdirs();
    }
    // Copy the contents to the file, overwriting any earlier copy;
    // try-with-resources closes both streams even if the copy fails
    try (InputStream input = fileSpecification.getContents();
         OutputStream output = new FileOutputStream(file)) {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = input.read(buffer)) != -1) {
            output.write(buffer, 0, n);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    // Release the Document object
    pdfDocument.dispose();
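
The listing above uses the embedded name directly as the output path. Because that name comes from the PDF itself, it may contain folder components such as `../`. The sketch below shows one possible way to reduce such a name to a plain file name before writing. `AttachmentNames`, `safeFileName`, and the `attachment.bin` fallback are hypothetical choices for illustration, not part of Aspose.PDF for Java.

```java
// Hypothetical helper for illustration; not part of Aspose.PDF for Java.
final class AttachmentNames {

    // Fallback used when the embedded name is missing or has no usable file part (an assumption)
    private static final String FALLBACK = "attachment.bin";

    // Strips any directory components from an embedded file name.
    static String safeFileName(String embeddedName) {
        if (embeddedName == null) {
            return FALLBACK;
        }
        // Treat both '/' and '\' as separators, whatever the host OS is
        String normalized = embeddedName.replace('\\', '/');
        String base = normalized.substring(normalized.lastIndexOf('/') + 1).trim();
        if (base.isEmpty() || base.equals(".") || base.equals("..")) {
            return FALLBACK;
        }
        return base;
    }

    public static void main(String[] args) {
        System.out.println(safeFileName("../../etc/passwd"));      // prints passwd
        System.out.println(safeFileName("C:\\docs\\report.docx")); // prints report.docx
        System.out.println(safeFileName(""));                      // prints attachment.bin
    }
}
```

Keeping only the last path segment means every attachment lands in the folder you choose. The trade-off is that two attachments with the same base name would overwrite each other, so you may want to add a counter or prefix in that case.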