Extract Attachments from PDF via Java

How to Extract Attachments from PDF programmatically with Java

How to Extract Attachments Using Java Library

To extract attachments, we'll use Aspose.PDF for Java, a feature-rich and easy-to-use PDF processing and conversion API for the Java platform. You can get the latest version from the Aspose Maven repository and add it to your Maven-based project by placing the following configuration in pom.xml.

Repository

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://releases.aspose.com/java/repo/</url>
</repository>

Dependency

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>version of aspose-pdf API</version>
</dependency>

Extract Attachments from PDF in Java


You need Aspose.PDF for Java to try the code in your environment.

  1. Get the embedded files collection from the document.
  2. Get the count of embedded files.
  3. Loop through the collection to get each attachment.
  4. Check whether the attachment has a parameters object before reading its parameters.
  5. Get the attachment contents and write them to a file or stream.

Extract Attachment from PDF document


import com.aspose.pdf.Document;
import com.aspose.pdf.FileSpecification;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Open document (_dataDir is assumed to hold the path of the folder containing input.pdf)
Document pdfDocument = new Document(_dataDir + "input.pdf");
// Get particular embedded file
FileSpecification fileSpecification = pdfDocument.getEmbeddedFiles().get_Item(1);
// Get the file properties (println avoids treating '%' in a name as a format specifier)
System.out.println("Name: - " + fileSpecification.getName());
System.out.println("Description: - " + fileSpecification.getDescription());
System.out.println("Mime Type: - " + fileSpecification.getMIMEType());
// Get attachment from PDF file
File file = new File(fileSpecification.getName());
// Create parent folders only when the attachment name includes a path;
// getParentFile() returns null for a bare file name
File parent = file.getParentFile();
if (parent != null) {
    parent.mkdirs();
}
// Extract the file; try-with-resources closes both streams even if copying fails,
// and the output overwrites any earlier copy instead of appending to it
try (InputStream input = fileSpecification.getContents();
     OutputStream output = new FileOutputStream(file)) {
    byte[] buffer = new byte[4096];
    int n;
    while ((n = input.read(buffer)) != -1) {
        output.write(buffer, 0, n);
    }
} catch (IOException e) {
    e.printStackTrace();
}
// Close Document object
pdfDocument.dispose();
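
An attachment name comes from inside the PDF, so it may contain folder segments such as "../" that point outside the folder you intended to write to. The following is a minimal sketch, using only the Java standard library, of one way to write an attachment stream into a chosen output folder. The class AttachmentWriter, its methods safeFileName and save, and the fallback name "attachment.bin" are hypothetical names introduced for illustration; they are not part of Aspose.PDF for Java.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AttachmentWriter {

    // Hypothetical helper: reduces an attachment name to a bare file name so a
    // name such as "../notes/readme.txt" cannot escape the output folder.
    public static String safeFileName(String rawName) {
        if (rawName == null) {
            return "attachment.bin";
        }
        // Treat both separator styles as path separators, then keep the last segment.
        String normalized = rawName.replace('\\', '/');
        String last = normalized.substring(normalized.lastIndexOf('/') + 1).trim();
        if (last.isEmpty() || last.equals(".") || last.equals("..")) {
            return "attachment.bin";
        }
        return last;
    }

    // Copies the attachment stream into outDir, replacing a file left by an earlier run.
    public static Path save(String rawName, InputStream contents, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        Path target = outDir.resolve(safeFileName(rawName));
        try (InputStream in = contents) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path outDir = Files.createTempDirectory("attachments");
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        // The in-memory stream stands in for the stream returned by the PDF library.
        Path saved = save("../notes/readme.txt", new ByteArrayInputStream(data), outDir);
        System.out.println(saved.getFileName()); // prints "readme.txt"
    }
}
```

In the extraction code, a call such as AttachmentWriter.save(fileSpecification.getName(), fileSpecification.getContents(), outDir) could take the place of the manual buffer loop. Dropping the folder part of the name is a deliberate trade-off: it flattens any folder structure stored in the attachment names in exchange for keeping every file inside outDir.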