Extract PDF Metadata via Java

Extract metadata from PDF documents. Use Aspose.PDF for Java to modify PDF files programmatically.

How to Extract PDF Metadata Using Java

Aspose.PDF for Java lets you extract metadata from PDF files. A document’s metadata is information about the file itself, such as its title, author, creation date, and keywords.

Extracting metadata helps you organize a large collection of PDFs and makes files easier to search: users can quickly locate specific documents by the keywords or other details stored in the metadata. Metadata also gives insight into what a file contains. It can act as a brief summary of the key details, so you can understand what a document is about without opening it.

Metadata also helps you check whether a document is authentic. You can inspect details such as the author’s name, the creation date, and the modification history, which supports verifying a PDF’s reliability.

Overall, extracting PDF metadata supports more efficient document management, improved search, compliance with standards, and a better user experience.

To extract metadata from PDF files, we’ll use the Aspose.PDF for Java API, a feature-rich, powerful, and easy-to-use conversion API for the Java platform. You can download its latest version directly from Maven and install it in your Maven-based project by adding the following configuration to pom.xml.

Repository

<repository>
    <id>AsposeJavaAPI</id>
    <name>Aspose Java API</name>
    <url>https://releases.aspose.com/java/repo/</url>
</repository>

Dependency

<dependency>
    <groupId>com.aspose</groupId>
    <artifactId>aspose-pdf</artifactId>
    <version>version of aspose-pdf API</version>
</dependency>

Extract PDF Metadata via Java


You need Aspose.PDF for Java to try the code in your environment.

  1. Load the PDF with an instance of Document.
  2. Get the DocumentInfo object by calling Document.getInfo().
  3. Access and display the DocumentInfo properties you need.

The Java code snippet below shows how to extract metadata from a PDF with the Aspose.PDF library. It opens a PDF file named ‘sample.pdf’ located in the directory given by the variable ‘_dataDir’. The code retrieves the document’s details through the ‘getInfo’ method, which returns a DocumentInfo object, and then prints the author, creation date, keywords, modification date, subject, and title with ‘System.out.println’. It is a simplified example of how you might use the Aspose.PDF library to extract metadata from a PDF file.
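Every entry in a PDF’s document information dictionary is optional, so any of the fields listed above may be missing from a given file. This page does not say whether the library reports a missing field as null or as an empty string, so the sketch below assumes either can occur and prints a placeholder in both cases. ‘MetadataFormat’ and ‘line’ are hypothetical names used only for illustration; they are not part of Aspose.PDF.

```java
// Hypothetical helper, not part of Aspose.PDF: formats one metadata field for display.
class MetadataFormat {

    // Returns "label: value", or "label: (not set)" when the value is null or blank.
    // The parameter is an Object so that both text and date values can be passed in.
    static String line(String label, Object value) {
        String text = (value == null) ? "" : value.toString().trim();
        return label + ": " + (text.isEmpty() ? "(not set)" : text);
    }

    public static void main(String[] args) {
        // Hand-written sample values; in practice they would come from the DocumentInfo getters.
        System.out.println(line("Author", "J. Smith"));  // Author: J. Smith
        System.out.println(line("Subject", null));       // Subject: (not set)
        System.out.println(line("Keywords", "   "));     // Keywords: (not set)
    }
}
```

With a helper like this, a call such as line("Title", docInfo.getTitle()) could replace the plain string concatenation when some files in a collection carry incomplete metadata.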

Extract Metadata of PDF - Java

This sample code shows how to extract metadata information from a PDF file.


    import com.aspose.pdf.Document;
    import com.aspose.pdf.DocumentInfo;

    // _dataDir is expected to hold the path of the folder that contains sample.pdf
    // Load the PDF document
    Document pdfDocument = new Document(_dataDir + "sample.pdf");
    // Get document information
    DocumentInfo docInfo = pdfDocument.getInfo();
    // Show document information
    System.out.println("Author: " + docInfo.getAuthor());
    System.out.println("Creation Date: " + docInfo.getCreationDate());
    System.out.println("Keywords: " + docInfo.getKeywords());
    System.out.println("Modify Date: " + docInfo.getModDate());
    System.out.println("Subject: " + docInfo.getSubject());
    System.out.println("Title: " + docInfo.getTitle());
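Once the values have been read, they can support the keyword search described earlier. The following is a minimal sketch of that idea using only the Java standard library. ‘MetadataSearch’, ‘PdfMeta’, and ‘findByKeyword’ are hypothetical names, not part of Aspose.PDF, and the sample values are typed in by hand rather than read from real files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Aspose.PDF: filters already-extracted metadata by keyword.
class MetadataSearch {

    // Plain holder for values read from a document's metadata.
    static final class PdfMeta {
        final String fileName;
        final String title;
        final String author;
        final String keywords;

        PdfMeta(String fileName, String title, String author, String keywords) {
            this.fileName = fileName;
            this.title = title;
            this.author = author;
            this.keywords = keywords;
        }
    }

    // Returns the file names whose keywords field contains the given keyword,
    // ignoring case. A missing (null) keywords field never matches.
    static List<String> findByKeyword(List<PdfMeta> documents, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> matches = new ArrayList<>();
        for (PdfMeta meta : documents) {
            if (meta.keywords != null
                    && meta.keywords.toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(meta.fileName);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Illustrative data only; in practice each entry would be filled from
        // docInfo.getTitle(), docInfo.getAuthor() and docInfo.getKeywords().
        List<PdfMeta> documents = Arrays.asList(
                new PdfMeta("report.pdf", "Annual Report", "J. Smith", "finance, annual"),
                new PdfMeta("manual.pdf", "User Manual", "A. Jones", "guide, setup"),
                new PdfMeta("untitled.pdf", null, null, null));

        System.out.println(findByKeyword(documents, "Finance")); // prints [report.pdf]
    }
}
```

The match is a simple case-insensitive substring test, so searching for "fin" would also return report.pdf. Splitting the keywords field on commas would give exact keyword matching if that is preferred.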