Extract PDF Metadata via Python

Extract metadata from a PDF document. Use Aspose.PDF for Python via .NET to work with PDF files programmatically.

How to Extract PDF Metadata Using Python

Extract metadata from PDF using Aspose.PDF for Python. Accessing a document’s metadata means reading information about the file, such as its title, author, creation date, and keywords.

Extracting metadata helps organize a large collection of PDFs more effectively and improves how you can search for files: users can quickly locate specific documents by the keywords or other details found in the metadata. Metadata also gives insight into what a file contains. It offers a brief summary of key details, so you can understand what a document is about without opening it.

Metadata can also help confirm that a document is authentic. You can check details such as the author’s name, when the file was created, or when it was last modified. This verification is important for confirming a PDF’s reliability.

Overall, extracting PDF metadata enables more efficient document management, better search, compliance with standards, and an improved user experience, because concise details about each PDF make documents easier to identify and work with.

To extract metadata from PDF files, we’ll use Aspose.PDF for Python via .NET, a feature-rich, powerful, and easy-to-use document manipulation API. Install it from PyPI with the following command:


pip install aspose-pdf
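After installation, a quick way to confirm that the package is visible to your Python interpreter is to look up the `aspose.pdf` module without opening any files. This is a minimal sketch using only the standard library; the helper name `aspose_pdf_available` is hypothetical, and it only checks that the module can be located, not that it loads and runs correctly.

```python
import importlib.util


def aspose_pdf_available() -> bool:
    """Return True if the aspose.pdf module can be located by this interpreter."""
    try:
        # find_spec imports the parent package "aspose" first, which raises
        # ModuleNotFoundError when the package is not installed at all.
        return importlib.util.find_spec("aspose.pdf") is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    if aspose_pdf_available():
        print("aspose.pdf is installed")
    else:
        print("aspose.pdf not found; run: pip install aspose-pdf")
```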

Extract PDF Metadata via Python

To try the code in your environment, you need Aspose.PDF for Python via .NET.

Load the PDF with an instance of Document.
Get the DocumentInfo object using the Document.info property.
Access and display the DocumentInfo properties you need.
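The three steps above can be wrapped in a small helper that returns the metadata as a plain dictionary, which is easier to store or search than printed output. This is a minimal sketch, not part of the Aspose API: `metadata_to_dict` and `METADATA_FIELDS` are hypothetical names, the field list simply mirrors the properties printed in the sample further down, and because the function reads attributes with `getattr`, it is demonstrated here on a stand-in object instead of a real `document.info`.

```python
from types import SimpleNamespace

# Hypothetical field list: mirrors the DocumentInfo properties printed in the sample.
METADATA_FIELDS = ("author", "creation_date", "keywords", "mod_date", "subject", "title")


def metadata_to_dict(doc_info, fields=METADATA_FIELDS):
    """Copy the selected metadata properties of doc_info into a plain dict.

    doc_info is expected to be the object returned by document.info, but any
    object exposing the same attribute names works. Missing attributes are
    reported as None instead of raising AttributeError.
    """
    return {name: getattr(doc_info, name, None) for name in fields}


if __name__ == "__main__":
    # Hypothetical stand-in for document.info so the sketch runs without Aspose.PDF.
    fake_info = SimpleNamespace(author="Jane Doe", title="Quarterly Report", keywords="finance")
    print(metadata_to_dict(fake_info))
```

With Aspose.PDF installed, you would pass the real object instead, for example `metadata_to_dict(apdf.Document(input_file).info)`.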

The Python code snippet below shows how to extract metadata from a PDF with the Aspose.PDF library. It opens a PDF file named ‘GetFileInfo.pdf’ located in the directory specified by the variable ‘DIR_INPUT_METADATA’ and retrieves the document details through the ‘info’ property. It then uses the ‘print’ function to display specific metadata fields: the author’s name, creation date, keywords, modification date, subject, and title. This is a simplified example of how you might use the Aspose.PDF library to extract metadata from a PDF file.

Extract Metadata of PDF - Python

This sample code shows how to extract metadata from a PDF file.


import aspose.pdf as apdf
from os import path

# Placeholder values: point these at your own input directory and PDF file
DIR_INPUT_METADATA = "path/to/input"
infile = "GetFileInfo.pdf"

input_file = path.join(DIR_INPUT_METADATA, infile)

# Open the PDF document
document = apdf.Document(input_file)

# Get the document information (a DocumentInfo object)
doc_info = document.info

# Show selected metadata fields
print("Author :", doc_info.author)
print("Creation Date :", doc_info.creation_date)
print("Keywords :", doc_info.keywords)
print("Modify Date :", doc_info.mod_date)
print("Subject :", doc_info.subject)
print("Title :", doc_info.title)
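Because metadata is most useful across many files, the same steps can be applied to a whole folder and the results searched by keyword. The sketch below is an illustration under stated assumptions, not part of the Aspose sample: `collect_metadata` and `find_by_keyword` are hypothetical helper names, the only Aspose usage is the `Document(...).info` call shown above, and the loader is injectable so the collection and search logic can run without the library.

```python
from pathlib import Path


def _aspose_info(pdf_path):
    """Open a PDF with Aspose.PDF and return its document.info object."""
    import aspose.pdf as apdf  # imported lazily so the helpers below work without it
    return apdf.Document(str(pdf_path)).info


def collect_metadata(pdf_paths, load_info=_aspose_info):
    """Return {file name: {"author", "title", "keywords"}} for each given PDF path."""
    records = {}
    for pdf_path in pdf_paths:
        pdf_path = Path(pdf_path)
        info = load_info(pdf_path)
        records[pdf_path.name] = {
            "author": info.author,
            "title": info.title,
            "keywords": info.keywords,
        }
    return records


def find_by_keyword(records, term):
    """Return names of files whose title or keywords contain term (case-insensitive)."""
    term = term.lower()
    return [
        name
        for name, meta in records.items()
        if term in (meta.get("title") or "").lower()
        or term in (meta.get("keywords") or "").lower()
    ]
```

For example, `records = collect_metadata(sorted(Path("pdfs").glob("*.pdf")))` followed by `find_by_keyword(records, "invoice")` would list the matching file names; the `pdfs` folder and the search term are placeholders.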