Search PDF using Python

Advanced PDF document search. Use Aspose.PDF for Python via .NET to search PDF documents programmatically.

How to Search PDF File Using Python

To search a PDF file, we’ll use Aspose.PDF for Python via .NET, a powerful and easy-to-use API. Open PyPI, search for aspose-pdf, and install it. Alternatively, run the command:

Console

pip install aspose-pdf
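
After installing, it can help to confirm that the package is importable before running the search example. The following is a minimal sketch; the helper name `can_import` is hypothetical and is not part of Aspose.PDF.

```python
import importlib


def can_import(module_name: str) -> bool:
    """Return True if the named module can be imported in this environment."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


# "aspose.pdf" is the module name used by the import statement in the example on this page
if can_import("aspose.pdf"):
    print("aspose.pdf is available")
else:
    print("aspose.pdf not found; run: pip install aspose-pdf")
```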

Search PDF File via Python


You need Aspose.PDF for Python via .NET to try the code in your environment.

  1. Load the PDF by creating an instance of Document.
  2. Create a TextFragmentAbsorber object, passing the text to find as a parameter.
  3. Accept the absorber on the document's pages, then get the collection of extracted text fragments.
  4. Loop through the fragments and read the properties of each one.

Search PDF Files - Python

import aspose.pdf as apdf

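from os import path

# Placeholder folder and file name; replace both with the location of your own PDF
data_dir = "."
infile = "sample.pdf"
path_infile = path.join(data_dir, infile)

# Load the PDF document
document = apdf.Document(path_infile)

# Create a TextFragmentAbsorber object to find all instances of the input search phrase
textFragmentAbsorber = apdf.text.TextFragmentAbsorber("PDF")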

# Accept the absorber for all the pages
document.pages.accept(textFragmentAbsorber)

# Loop through the fragments
for textFragment in textFragmentAbsorber.text_fragments:
    print(f"Text : {textFragment.text}")
    print(f"Position : {textFragment.position}")
    print(f"XIndent : {textFragment.position.x_indent}")
    print(f"YIndent : {textFragment.position.y_indent}")
    print(f"Font - Name : {textFragment.text_state.font.font_name}")
    print(f"Font - IsAccessible : {textFragment.text_state.font.is_accessible}")
    print(f"Font - IsEmbedded : {textFragment.text_state.font.is_embedded}")
    print(f"Font - IsSubset : {textFragment.text_state.font.is_subset}")
    print(f"Font Size : {textFragment.text_state.font_size}")
    print(f"Foreground Color : {textFragment.text_state.foreground_color}")
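
If the matches need further processing rather than printing, one option is to copy the fields into plain dictionaries. The sketch below is only an illustration: `fragment_to_dict` and `collect_matches` are hypothetical helpers, not Aspose.PDF functions, and they assume only the attribute names used in the loop above. The demo uses a stand-in object so the snippet runs without a PDF; in practice you would pass the absorber's text_fragments collection instead.

```python
from types import SimpleNamespace


def fragment_to_dict(fragment):
    """Copy selected fields of one text fragment into a plain dict."""
    state = fragment.text_state
    return {
        "text": fragment.text,
        "x_indent": fragment.position.x_indent,
        "y_indent": fragment.position.y_indent,
        "font_name": state.font.font_name,
        "font_size": state.font_size,
    }


def collect_matches(fragments):
    """Return one dict per fragment, in the order the fragments are given."""
    return [fragment_to_dict(fragment) for fragment in fragments]


# Stand-in with the same attribute names, used only to demonstrate the helpers
demo_fragment = SimpleNamespace(
    text="PDF",
    position=SimpleNamespace(x_indent=72.0, y_indent=700.0),
    text_state=SimpleNamespace(
        font=SimpleNamespace(font_name="Helvetica"),
        font_size=12.0,
    ),
)

rows = collect_matches([demo_fragment])
print(rows[0]["text"], rows[0]["font_name"], rows[0]["font_size"])
```

A list of dictionaries like this is easy to filter, sort, or write out as CSV or JSON with the standard library.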