To search a PDF file, we'll use Aspose.PDF for Python via .NET, a feature-rich, powerful, and easy-to-use document manipulation API. It is published on PyPI as the aspose-pdf package, so you can install it with pip (pip install aspose-pdf).
Search PDF File via Python
You need Aspose.PDF for Python via .NET to try the code in your environment.
- Load the PDF with an instance of Document.
- Create a TextFragmentAbsorber object, passing the text to find as a parameter.
- Accept the absorber on all pages, then get the collection of extracted text fragments.
- Loop through the fragments to read each one's text, position, and font details.
Search PDF Files - Python
import aspose.pdf as ap

# Load the PDF document to search
pdf_document = ap.Document("c:\\samples\\sample.pdf")

# Create a TextFragmentAbsorber to find all instances of the search phrase
text_fragment_absorber = ap.text.TextFragmentAbsorber("PDF")

# Accept the absorber for all the pages
pdf_document.pages.accept(text_fragment_absorber)

# Loop through the matched fragments and print their details
for text_fragment in text_fragment_absorber.text_fragments:
    print(f"Text : {text_fragment.text}")
    print(f"Position : {text_fragment.position}")
    print(f"XIndent : {text_fragment.position.x_indent}")
    print(f"YIndent : {text_fragment.position.y_indent}")
    print(f"Font - Name : {text_fragment.text_state.font.font_name}")
    print(f"Font - IsAccessible : {text_fragment.text_state.font.is_accessible}")
    print(f"Font - IsEmbedded : {text_fragment.text_state.font.is_embedded}")
    print(f"Font - IsSubset : {text_fragment.text_state.font.is_subset}")
    print(f"Font Size : {text_fragment.text_state.font_size}")
    print(f"Foreground Color : {text_fragment.text_state.foreground_color}")
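If you want to work with the search results rather than just print them, one option is to flatten each fragment into a plain dictionary. The sketch below is only an illustration: the helper names fragment_to_record and collect_records are hypothetical and not part of Aspose.PDF, and the code assumes only that each fragment exposes the attributes used in the example above. A stand-in object replaces a real fragment so the sketch runs without the library installed.

```python
from types import SimpleNamespace


def fragment_to_record(fragment):
    """Flatten one text fragment into a plain dict.

    Hypothetical helper: it assumes the fragment exposes the same
    attributes read in the article's example (text, position, text_state).
    """
    state = fragment.text_state
    return {
        "text": fragment.text,
        "x": fragment.position.x_indent,
        "y": fragment.position.y_indent,
        "font": state.font.font_name,
        "embedded": state.font.is_embedded,
        "size": state.font_size,
    }


def collect_records(fragments):
    """Convert an iterable of fragments into a list of dicts."""
    return [fragment_to_record(fragment) for fragment in fragments]


# Stand-in for a real fragment (assumption: values are made up for the demo)
demo_fragment = SimpleNamespace(
    text="PDF",
    position=SimpleNamespace(x_indent=72.0, y_indent=700.0),
    text_state=SimpleNamespace(
        font=SimpleNamespace(font_name="Arial", is_embedded=False),
        font_size=12.0,
    ),
)

records = collect_records([demo_fragment])
print(records[0])
```

With the real library, you would pass the absorber's text_fragments collection to collect_records instead of the stand-in list. The resulting dictionaries can then be filtered, sorted by position, or written to JSON or CSV.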