PDF Extractor

Extract images & text from PDF documents with free cross-platform Apps and APIs

How to Parse a PDF File Using the Aspose Library

Why parse PDF documents? Parsing a PDF means extracting various kinds of information from the file, most often its text and images, so that each can be handled separately. To parse a PDF file, we'll use the Aspose.PDF API, a feature-rich, powerful, and easy-to-use document manipulation API. The Aspose.PDF library lets you extract text from PDF pages and from stamps, extract images and fonts, and extract data from tables and forms. To install it, open the NuGet package manager, search for Aspose.PDF, and install the package. You can also run Install-Package Aspose.PDF from the Package Manager Console.

High Code APIs to Parse Documents

Native APIs to parse PDF files using .NET, .NET Core, Java, C++ & Android

Parse PDF Files

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Folder that holds the input PDF and receives the output (placeholder value, adjust to your environment)
string dataDir = "./";

// Open document
Document pdfDocument = new Document(dataDir + "ExtractTextAll.pdf");

// Create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
// Get the extracted text
string extractedText = textAbsorber.Text;

// Create a writer and open the file; the using block closes it even if writing fails
using (TextWriter tw = new StreamWriter(dataDir + "extracted-text.txt"))
{
    // Write the extracted text to the file
    tw.WriteLine(extractedText);
}
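
The page also mentions extracting images, so a rough sketch of that step may help alongside the text example. It assumes the same ExtractTextAll.pdf input. The dataDir value, the class name, and the image-N.jpg output names are placeholders chosen for illustration and do not come from the original sample. It relies on each page exposing its images through Resources.Images and on XImage.Save writing the image to a stream.

```csharp
using System.IO;
using Aspose.Pdf;

class ExtractImagesSketch
{
    static void Main()
    {
        // Placeholder folder (an assumption); point it at your own files
        string dataDir = "./";

        // Open document
        using (Document pdfDocument = new Document(dataDir + "ExtractTextAll.pdf"))
        {
            int imageIndex = 1;

            // Visit every page and every image resource attached to it
            foreach (Page page in pdfDocument.Pages)
            {
                foreach (XImage image in page.Resources.Images)
                {
                    // Hypothetical output naming scheme: image-1.jpg, image-2.jpg, ...
                    string outputPath = dataDir + "image-" + imageIndex + ".jpg";
                    using (FileStream output = new FileStream(outputPath, FileMode.Create))
                    {
                        // Write the image data to the output stream
                        image.Save(output);
                    }
                    imageIndex++;
                }
            }
        }
    }
}
```

If a PDF contains no image resources, the inner loop never runs and no files are written.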