Extract PDF using C#

How to extract text and images from PDF using the Aspose.PDF for .NET library

How to parse PDF with Aspose.PDF for .NET Library

Do you need to extract data from a PDF? Programmatic processing of PDF documents is an essential part of modern digital workflows. With .NET libraries like Aspose.PDF, developers can extract text or pull images from PDF files. These libraries are stand-alone solutions that don’t rely on other software and are ready for commercial use. They cover a broad range of extraction tasks that professional C# developers face, from plain text to fonts, form data, and tables.

Extract PDF data: text, images, fonts, form fields, and more
Extract Text from PDF
Extract Images from PDF
Extract Fonts from PDF
Extract Data from Forms
Extract Text from Stamps
Extract Data from Tables

To extract data from a PDF file, we’ll use the Aspose.PDF for .NET API, a feature-rich and easy-to-use document manipulation API for the .NET platform. Open the NuGet package manager, search for Aspose.PDF, and install it. You may also use the following command from the Package Manager Console.

Package Manager Console

PM > Install-Package Aspose.PDF
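
If you work from a terminal rather than the Package Manager Console, the .NET CLI offers an equivalent. This is a minimal sketch that assumes the .NET SDK is installed and that the command is run from the folder containing your project file:

```shell
# Adds the latest Aspose.PDF package from NuGet to the project in the current directory
dotnet add package Aspose.PDF
```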

Parse PDF using C#

To try the code in your environment, you need Aspose.PDF for .NET.

Load the PDF with an instance of Document.
Create a TextAbsorber object to extract text.
Accept the absorber for all the pages.
Get the extracted text from the absorber's Text property.
Write the extracted text to a file.

Extract PDF Files - C#

This sample code shows how to extract all text from a PDF document and save it to a text file.

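using System.IO;
// Assumption: the input PDF is in the current working directory; adjust dataDir as needed.
var dataDir = Directory.GetCurrentDirectory();
var inputFile = Path.Combine(dataDir, "ExtractTextAll.pdf");
var outputFile = Path.Combine(dataDir, "ExtractedText.txt");
// Load the PDF document.
var pdfDocument = new Aspose.Pdf.Document(inputFile);
// Create the absorber and let it visit every page.
var textAbsorber = new Aspose.Pdf.Text.TextAbsorber();
pdfDocument.Pages.Accept(textAbsorber);
// Save the extracted text to a file.
File.WriteAllText(outputFile, textAbsorber.Text);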
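
Images can be pulled out in a similar way by walking each page's image resources. The following is a minimal sketch, not a definitive implementation. It assumes C# 9 or later (top-level statements), that XImage.Save(Stream) writes the image data as JPEG in your version of the library, and that an input file named "ExtractImages.pdf" exists. That file name, the output naming scheme, and the ImageFileName helper are hypothetical and chosen for illustration only.

```csharp
using System.IO;
using Aspose.Pdf;

// Hypothetical helper: builds an output file name such as "page2_image3.jpg".
static string ImageFileName(int pageNumber, int imageIndex) =>
    $"page{pageNumber}_image{imageIndex}.jpg";

// Assumption: the input PDF is in the current working directory.
var dataDir = Directory.GetCurrentDirectory();
var inputFile = Path.Combine(dataDir, "ExtractImages.pdf");

using var pdfDocument = new Document(inputFile);

// Aspose.PDF page and image collections are indexed from 1.
for (int pageNumber = 1; pageNumber <= pdfDocument.Pages.Count; pageNumber++)
{
    var images = pdfDocument.Pages[pageNumber].Resources.Images;
    for (int imageIndex = 1; imageIndex <= images.Count; imageIndex++)
    {
        var outputFile = Path.Combine(dataDir, ImageFileName(pageNumber, imageIndex));
        using var output = new FileStream(outputFile, FileMode.Create);
        // Write the image data to the output stream.
        images[imageIndex].Save(output);
    }
}
```

Naming the files by page and image index keeps the output traceable to its position in the source document. If you need a different image format, check the XImage.Save overloads available in your version of the library.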

About Aspose.PDF for .NET API

Aspose.PDF for .NET API provides a wide range of features for working with PDF files. Some of the features include:

Create PDF documents from scratch or from HTML, XML, or images.
Edit existing PDF documents by adding or removing pages, text, images, and other content.
Convert PDF documents to other formats such as HTML, XML, and images.
Render PDF documents to images or XPS format.
Print PDF documents directly from your application.
Digitally sign PDF documents.

You can find more information about the Aspose.PDF for .NET API in the Aspose documentation.