Why Aspose.OCR for .NET?

Aspose.OCR for .NET is a robust, developer-friendly, and cost-effective API for optical character recognition. With fewer than 10 lines of native C# code, you can build OCR capabilities into .NET desktop applications, MVC-based web applications, cloud services, and serverless Azure Functions. Extract machine-readable text from scans, photos, and screenshots; convert scanned pages into searchable and indexable PDFs; and find and compare text in images, all while focusing on your business tasks rather than on complex math, neural networks, and other technical intricacies.

Global applications

Recognize text in Latin, Cyrillic, and Asian scripts, including more than 6,000 Chinese characters and Hindi.

Read everything

Retrieve text from any file obtained through a scanner or camera, and process images directly from web links.

Reliable results

Achieve high recognition accuracy for all images, including those that are out-of-focus, rotated, distorted, and noisy.

Batch recognition

Bulk-recognize all images from folders and archives; read multi-page PDF documents and TIFF images.

Layout detection

Identify and categorize content blocks in images to ensure the correct order of extracted text, regardless of layout.

Live code sample

Optical character recognition becomes a simple, straightforward task with Aspose.OCR, even for developers new to the technology. Just a few lines of code are enough to extract text from an image and display it on the screen.

Convert image to text

using System;
using System.Collections.Generic;

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();
// Add image to the recognition batch
var source = new Aspose.OCR.OcrInput(Aspose.OCR.InputType.SingleImage);
source.Add("<file name>");

// Perform OCR
List<Aspose.OCR.RecognitionResult> results
     = recognitionEngine.Recognize(source);
// Output recognized text
Console.WriteLine(results[0].RecognitionText);

Platform independence

Aspose.OCR for .NET runs on any platform that supports .NET, .NET Core, or .NET Framework, whether on a local machine, on a web server, or in the cloud.

Microsoft Windows
Linux
macOS
GitHub
Microsoft Azure
Amazon Web Services
Docker

Supported file formats

Aspose.OCR for .NET can work with virtually any file you can get from a scanner or camera. Recognition results are returned in the most popular file and data exchange formats, so they can be saved, imported into a database, or analyzed in real time.

Images

  • JPEG
  • PNG
  • TIFF
  • BMP
  • GIF

Batch OCR

  • Multi-page PDF
  • DjVu
  • ZIP
  • Folder

Recognition results

  • Text
  • PDF
  • Microsoft Word
  • Microsoft Excel
  • HTML
  • RTF
  • ePub
  • JSON
  • XML
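To prepare a folder for batch recognition, you may want to pick out only the files in the image formats listed above. The following C# sketch does that with standard .NET file APIs. `ImageFileFilter`, `IsSupportedImage`, and `FindImages` are hypothetical helpers, not part of Aspose.OCR, and the extension list is an assumption based on the formats named above; the library's own folder and archive input may already do this filtering for you.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of Aspose.OCR): selects files whose extensions
// match the image formats listed above.
public static class ImageFileFilter
{
    // Assumed extensions for JPEG, PNG, TIFF, BMP and GIF; compared case-insensitively.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".gif"
        };

    // True if the path ends with one of the supported image extensions.
    public static bool IsSupportedImage(string path) =>
        ImageExtensions.Contains(Path.GetExtension(path));

    // Lists supported images in a folder (top level only), sorted by path.
    public static List<string> FindImages(string folder) =>
        Directory.EnumerateFiles(folder)
                 .Where(IsSupportedImage)
                 .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
                 .ToList();
}
```

Each returned path could then be passed to `OcrInput.Add`, as in the "Convert image to text" example.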

Suitable for any content

The accuracy and reliability of text recognition are highly dependent on the quality of the original image. Aspose.OCR for .NET provides an extensive range of fully automated and manual image processing filters that enhance an image before it is sent to the OCR engine.

Powerful image processing and customizable content structure detection algorithms enable text extraction from virtually any image, ranging from high-quality scans to street photos. Multiple processing filters can be applied to the same image to get the best recognition quality.

Resource optimization

Aspose.OCR for .NET enables highly flexible balancing of recognition speed, quality, and resource utilization for each specific use case:

  • Choose between thorough recognition and fast recognition.
  • Specify the number of threads allocated for recognition, or allow the library to automatically scale to the number of processor cores.
  • Free up the CPU by offloading the calculations to the GPU.
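The following C# sketch illustrates the general idea behind the thread-count option when you schedule several independent jobs yourself: use an explicit thread count if one is given, otherwise fall back to the number of logical processors. `BatchScheduler` and the `recognize` delegate are hypothetical placeholders, not Aspose.OCR APIs, and the sketch does not show how the library configures its own threads. If the recognition engine is already multithreaded, running it in parallel like this can oversubscribe the CPU, so confirm whether an engine instance may be shared across threads before doing so.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical batch helper (not an Aspose.OCR API).
public static class BatchScheduler
{
    // Uses the requested thread count, or the number of logical processors
    // when the caller passes 0 or a negative value.
    public static int ResolveThreadCount(int requested) =>
        requested > 0 ? requested : Environment.ProcessorCount;

    // Calls `recognize` once per file with at most ResolveThreadCount(threads)
    // calls running at the same time; results are keyed by file path.
    public static IDictionary<string, string> RunBatch(
        IEnumerable<string> files, Func<string, string> recognize, int threads = 0)
    {
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = ResolveThreadCount(threads)
        };
        Parallel.ForEach(files, options, file => results[file] = recognize(file));
        return results;
    }
}
```

Passing `threads: 0` mirrors the "scale to the number of processor cores" behavior described above; a positive value caps concurrency explicitly.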

28 recognition languages

Aspose.OCR for .NET is a universal solution for document processing, data extraction, and content digitization on a global scale. With support for a wide range of European and Asian scripts, it suits organizations of any size, from small and medium businesses to multinational corporations.

You can let the library detect the language automatically, or specify it manually to improve recognition performance and reliability. The following languages are supported:

  • Extended Latin alphabet: Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish;
  • Cyrillic alphabet: Belarusian, Bulgarian, Kazakh, Russian, Serbian, Ukrainian;
  • Over 6,000 Chinese characters;
  • Hindi.

Features and capabilities

Aspose.OCR for .NET automatically extracts text from photos or scanned images, eliminating the need for manual retyping of documents.

Photo OCR

Extract text from smartphone photos with scan-level accuracy.

Searchable PDF

Convert any scan into a fully searchable and indexable document.

URL recognition

Recognize an image from a URL without downloading it locally.

Bulk recognition

Read all images from multi-page documents, folders and archives.

Any font and style

Identify and recognize text in all popular typefaces and styles.

Fine-tune recognition

Adjust every OCR parameter for the best recognition results.

Spell checker

Improve results by automatically correcting misspelled words.

Find text in images

Search for text or a regular expression across a set of images.

Compare image texts

Compare texts on two images, regardless of the case and layout.
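Comparing text "regardless of the case and layout" essentially means comparing normalized strings. The following C# sketch shows one simple way to do that for two texts that have already been recognized (for example, the `RecognitionText` values from two images). `RecognizedTextComparer` is a hypothetical helper, not the library's comparison feature, and it treats layout only as differences in spacing and line breaks; it does not reorder columns or content blocks.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.OCR API) for comparing recognized texts.
public static class RecognizedTextComparer
{
    // Collapses every run of whitespace (spaces, tabs, line breaks) into a single
    // space and trims the ends, so line wrapping does not affect the comparison.
    public static string Normalize(string text) =>
        Regex.Replace(text ?? string.Empty, @"\s+", " ").Trim();

    // True if both texts are equal after normalization, ignoring letter case.
    public static bool SameText(string first, string second) =>
        string.Equals(Normalize(first), Normalize(second),
                      StringComparison.OrdinalIgnoreCase);
}
```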

Easy to use

You only need a few lines of code to convert an image to text, create a searchable PDF, save recognition results to a document, and much more. Explore the code samples to see how to integrate Aspose.OCR for .NET into your solutions.

Installation

Aspose.OCR for .NET is distributed as a NuGet package or as a downloadable file with minimal dependencies. The package can be added to your project directly from Microsoft Visual Studio. Simply add it to your project and you are ready to extract text from images and save recognition results in any of the supported formats. If your system has a CUDA-capable GPU, you can use the GPU-accelerated OCR engine to significantly increase recognition performance.

You can start using Aspose.OCR for .NET right after installation, with some restrictions. A temporary license removes all limitations of the trial version for 30 days. Use it to start building a fully functional OCR application and make the final purchase decision later.

Extract text from a photo

When people think of OCR (optical character recognition), they usually picture a scanner as the capture device. This association has historical roots and still holds in many contexts: a scanner provides a consistent, controlled environment for capturing printed text from physical documents at very high quality. However, a scanner is specialized equipment that is not always at hand and requires a stationary workstation. Fortunately, the smartphone camera offers a convenient alternative. Advances in camera technology mean that even an entry-level smartphone can capture OCR-ready documents, and built-in storage makes it easier than ever to digitize large numbers of documents, newspapers, books, street signs, and other text on the go. All you need is the right technology to convert those photos into machine-readable text.

Aspose.OCR for .NET is designed to recognize all types of images out of the box and can be fine-tuned further to handle even low-quality photos. Combined with a modern smartphone, it lets you create powerful OCR applications for most everyday scanning and text recognition tasks. Advanced image processing and document structure analysis take only a few lines of code, so you can focus on business tasks rather than complex mathematical algorithms, neural networks, and other technical intricacies.

Photo OCR - C#

using System;
using System.Collections.Generic;
using Aspose.OCR;
// Depending on the library version, PreprocessingFilter and SpellCheckLanguage may be
// declared in sub-namespaces of Aspose.OCR; add the matching using directives if needed.

// Configure preprocessing filters
PreprocessingFilter filters = new PreprocessingFilter {
  PreprocessingFilter.ContrastCorrectionFilter(),
  PreprocessingFilter.AutoDewarping()
};

// Add a photo for recognition
OcrInput photos = new OcrInput(InputType.SingleImage, filters);
photos.Add("photo.png");

// Fine-tune recognition settings
RecognitionSettings settings = new RecognitionSettings();
settings.Language = Language.Eng;
settings.DetectAreasMode = DetectAreasMode.CURVED_TEXT;

// Extract text from the photo
AsposeOcr api = new AsposeOcr();
List<RecognitionResult> results = api.Recognize(photos, settings);

// Automatically correct spelling (English)
string text = results[0].GetSpellCheckCorrectedText(SpellCheckLanguage.Eng);
// Display recognized text
Console.WriteLine(text);

Create a searchable PDF from the scan

PDF is one of the most popular formats for scanned paper documents, largely because it can combine multiple pages in a single file. It is widely used to exchange contracts, invoices, legal documents, passports, ID cards, and many other documents between individuals, businesses, banks, and government agencies. However, a scanned PDF is essentially a collection of images. It contains no machine-readable text, so users cannot search, copy, or otherwise work with the document content.

Aspose.OCR for .NET offers you a fast, easy and highly reliable way to convert any scanned PDF into a fully searchable and indexable document. It accurately recognizes page content, converting it into a machine-readable text layer over the original image that can be selected, copied, read by text-to-speech software, and even automatically processed by translators, summarizers, and other AI-powered analytics tools.

Add text overlay to PDF - C#

using System;
using System.Collections.Generic;
using System.IO;
using Aspose.OCR;

// Load the scanned PDF
OcrInput pdf = new OcrInput(InputType.PDF);
pdf.Add("Delivery-Agreement.pdf");

// Recognize the text in the document
AsposeOcr api = new AsposeOcr();
List<RecognitionResult> result = api.Recognize(pdf);

// Save searchable PDF to the current directory
AsposeOcr.SaveMultipageDocument("Readable-Contract.pdf", SaveFormat.Pdf, result);
// Report progress (Path.Combine builds a valid path on Windows, Linux, and macOS)
string outputPath = Path.Combine(Directory.GetCurrentDirectory(), "Readable-Contract.pdf");
Console.WriteLine($"Recognition finished. See '{outputPath}'.");

Search for text in images

Digital archives, especially in large organizations, often consist of vast collections of scans and photos, many of which contain multi-page documents. Organizing such archives efficiently is essential for easy information retrieval and navigation. However, images do not contain machine-readable text, so their content cannot be searched or analyzed directly.

Aspose.OCR for .NET lets you search for text in images regardless of font, text size, style, and other parameters. The library also supports case-insensitive searches and regular expressions, which are useful in many applications and industries, for example:

  • categorizing documents based on the content, keywords, or patterns found in the text;
  • finding specific terms or clauses in agreements and contracts;
  • reorganizing files based on keywords or content found in them;
  • locating and identifying personal data in documents, making it easier to ensure GDPR compliance and manage sensitive information.

Searching within images also makes it possible to build automated workflows and streamline business processes triggered by incoming signed contracts and invoices.

Search for text in images - C#

using System;
using System.IO;
using Aspose.OCR;

string sourceFolder = "images";
string searchFor = "OCR";

// Search for the text in every PNG image in the folder
AsposeOcr api = new AsposeOcr();
foreach (string image in Directory.GetFiles(sourceFolder, "*.png"))
{
  bool found = api.ImageHasText(image, searchFor);
  if (found) Console.WriteLine($@"Found ""{searchFor}"" in image ""{image}""");
}
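The `ImageHasText` example above searches for a literal string. The following C# sketch illustrates case-insensitive and regular-expression matching on text that has already been extracted, for example from the `RecognitionText` property shown in the "Convert image to text" example. `RecognizedTextSearch` is a hypothetical helper built on `System.Text.RegularExpressions`; it does not show how Aspose.OCR exposes its own search options.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.OCR API) for searching recognized text.
public static class RecognizedTextSearch
{
    // Case-insensitive search for a literal term.
    public static bool ContainsTerm(string text, string term) =>
        text.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0;

    // Returns every case-insensitive match of a regular expression,
    // in order of appearance.
    public static List<string> FindMatches(string text, string pattern) =>
        Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
             .Cast<Match>()
             .Select(match => match.Value)
             .ToList();
}
```

For example, `FindMatches(text, @"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}")` returns the e-mail-like strings in `text`, which can help flag documents that contain personal data. The pattern is deliberately simple and will not match every valid address.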