Aspose.OCR for .NET

Convert images and PDFs to text in .NET

Add optical character recognition (OCR) to your .NET applications with a few lines of code.

Aspose.OCR for .NET is a powerful yet easy-to-use and cost-effective API for optical character recognition. With it, you can add OCR functionality to your .NET applications in fewer than five lines of code without worrying about complex math, neural networks, and other technical details. Our experience in machine learning technologies and years of development have resulted in an OCR engine with superior speed and accuracy that supports 26 languages based on Latin and Cyrillic scripts, as well as Chinese. The OCR API can recognize scanned images, smartphone photos, screenshots, areas of images, and scanned PDFs, and returns results in the most popular document and data exchange formats. Various pre-processing filters allow you to recognize rotated, skewed, and noisy images. Recognition performance and system load can be further improved by offloading resource-intensive computations to the GPU.

Features and Capabilities of Aspose.OCR for .NET

Converts images and PDFs to text

Supports all image formats you can get from a scanner or camera

Reads languages based on Latin and Cyrillic

Recognizes more than 6,000 Chinese characters

Detects and recognizes all popular typefaces

Carefully preserves font styles and formatting

Processes the whole image or selected areas only

Supports rotated, skewed and noisy images

Batch recognition of all images in a folder or archive

Recognizes images provided as web links

Finds and automatically corrects misspelled words

Fully compatible with other Aspose products

Easy to Install

Aspose.OCR for .NET is distributed as a lightweight NuGet package or as a downloadable file with minimal dependencies. Simply install it in your project and you are ready to recognize text in any supported language and save recognition results in any of the supported formats.

Request a temporary license to start building a fully functional OCR application without any limits and restrictions.

Easy to Use

You need three lines of code to recognize an image and display the result. Yes, it really is that simple!

Live code sample - C#

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();
// Extract text from image
string result = recognitionEngine.RecognizeImage("<file name>");
// Display the recognition result
Console.WriteLine(result);

Cross-Platform

The library fully supports .NET Standard 2.0. This means your applications can run on any platform: desktop Windows, Windows Server, macOS, Linux, and the cloud.

26 Recognition Languages

The OCR API can recognize a large number of languages written in Latin, Cyrillic, and Chinese scripts, including texts with mixed languages.

  • Extended Latin alphabet: Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish.
  • Cyrillic alphabet: Belarusian, Bulgarian, Kazakh, Russian, Serbian, Ukrainian.
  • Chinese: more than 6,000 characters.

You can leave the language detection to the library or define the language yourself to increase the recognition performance and reliability.
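Setting the language explicitly might look like the sketch below. It assumes that RecognitionSettings exposes a Language property and that the Aspose.OCR.Language enumeration has an Eng member. Neither name appears on this page, so treat both as assumptions and check them against the API reference for your version.

```csharp
using System;

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();

// ASSUMPTION: the Language property and the Language.Eng member are not
// shown on this page; verify both names in the API reference.
var recognitionSettings = new Aspose.OCR.RecognitionSettings() {
    Language = Aspose.OCR.Language.Eng
};

// Recognize the image as English text instead of relying on auto-detection
Aspose.OCR.RecognitionResult result = recognitionEngine.RecognizeImage("sample.jpg", recognitionSettings);
```

Fixing the language is most useful when you already know what a batch of documents contains; for mixed or unknown input, leaving detection to the library is likely the safer default.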

Recognize Photos

The biggest barrier to OCR applications is that scanners are not commonplace for end users. The API has powerful built-in image pre-processing filters that can handle rotated, skewed, and noisy images. In combination with support for all image formats, it allows for reliable recognition of even smartphone photos. Most of the pre-processing and image correction is done automatically, so you will only have to intervene in difficult cases.

Apply automatic image corrections - C#

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();

// Enable automatic skew correction and contrast adjustment
var recognitionSettings = new Aspose.OCR.RecognitionSettings() {
    AutoContrast = true,
    AutoSkew = true
};

// Recognize image
Aspose.OCR.RecognitionResult result = recognitionEngine.RecognizeImage("IMG_20220622_163123.jpg", recognitionSettings);

Universal converter

The API can read literally any image you can get from a scanner, camera or smartphone: PDF documents, JPEG, PNG, TIFF, GIF, BMP images, and even DjVu files. Multi-page PDF documents, TIFF and DjVu images are fully supported. You can also provide an image from the web via a URL.

Recognition results are returned in the most popular document and data exchange formats: plain text, PDF, Microsoft Word, Microsoft Excel, JSON, and XML.

Recognize PDF and save results to JSON - C#

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();

// Recognize all pages from scanned PDF
var pages = recognitionEngine.RecognizePdf("sample.pdf", new Aspose.OCR.DocumentRecognitionSettings());

// Output each page as JSON
foreach(var page in pages)
{
    Console.WriteLine(page.GetJson());
}
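
If you need the JSON on disk rather than on the console, the same loop can write one file per page. This is a minimal sketch that uses only the calls from the sample above plus standard .NET file APIs; the "page-N.json" naming scheme is hypothetical.

```csharp
using System;
using System.IO;

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();

// Recognize all pages from scanned PDF
var pages = recognitionEngine.RecognizePdf("sample.pdf", new Aspose.OCR.DocumentRecognitionSettings());

// Write each page's JSON to its own file (file naming is illustrative only)
int pageNumber = 1;
foreach (var page in pages)
{
    string outputPath = $"page-{pageNumber}.json";
    File.WriteAllText(outputPath, page.GetJson());
    Console.WriteLine($"Saved {outputPath}");
    pageNumber++;
}
```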

Resource Optimization

Optical character recognition is a resource-intensive process. The API offers very flexible ways to strike a balance in the classic time-price-quality triad:

  • Choose between thorough recognition and fast recognition.
  • Specify the number of threads allocated for recognition, or allow the library to automatically scale to the number of processor cores.
  • Free up the CPU by offloading the calculations to the GPU.

Fast recognition - C#

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();

// Recognize image in the fastest mode
string result = recognitionEngine.RecognizeImageFast("sample.jpg");
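
To decide which mode suits your workload, you can time both on a representative image. The sketch below uses only RecognizeImage and RecognizeImageFast as they appear in the samples on this page, plus the standard Stopwatch class. It assumes that RecognizeImage called with a file name alone returns a string, as in the live code sample.

```csharp
using System;
using System.Diagnostics;

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();

// Time the thorough (default) recognition
var stopwatch = Stopwatch.StartNew();
string thorough = recognitionEngine.RecognizeImage("sample.jpg");
stopwatch.Stop();
Console.WriteLine($"Thorough: {stopwatch.ElapsedMilliseconds} ms, {thorough.Length} characters");

// Time the fast recognition on the same image
stopwatch.Restart();
string fast = recognitionEngine.RecognizeImageFast("sample.jpg");
stopwatch.Stop();
Console.WriteLine($"Fast: {stopwatch.ElapsedMilliseconds} ms, {fast.Length} characters");
```

The first call may include one-time initialization, so run each mode more than once before drawing conclusions.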

Spell Check

While OCR produces reliable results, dust and print defects might cause some symbols to be recognized incorrectly. The OCR API has a built-in spell checker that automatically replaces misspelled words and frees you from having to manually correct the recognition results.

Spell checking recognition results - C#

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();

// Enable automatic contrast
var recognitionSettings = new Aspose.OCR.RecognitionSettings() {
    AutoContrast = true
};

// Recognize image
Aspose.OCR.RecognitionResult result = recognitionEngine.RecognizeImage("sample.jpg", recognitionSettings);

// Correct misspelled words
string text = result.GetSpellCheckCorrectedText();

Batch Processing

The OCR API frees you from recognizing images one by one by offering various batch-processing methods that allow you to recognize multiple images in one call:

  • Recognition of multi-page PDF, TIFF, and DjVu files.
  • Recognition of all files in a folder.
  • Recognition of all files in an archive.
  • Recognition of all files from a list.
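
The library's dedicated batch methods are not shown on this page. As a rough equivalent built only from the single-image call in the samples above and standard .NET file APIs, folder processing can be sketched as follows. The "scans" folder name and the extension list are hypothetical, and RecognizeImage is assumed to return a string when called with a file name alone, as in the live code sample.

```csharp
using System;
using System.IO;
using System.Linq;

// Initialize OCR engine
var recognitionEngine = new Aspose.OCR.AsposeOcr();

// Image extensions to pick up (illustrative list, adjust to your input)
string[] extensions = { ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".gif" };

// Collect matching files from the folder (hypothetical folder name)
var files = Directory.EnumerateFiles("scans")
    .Where(file => extensions.Contains(Path.GetExtension(file).ToLowerInvariant()));

// Recognize each image and save the text next to it as a .txt file
foreach (var file in files)
{
    string text = recognitionEngine.RecognizeImage(file);
    File.WriteAllText(Path.ChangeExtension(file, ".txt"), text);
    Console.WriteLine($"Recognized {file}");
}
```

A manual loop like this gives you a per-file hook for logging or error handling; the built-in batch methods listed above should be preferable when you only need the results.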

Learn by Example

Aspose.OCR for .NET comes with a number of examples written in C# that let you quickly familiarize yourself with its functions and capabilities and give you ideas for building solutions for your business needs.

Aspose also offers native OCR APIs for other popular programming languages: