Convert images and PDFs to text in C++

Add optical character recognition (OCR) to your C++ applications with a few lines of code.

Aspose.OCR for C++ adds optical character recognition to your applications in 5 lines of code. Our experience in neural networks and machine learning has been translated into an OCR library with superior performance and accuracy that supports 26 languages based on Latin and Cyrillic scripts, as well as Chinese. The OCR API can recognize scanned images, smartphone photos, screenshots, areas of images, and scanned PDFs, and return results in the most popular document and data exchange formats. It works completely offline and does not require an internet connection. The API takes minimal effort to get started with and has a shallow learning curve. Pre-processing, skew correction, noise removal, language detection, multi-threading, and other complex tasks are performed automatically, but can be tuned to deal with hard cases.

At a Glance

A brief summary of optical character recognition capabilities.

Core Features

Extract text from photos
Create searchable PDFs
Automatic image corrections
Support multiple typefaces
Preserve text formatting
Detect text fragments
Batch processing
Spell checking

Supported Languages

English
Chinese
German
French
Italian
Spanish
Russian
Czech
Polish
Ukrainian
Dutch
Estonian
and 10+ more

Platform Independence

The OCR library fully supports C++ applications for both Windows and Linux.

Windows 64 bit

Linux 64 bit

Supported File Formats

Convert any file you get from a scanner or camera to the most popular document and data exchange formats.

Source files

PDF
JPEG
PNG
TIFF
BMP

Recognition results

Searchable PDF
Microsoft Word
Microsoft Excel
Plain text
JSON
XML

Advanced C++ OCR API Features

Extracts text from images and creates searchable PDFs

Supports any image you can get from a scanner or camera

Reads Extended Latin and Cyrillic scripts

Recognizes over 6,000 Chinese characters

Detects and recognizes all popular typefaces and formatting

Pre-processes images before recognition

Processes the whole image or selected areas only

Supports rotated, skewed and noisy images

Recognizes all images in a folder or archive in one batch

Recognizes images provided as web links

Finds and automatically corrects misspelled words

Returns recognition results as JSON

Easy to Use

Do you still think C++ OCR is hard? With our library, you only need 5 lines of code to recognize an image and display the result. Try this code and see for yourself:

Live code sample - C++

// Prepare buffer for result
const size_t len = 4096;
wchar_t buffer[len] = { 0 };
// Do the magic
size_t size = aspose::ocr::page(L"<file name>", buffer, len);
// Display the recognition result
std::wcout << buffer << L"\n";
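
The sample above sizes its buffer at 4,096 wide characters and receives a size back from the call. A minimal sketch of turning that pair into a std::wstring without reading past the buffer might look like the following. It assumes the returned value is the number of characters written, which this page does not state, and buffer_to_wstring is a hypothetical helper, not part of Aspose.OCR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Aspose.OCR: copies the recognized text
// from a fixed-size result buffer into a std::wstring.
// Assumption: `reported` is the character count returned by the recognition call.
std::wstring buffer_to_wstring(const wchar_t* buffer, std::size_t capacity, std::size_t reported) {
    assert(buffer != nullptr);
    // Never read past the end of the buffer, even if the reported size is larger.
    const std::size_t limit = std::min(capacity, reported);
    // Stop at the first terminator in case fewer characters were actually written.
    const wchar_t* end = std::find(buffer, buffer + limit, L'\0');
    return std::wstring(buffer, end);
}
```

With the variables from the live sample, the call would be buffer_to_wstring(buffer, len, size).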

26 Recognition Languages

The OCR API can recognize a large number of languages and all popular writing scripts, including texts with mixed languages.

Extended Latin alphabet: Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish.
Cyrillic alphabet: Belarusian, Bulgarian, Kazakh, Russian, Serbian, Ukrainian.
Chinese: more than 6,000 characters.

You can leave the language detection to the library or define the language yourself to increase the recognition performance and reliability.

Batch Processing

The OCR API frees you from recognizing images one by one by offering batch-processing methods that recognize multiple images in one call:

Recognition of multi-page PDF and TIFF files.
Recognition of all files in a folder.
Recognition of all files in an archive.

Recognize ZIP archive - C++

// Provide archive path
std::string archive_path = "book.zip";

// Prepare buffer for result
const size_t len = 4096;
wchar_t buffer[len] = { 0 };

// Initialize RecognitionSettings object with default values
RecognitionSettings settings;

// Recognize all images in the archive
size_t res_len = aspose::ocr::pages_multi(archive_path.c_str(), buffer, len, settings);
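
When recognizing a whole folder, it can be useful to check beforehand which files are in a supported source format. The sketch below uses std::filesystem (C++17) for that. is_supported_source and collect_sources are hypothetical helpers, not part of Aspose.OCR, and the extension spellings are assumptions based on the formats listed under Supported File Formats.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <filesystem>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// True if the file extension matches one of the source formats listed above
// (PDF, JPEG, PNG, TIFF, BMP). The exact extension spellings are an assumption.
bool is_supported_source(const fs::path& file) {
    std::string ext = file.extension().string();
    // Compare case-insensitively so "scan.PNG" and "scan.png" are treated alike.
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    static const std::vector<std::string> supported = {
        ".pdf", ".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"};
    return std::find(supported.begin(), supported.end(), ext) != supported.end();
}

// Hypothetical helper: lists supported files directly inside `folder`
// (no recursion into subfolders), sorted by path.
std::vector<fs::path> collect_sources(const fs::path& folder) {
    assert(fs::is_directory(folder));
    std::vector<fs::path> files;
    for (const auto& entry : fs::directory_iterator(folder)) {
        if (entry.is_regular_file() && is_supported_source(entry.path())) {
            files.push_back(entry.path());
        }
    }
    std::sort(files.begin(), files.end());
    return files;
}
```

The resulting list can be logged or counted before the folder is passed to a batch-recognition call.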

Preserve Formatting

The OCR library reads all popular typefaces such as Arial, Times New Roman, Courier New, Tahoma, Calibri, and more in regular, bold, and italic styles, and carefully preserves formatting in OCR results. You can also split recognition results into lines and detect text areas on a page.

Recognize Photos

Widespread adoption of OCR applications is often held back by the fact that most users do not have a scanner. Our OCR library has powerful built-in image pre-processing filters that can handle dark, rotated, skewed, and noisy images. Combined with support for all common image formats, this allows reliable recognition even of smartphone photos. Most of the pre-processing and image correction is done automatically, so you only have to intervene in difficult cases.

Set custom angle for skew correction - C++

// Original image and the areas of it to recognize
std::string image_path = "../Data/Source/sample.png";
rect rectangles[2] = { {90, 186, 775, 95} , { 928, 606, 790, 160 } };

// Prepare buffer for result
const size_t len = 4096;
wchar_t buffer[len] = { 0 };

// Request plain-text output, limit recognition to the two areas, and set a custom skew angle
RecognitionSettings settings;
settings.format = export_format::text;
settings.rectangles = rectangles;
settings.rectangles_size = 2;
settings.skew = 5;

// Recognize image
size_t res_len = aspose::ocr::page_settings(image_path.c_str(), buffer, len, settings);

Spell Check

While OCR produces reliable results, dust and print defects might cause some symbols to be recognized incorrectly. The OCR API has a built-in spell checker that automatically replaces misspelled words and frees you from having to manually correct the recognition results.
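
To illustrate the general idea of dictionary-based correction, here is a minimal sketch that replaces a word with the closest dictionary entry by edit distance. This is a generic technique shown for illustration only. The page does not describe how the built-in spell checker works, and edit_distance and correct_word are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: minimum number of single-character insertions,
// deletions, and substitutions needed to turn `a` into `b`.
std::size_t edit_distance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), curr(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        curr[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            const std::size_t substitution = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, substitution});
        }
        std::swap(prev, curr);
    }
    return prev[b.size()];
}

// Returns the closest dictionary word within `max_distance` edits,
// or the original word if nothing is close enough.
std::string correct_word(const std::string& word,
                         const std::vector<std::string>& dictionary,
                         std::size_t max_distance = 1) {
    assert(!word.empty());
    std::string best = word;
    std::size_t best_distance = max_distance + 1;
    for (const auto& candidate : dictionary) {
        const std::size_t d = edit_distance(word, candidate);
        if (d < best_distance) {
            best = candidate;
            best_distance = d;
        }
    }
    return best;
}
```

A typical OCR slip such as a digit 1 read in place of the letter i is one substitution away from the intended word, so a small max_distance is usually enough to catch it without rewriting unrelated words.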

Support and Learning Resources

Why Aspose.OCR for C++?
Customers List
Success Stories

Download Free Trial
Pricing Information

Aspose also offers native OCR APIs for other popular programming languages:

Aspose.OCR for .NET
Aspose.OCR for Java