Convert images and PDFs to text in Java

Easily create cross-platform Java applications with optical character recognition (OCR) capabilities.

Aspose.OCR for Java allows you to extract text from images, screenshots, and specific areas of an image, and to create searchable PDFs from scanned files on any platform that supports Java. With its powerful yet easy-to-use API, even complex OCR tasks take less than 10 lines of code. You do not need to work with formulas and machine learning - the library takes care of all the technical details and produces reliable results in 26 languages based on Latin and Cyrillic scripts, as well as Chinese. The OCR API processes scanned images, smartphone photos, screenshots, areas of images, and scanned PDFs, and returns results in the most popular document formats. Recognition speed, accuracy, and performance can be further improved by distributing computation across multiple CPU cores and offloading resource-intensive tasks to the GPU.

At a Glance

A brief summary of optical character recognition capabilities.

Supported Fonts

Arial
Times New Roman
Courier New
Tahoma
Calibri
Verdana

Recognition

Whole image
Image areas
Archives and folders

Supported Languages

English
Chinese
German
French
Slovenian
Spanish
Czech
Polish
Romanian
Dutch
Russian
and 10+ more

Platform Independence

Aspose.OCR for Java supports JDK 1.6 and above.

Plugins

Ruby
PHP
IntelliJ IDEA - Maven

Java Runtime Environment

JSP/JSF applications
Desktop applications

Supported File Formats

Convert any file you get from a scanner or camera to the most popular document and data exchange formats.

Source files

PDF
JPEG
PNG
TIFF
GIF
BMP

Recognition results

Searchable PDF
Microsoft Word
Microsoft Excel
Plain text
JSON
XML

Features and Capabilities of Aspose.OCR for Java

Recognizes formatted text in scanned images and PDFs

Supports all file formats you can get from a scanner or camera

Reads Latin and Cyrillic scripts

Recognizes more than 6,000 Chinese characters

Detects and recognizes all popular typefaces

Carefully preserves font styles and formatting

Processes the whole image or selected areas only

Supports rotated, skewed and noisy images

Batch recognition of all images in a folder or archive

Recognizes images provided as web links

Finds and automatically corrects misspelled words

100% compatibility with other Aspose products

Easy to Install

You can use Aspose.OCR for Java directly from a Maven-based project by following the simple installation instructions.

Request a temporary license to start building a fully functional OCR application without any limits and restrictions.

Easy to Use

Image recognition requires a couple of lines of code. Literally. It's really that simple - try it yourself:

Live code sample - Java

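// Imports needed by this sample (the library classes live in the com.aspose.ocr package)
import com.aspose.ocr.AsposeOCR;
import java.io.IOException;

// Create instance of OCR API
AsposeOCR api = new AsposeOCR();
try {
    // Recognize image
    String result = api.RecognizePage("<file name>");
    // Display the recognition result
    System.out.println(result);
} catch (IOException e) {
    // Error handling: the image could not be read
    e.printStackTrace();
}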

26 Recognition Languages

The OCR API supports 26 languages written in Latin, Cyrillic, and Chinese scripts, including texts with mixed languages. The built-in spell checker automatically replaces misspelled words and saves you the trouble of manually correcting recognition results.

Extended Latin alphabet: Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish.
Cyrillic alphabet: Belarusian, Bulgarian, Kazakh, Russian, Serbian, Ukrainian.
Chinese: more than 6,000 characters.

You can specify the language to increase recognition performance and reliability, or let the API detect languages automatically.

Preserve Formatting

The OCR API reads all popular typefaces such as Arial, Times New Roman, Courier New, Tahoma, Calibri, and more in regular, bold, and italic styles, and carefully preserves formatting in OCR results. You can also split recognition results into lines and detect text areas on a page.

Recognize Photos

Scanners are not always available on end-user workstations, which may become a showstopper for OCR applications. Our OCR API provides a number of pre-processing filters that can handle distorted, rotated, skewed, and noisy images. In combination with support for all image formats, this allows for reliable recognition of even smartphone photos. Most of the pre-processing and image correction is done automatically, but you can always intervene in difficult cases.

Apply image corrections - Java

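// Imports needed by this sample (library classes are assumed to live in com.aspose.ocr)
import com.aspose.ocr.AsposeOCR;
import com.aspose.ocr.PreprocessingFilter;
import com.aspose.ocr.RecognitionResult;
import com.aspose.ocr.RecognitionSettings;
import java.awt.image.BufferedImage;

// Create instance of OCR API
AsposeOCR api = new AsposeOCR();

// Path to the source image (placeholder)
String imagePath = "<file name>";

// Define pre-processing filters
PreprocessingFilter filters = new PreprocessingFilter();
filters.add(PreprocessingFilter.ToGrayscale());
filters.add(PreprocessingFilter.Rotate(-90));

// Pre-process image before recognition
BufferedImage imageRes = api.PreprocessImage(imagePath, filters);

// Recognize image with default recognition settings
RecognitionSettings set = new RecognitionSettings();
RecognitionResult result = api.RecognizePage(imageRes, set);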
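
To make the effect of such filters more concrete, here is a minimal sketch of a grayscale conversion and a quarter-turn rotation written with the standard java.awt.image classes only. It is an illustration of the general technique, not the library's implementation; the class PreprocessSketch and its method names are hypothetical, and the luminance weights are the common ITU-R BT.601 values chosen for this example.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: illustrates two common pre-processing steps in plain Java.
final class PreprocessSketch {

    // Convert an image to 8-bit grayscale using BT.601 luminance weights (an assumption
    // made for this sketch; other weightings are equally valid).
    static BufferedImage toGrayscale(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int gray = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                // Write the gray level directly into the single raster band
                out.getRaster().setSample(x, y, 0, gray);
            }
        }
        return out;
    }

    // Rotate an image by 90 degrees counter-clockwise; width and height swap.
    static BufferedImage rotate90CounterClockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        BufferedImage out = new BufferedImage(h, w, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Source pixel (x, y) lands at (y, w - 1 - x) in the rotated image
                out.setRGB(y, w - 1 - x, src.getRGB(x, y));
            }
        }
        return out;
    }
}
```

With the library, these steps are requested declaratively through the filter list, so hand-written pixel loops like the ones above should not be needed in practice.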

Performance Optimization

Optical character recognition requires a lot of processing resources, which may become a problem for web services and entry-level devices. The API offers very flexible ways to balance recognition speed, resource requirements, and accuracy:

Choose between thorough recognition and fast recognition.
Specify the number of threads allocated for recognition, or allow the library to automatically scale to the number of CPU cores.
Free up the CPU by offloading the calculations to the GPU.
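
The idea behind the thread-count option can be sketched in plain Java: hand each file to a fixed pool of workers whose size is either chosen by the caller or derived from the number of CPU cores. This is only an illustration of the general pattern. The class ParallelOcrSketch and the recognizer function are hypothetical stand-ins and are not part of the Aspose.OCR API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: spreads per-file work across a fixed thread pool.
final class ParallelOcrSketch {

    // threads <= 0 means "scale to the number of available CPU cores".
    // The recognizer is any function mapping a file name to recognized text.
    static List<String> recognizeAll(List<String> files,
                                     Function<String, String> recognizer,
                                     int threads) {
        int poolSize = threads > 0 ? threads : Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit one task per file
            List<Future<String>> pending = new ArrayList<>();
            for (String file : files) {
                Callable<String> task = () -> recognizer.apply(file);
                pending.add(pool.submit(task));
            }
            // Collect results in submission order, so output order matches input order
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for recognition", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Recognition task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A fixed pool caps resource usage, which matters on the web services and entry-level devices mentioned above: a smaller pool trades throughput for a lighter CPU footprint.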

Fast recognition - Java

// Create instance of OCR API
AsposeOCR api = new AsposeOCR();

// Recognize image in the fastest mode
String result = api.RecognizePageFast("sample.jpg");

Batch Processing

The OCR API frees you from having to recognize each image one at a time by offering various batch processing methods that allow you to recognize multiple images in a single call:

Recognition of multi-page PDF and TIFF files.
Recognition of all files in an archive.
Recognition of all files in a folder.
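
For comparison, this is roughly the bookkeeping that folder-level batch processing saves you from writing: listing a folder and keeping only files in the supported source formats. The sketch uses the standard java.nio.file API only. The class BatchFolderSketch is hypothetical, and the mapping of formats to file extensions (for example, both "jpg" and "jpeg" for JPEG) is an assumption made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects recognizable files from a folder.
final class BatchFolderSketch {

    // Assumed extensions for the source formats: PDF, JPEG, PNG, TIFF, GIF, BMP
    private static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
            "pdf", "jpg", "jpeg", "png", "tif", "tiff", "gif", "bmp"));

    // True if the file name ends with one of the assumed extensions (case-insensitive).
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot <= 0 || dot == fileName.length() - 1) {
            return false; // no extension, hidden file without a name, or trailing dot
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(extension);
    }

    // Lists regular files directly inside the folder that pass isSupported, sorted by path.
    static List<Path> listSupported(Path folder) {
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> isSupported(p.getFileName().toString()))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each path returned by listSupported could then be passed to a single-image recognition call; the batch methods described above are meant to replace this loop with one call.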

Support and Learning Resources

Why Aspose.OCR for Java?
Customers List
Success Stories

Aspose also offers native OCR APIs for other popular programming languages:

Aspose.OCR for .NET

Aspose.OCR for C++