Convert images and PDFs to text in Node.js
Extract text from images and convert scans into searchable PDFs with a few lines of JavaScript code.
Aspose.OCR for Node.js via C++ lets you add optical character recognition (OCR) to serverless cloud applications, scripts, websites, and more. Because it runs on Node.js, the same code works on Windows, Linux, macOS, and other platforms.
Our library can recognize scanned images and even smartphone photos in all popular formats, returning results as text, documents, JSON, or XML. It supports 28 languages based on Latin, Cyrillic, and Asian scripts, including Hindi and Chinese.
Aspose.OCR for Node.js via C++ does not rely on external web services. It works anywhere, from web servers and the cloud to in-house intranets and on-premises servers without an Internet connection.
The API is quick to get started with and easy to master for both front-end and back-end JavaScript developers. The default recognition settings are already tuned for accuracy and performance, and you can adjust them to handle low-quality photos, noisy or skewed images, distortions, and other hard cases.
Features of Aspose.OCR for Node.js via C++
- Converts images to machine-readable text, from a single line to multi-page documents
- Extracts text from all types of images, from scans to smartphone photos
- Reads Latin and Cyrillic scripts
- Recognizes Hindi and Chinese characters
- Converts scanned PDFs to searchable PDFs and editable Word documents
- Improves OCR accuracy by letting you fine-tune every aspect of recognition
- Reads the whole image or only selected areas
- Supports skewed, noisy, and warped images
- Recognizes all images in a folder or archive in a single API call
- Automatically detects image defects that can reduce recognition accuracy
- Finds and automatically corrects misspelled words
- Returns recognition results as JSON and XML that can be analyzed and imported into databases
Easy to Use
Aspose.OCR for Node.js via C++ is easy to start with and to master. Don't take our word for it; see for yourself:
Live code sample - Node.js
const Module = require("path/to/asposeocr.js");
Module.onRuntimeInitialized = async _ =>
{
    // Prepare input: a single image file
    const input = Module.WasmAsposeOCRInput();
    input.url = "<file name>";
    const inputs = new Module.WasmAsposeOCRInputs();
    inputs.push_back(input);
    // Prepare recognition settings (defaults)
    const settings = Module.WasmAsposeOCRRecognitionSettings();
    // Recognize the image
    const result = Module.AsposeOCRRecognize(inputs, settings);
    // Serialize the result as plain text and print it
    const result_str = Module.AsposeOCRSerializeResult(result, Module.ExportFormat.text);
    console.log(result_str);
}
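To reuse the sample above in a larger script, the same calls can be wrapped in two small helpers. This is a sketch, not part of the library: `whenReady` and `recognizeToText` are hypothetical names, and the code assumes the calls behave exactly as in the sample, including attaching `onRuntimeInitialized` before the runtime has finished loading.

```javascript
// Hypothetical helpers (not part of Aspose.OCR) built only from the calls
// used in the sample above.

/**
 * Resolve with the module once its runtime has initialized.
 * Like the sample, this assumes the handler is attached before
 * initialization finishes; otherwise the promise never settles.
 */
function whenReady(Module) {
  return new Promise((resolve) => {
    Module.onRuntimeInitialized = () => resolve(Module);
  });
}

/**
 * Recognize one image and return the plain-text export.
 * `configure` is optional and receives the settings object before
 * recognition, e.g. (settings) => { settings.skew = 5; }.
 */
function recognizeToText(Module, url, configure) {
  const input = Module.WasmAsposeOCRInput();
  input.url = url;
  const inputs = new Module.WasmAsposeOCRInputs();
  inputs.push_back(input);
  const settings = Module.WasmAsposeOCRRecognitionSettings();
  if (configure) {
    configure(settings);
  }
  const result = Module.AsposeOCRRecognize(inputs, settings);
  return Module.AsposeOCRSerializeResult(result, Module.ExportFormat.text);
}

// Usage (same placeholder path as the sample):
// whenReady(require("path/to/asposeocr.js"))
//   .then((ocr) => console.log(recognizeToText(ocr, "<file name>")));
```

Passing settings through a callback keeps one code path for both default and customized recognition.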
European and Asian languages
Aspose.OCR for Node.js via C++ can extract text in 28 languages based on Latin, Cyrillic, Chinese, and Indic scripts. The library also recognizes text that mixes several languages.
- Extended Latin alphabet: Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Italian, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish.
- Cyrillic alphabet: Belarusian, Bulgarian, Kazakh, Russian, Serbian, Ukrainian.
- Chinese: more than 6,000 characters.
- Hindi.
You can leave the language detection to the library or define the language yourself to increase the recognition performance and reliability.
Batch Processing
Instead of recognizing images one by one, you can use the API's batch processing methods to recognize multiple images in a single call:
- Recognition of multi-page PDF and TIFF files.
- Recognition of all files in a folder.
- Recognition of all files in an archive.
Recognize ZIP archive - Node.js
const Module = require("path/to/asposeocr.js");
Module.onRuntimeInitialized = async _ =>
{
    // Prepare input: provide the archive path
    const input = Module.WasmAsposeOCRInput();
    input.url = "archive.zip";
    const inputs = new Module.WasmAsposeOCRInputs();
    inputs.push_back(input);
    // Prepare recognition settings (defaults)
    const settings = Module.WasmAsposeOCRRecognitionSettings();
    // Recognize every image in the archive in one call
    const result = Module.AsposeOCRRecognize(inputs, settings);
    // Serialize the result as plain text and print it
    const result_str = Module.AsposeOCRSerializeResult(result, Module.ExportFormat.text);
    console.log(result_str);
}
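The archive sample hands the whole ZIP file to the library. If you would rather choose which files are recognized, for example only certain image extensions from a folder, one option is to build the list yourself and push several inputs into the same container. This is a minimal sketch under two assumptions: that `WasmAsposeOCRInputs` accepts more than one input (its `push_back` method suggests a list, but the samples here push only one), and that the module can read the listed paths, as the samples assume for `input.url`. `listImages` and `recognizeMany` are hypothetical helpers.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: list folder entries (non-recursive) whose extension
// is in `extensions`, returned as full paths in a stable sorted order.
function listImages(dir, extensions = [".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"]) {
  return fs
    .readdirSync(dir)
    .filter((name) => extensions.includes(path.extname(name).toLowerCase()))
    .sort()
    .map((name) => path.join(dir, name));
}

// Hypothetical helper: push every path into one WasmAsposeOCRInputs
// container and recognize them in a single call. Assumes the container
// accepts more than one input.
function recognizeMany(Module, urls) {
  const inputs = new Module.WasmAsposeOCRInputs();
  for (const url of urls) {
    const input = Module.WasmAsposeOCRInput();
    input.url = url;
    inputs.push_back(input);
  }
  const settings = Module.WasmAsposeOCRRecognitionSettings();
  const result = Module.AsposeOCRRecognize(inputs, settings);
  return Module.AsposeOCRSerializeResult(result, Module.ExportFormat.text);
}

// Usage inside Module.onRuntimeInitialized (the folder path is a placeholder):
// console.log(recognizeMany(Module, listImages("path/to/folder")));
```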
Image preprocessing
The accuracy and reliability of text recognition is highly dependent on the quality of the original image. Aspose.OCR for Node.js via C++ offers a large number of fully automated and manual image processing filters that enhance an image before it is sent to the OCR engine:
- Automatically straighten skewed images.
- Remove dirt, spots, scratches, glare, unwanted gradients, and other noise.
- Automatically adjust the contrast.
- Blur noisy images while preserving the edges of high-contrast objects like letters.
- Increase the thickness of characters.
- Convert images to black and white or grayscale.
- Find potentially problematic areas of an image during recognition.
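To make the black-and-white conversion concrete, the sketch below shows one common way to do it: grayscale conversion with ITU-R BT.601 luma weights, followed by global thresholding with Otsu's method. This is a generic illustration, not Aspose.OCR's internal filter; the library applies its own preprocessing for you, and the function names here are hypothetical.

```javascript
/** Convert RGBA bytes (4 per pixel, as in canvas ImageData) to 8-bit grayscale (BT.601 luma). */
function toGrayscale(rgba) {
  const gray = new Uint8Array(Math.floor(rgba.length / 4));
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i];
    const g = rgba[4 * i + 1];
    const b = rgba[4 * i + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b);
  }
  return gray;
}

/** Otsu's method: pick the threshold that maximizes between-class variance (up to a constant factor). */
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const v of gray) hist[v]++;
  let sumAll = 0;
  for (let t = 0; t < 256; t++) sumAll += t * hist[t];

  let best = 0;
  let maxBetween = -1;
  let weightLow = 0; // number of pixels with value <= t
  let sumLow = 0; // sum of those pixel values
  for (let t = 0; t < 256; t++) {
    weightLow += hist[t];
    if (weightLow === 0) continue;
    const weightHigh = gray.length - weightLow;
    if (weightHigh === 0) break;
    sumLow += t * hist[t];
    const meanLow = sumLow / weightLow;
    const meanHigh = (sumAll - sumLow) / weightHigh;
    const between = weightLow * weightHigh * (meanLow - meanHigh) ** 2;
    if (between > maxBetween) {
      maxBetween = between;
      best = t;
    }
  }
  return best;
}

/** Map pixels above the Otsu threshold to white (255) and the rest to black (0). */
function binarize(gray) {
  const t = otsuThreshold(gray);
  return Uint8Array.from(gray, (v) => (v > t ? 255 : 0));
}
```

A single global threshold works well for evenly lit scans; photos with shadows or gradients usually need local (adaptive) thresholding instead.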
Set a custom angle for skew correction - Node.js
const Module = require("path/to/asposeocr.js");
Module.onRuntimeInitialized = async _ =>
{
    // Prepare input
    const input = Module.WasmAsposeOCRInput();
    input.url = "../Data/Source/sample.png";
    const inputs = new Module.WasmAsposeOCRInputs();
    inputs.push_back(input);
    // Prepare settings: apply a custom skew correction angle (presumably in degrees)
    const settings = Module.WasmAsposeOCRRecognitionSettings();
    settings.skew = 5;
    // Recognize the image
    const result = Module.AsposeOCRRecognize(inputs, settings);
    // Serialize the result as plain text and print it
    const result_str = Module.AsposeOCRSerializeResult(result, Module.ExportFormat.text);
    console.log(result_str);
}
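Skew correction ultimately rotates the image back by the detected or specified angle. The sketch below illustrates the idea with a plain nearest-neighbor rotation of a grayscale image around its center, using inverse mapping so that every output pixel receives a value. It is not the library's implementation: the angle is assumed to be in degrees, and whether a positive `settings.skew` corresponds to the same direction as this sketch is an open assumption. `rotateGray` is a hypothetical helper.

```javascript
/**
 * Rotate a grayscale image (row-major, width * height bytes) by angleDeg
 * around its center. In image coordinates (y pointing down), a positive
 * angle turns the content clockwise. Nearest-neighbor sampling; pixels
 * that map outside the source are filled with white (255).
 */
function rotateGray(gray, width, height, angleDeg) {
  const theta = (angleDeg * Math.PI) / 180;
  const cos = Math.cos(theta);
  const sin = Math.sin(theta);
  const cx = (width - 1) / 2;
  const cy = (height - 1) / 2;
  const out = new Uint8Array(width * height).fill(255);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Apply the inverse rotation (-theta) to find which source pixel lands here.
      const dx = x - cx;
      const dy = y - cy;
      const sx = Math.round(cx + dx * cos + dy * sin);
      const sy = Math.round(cy - dx * sin + dy * cos);
      if (sx >= 0 && sx < width && sy >= 0 && sy < height) {
        out[y * width + x] = gray[sy * width + sx];
      }
    }
  }
  return out;
}
```

Inverse mapping avoids the holes that appear when source pixels are pushed forward onto a rotated grid; production deskewing typically adds bilinear interpolation and enlarges the canvas so corners are not clipped.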
Specialized recognition models
Aspose.OCR for Node.js via C++ offers specifically tuned OCR functions for extracting text from certain types of images:
- Digitize scans or photos of passports and other ID cards.
- Extract text from vehicle license plates.
- Convert scanned receipts into machine-readable text.
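Recognized passport data is often validated after OCR. The machine-readable zone (MRZ) defined by ICAO Doc 9303 protects fields such as the document number and dates with check digits, so a misread character often produces a mismatch. The sketch below implements that standard check independently of the library; `mrzCheckDigit` and `isValidMrzField` are hypothetical helpers, and how the library returns MRZ fields is not covered here.

```javascript
/**
 * ICAO Doc 9303 check digit: weights 7, 3, 1 repeat across the field;
 * digits keep their value, letters A-Z map to 10-35, and the filler "<"
 * counts as 0. The check digit is the weighted sum modulo 10.
 */
function mrzCheckDigit(field) {
  const weights = [7, 3, 1];
  let sum = 0;
  for (let i = 0; i < field.length; i++) {
    const ch = field[i];
    let value;
    if (ch >= "0" && ch <= "9") value = ch.charCodeAt(0) - 48;
    else if (ch >= "A" && ch <= "Z") value = ch.charCodeAt(0) - 55;
    else if (ch === "<") value = 0;
    else throw new Error(`Invalid MRZ character: ${ch}`);
    sum += value * weights[i % 3];
  }
  return sum % 10;
}

/** True when the last character of `fieldWithCheck` is the correct check digit for the rest. */
function isValidMrzField(fieldWithCheck) {
  const data = fieldWithCheck.slice(0, -1);
  const check = fieldWithCheck.slice(-1);
  return check >= "0" && check <= "9" && mrzCheckDigit(data) === Number(check);
}
```

For example, `mrzCheckDigit("740812")` returns 2, so a recognized birth-date field "7408122" passes the check.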
Document areas detection
A scanned image or photograph of a text document may contain many blocks of different content: text paragraphs, tables, illustrations, formulas, and so on. Detecting, ordering, and classifying areas of interest on a page is the cornerstone of successful and accurate OCR.
Aspose.OCR for Node.js via C++ offers a number of document areas detection algorithms, allowing you to choose the one that works best for your specific content: from street photos to books and contracts.
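Ordering detected areas is a separate step from finding them. As a rough illustration, the sketch below puts rectangular blocks into reading order for a single-column, left-to-right layout: it groups blocks into rows by vertical position and then sorts each row from left to right. The `{ x, y, width, height }` block shape and `readingOrder` are assumptions for this example, not the library's output format, and multi-column pages need a more elaborate approach.

```javascript
// Hypothetical block shape: { x, y, width, height } in pixels, origin at top-left.

/**
 * Sort blocks into reading order for a single-column, left-to-right page:
 * a block whose vertical center falls inside the current row joins that
 * row, and each row is then sorted by its left edge.
 */
function readingOrder(blocks) {
  const byTop = [...blocks].sort((a, b) => a.y - b.y);
  const rows = [];
  for (const block of byTop) {
    const row = rows[rows.length - 1];
    const center = block.y + block.height / 2;
    if (row && center >= row.top && center <= row.bottom) {
      // Same row: extend its bottom edge if this block reaches lower.
      row.items.push(block);
      row.bottom = Math.max(row.bottom, block.y + block.height);
    } else {
      // Otherwise start a new row with this block.
      rows.push({ top: block.y, bottom: block.y + block.height, items: [block] });
    }
  }
  return rows.flatMap((row) => row.items.sort((a, b) => a.x - b.x));
}
```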