Batch recognition in C#

Automatically detect regions containing text paragraphs, detect skew, and recognize text in images stored in ZIP archives using the Aspose.OCR for .NET library.

How to get text from images in ZIP archives using C#

To get the text, area coordinates and skew angle from images provided in an OcrInput object, call the Aspose.OCR.AsposeOcr.Recognize method and pass it an Aspose.OCR.RecognitionSettings object. By default, our example uses DetectAreasMode = DOCUMENT and AutoSkew = true, with no additional filters. Create the OcrInput with InputType.Zip to recognize images in ZIP archives.

Command line tools

RecognizeArchive project

Run the program in Command Prompt
  RecognizeArchive

Run the program in Command Prompt with your own archive
  RecognizeArchive folder/archive_name.zip
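
Before passing your own archive to the recognizer, it can help to check that it contains only image files and no subfolders. The following is a minimal sketch of such a check, not part of the RecognizeArchive project: the class name ArchiveCheck, its method names, and the extension list are assumptions made for illustration. The extensions are derived from the formats named on this page (GIF, PNG, JPEG, BMP, TIFF, JFIF). It uses only the standard System.IO.Compression API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: not part of Aspose.OCR or the RecognizeArchive project.
public static class ArchiveCheck
{
    // Assumed extension list, based on the image formats named on this page.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".gif", ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".jfif"
        };

    // Use the first command-line argument as the archive path, or fall back to a default.
    public static string ResolveArchivePath(string[] args, string defaultPath)
    {
        bool hasArgument = args != null && args.Length > 0 && !string.IsNullOrWhiteSpace(args[0]);
        return hasArgument ? args[0] : defaultPath;
    }

    // Check entry names only, so the rule can be exercised without a real archive.
    public static List<string> FindProblemsInNames(IEnumerable<string> entryNames)
    {
        var problems = new List<string>();
        foreach (string name in entryNames)
        {
            string normalized = name.Replace('\\', '/');
            if (normalized.Contains("/"))
                problems.Add(name + ": located in a subfolder");
            else if (!ImageExtensions.Contains(Path.GetExtension(normalized)))
                problems.Add(name + ": unsupported file extension");
        }
        return problems;
    }

    // Open the archive read-only and check the names of all its entries.
    // On .NET Framework, ZipFile needs a reference to System.IO.Compression.FileSystem.
    public static List<string> FindProblemsInArchive(string zipPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(zipPath))
        {
            return FindProblemsInNames(archive.Entries.Select(e => e.FullName).ToList());
        }
    }
}
```

A caller could resolve the path with ArchiveCheck.ResolveArchivePath(args, defaultPath), print whatever FindProblemsInArchive returns, and start recognition only when the list is empty.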

Recognition result

Property	Type	Description
FileName	string	Full path to the file.
RecognitionAreasRectangles	List	Coordinates of the detected text areas (rectangles).
RecognitionAreasText	List	Recognition results for each detected area (rectangle).
RecognitionCharactersList	List<char[]>	A set of characters found by the recognition algorithm and arranged in descending order of probability.
RecognitionLinesResult	List	Recognition results for each detected line, together with the line rectangles.
RecognitionText	string	Recognition result as a single string.
Skew	double	Skew angle.
Warnings	List	Warning messages describing non-critical faults that occurred during generation.

This sample code shows how to recognize images in a ZIP archive
            // Set the license file
            //License lic = new License();
            //lic.SetLicense("Aspose.Total.lic");

            // Create an AsposeOcr instance.
            // You can use the overloaded constructor to restrict the allowed characters.
            AsposeOcr api = new AsposeOcr();
            // Create an OcrInput object to hold the images
            // Add filters as needed
            OcrInput input = new OcrInput(InputType.Zip/*, filters*/);
            input.Add(zipName);

            // You can recognize a ZIP archive, images in a folder, or a list of images
            // Make sure all files are in supported formats and there are no subfolders
            List<RecognitionResult> res = api.Recognize(input, new RecognitionSettings 
            {
                //// allowed options
                // AllowedCharacters = CharactersAllowedType.LATIN_ALPHABET, // ignore non-Latin symbols
                // AutoContrast = false, // apply the contrast correction filter before recognition; good for noisy images
                // AutoSkew = true, // switch off if your image is not rotated
                // DetectAreas = true, // switch off if your image has a simple document structure (single-column text without pictures)
                // DetectAreasMode = DetectAreasMode.DOCUMENT, // depends on the structure of your image
                // IgnoredCharacters = "*-!@#$%^&", // symbols to exclude from the recognition result
                // Language = Language.Eng, // we support 26 languages
                // LinesFiltration = false, // this is slow, so enable it only if your picture has lines that are badly detected in TABLE or DOCUMENT DetectAreasMode
                // PreprocessingFilters = new PreprocessingFilter // images are preprocessed automatically, but if the recognition result is still bad, you can define your own set of filters
                // {
                //     PreprocessingFilter.Dilate()
                // },
                // RecognitionAreas = new System.Collections.Generic.List<System.Drawing.Rectangle> // set this if you want to recognize only particular regions of the image
                // {
                //     new System.Drawing.Rectangle(0,0,10,20)
                // },
                // RecognizeSingleLine = false, // set this to true if your image contains only one line of text (without other objects)
                // SkewAngle = 5, // use this to set your own skew angle instead of relying on automatic skew correction
                // ThreadsCount = 1, // by default the API uses all available threads; set this to run it in a single thread
                // ThresholdValue = 150 // set this to binarize the image with your own threshold value (from 1 to 255)
            });


            Console.WriteLine("RESULT");
            Console.ResetColor();
            Console.WriteLine("------------------------------------------------------------------------------");
            for (int i = 0; i < res.Count; i++)
            {
                Console.WriteLine($"IMAGE {i + 1}\n------------------------------");
                Console.WriteLine(res[i].RecognitionText);
                // You can print additional information here and spell-check the result
                // You can also save each page result in your preferred file format
                // res[i].Save(...);
                // or convert the result to a JSON or XML string
                // res[i].GetJson();
                // res[i].GetXml();
            }

            // You can also save the results as a single multipage document
            // AsposeOcr.SaveMultipageDocument("result.pdf", SaveFormat.Pdf, res.ToList());
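
The sample above prints only RecognitionText. The other properties listed in the Recognition result table can be read the same way. The sketch below is illustrative only: the class name ResultReport and the method Print are hypothetical, the code assumes the Aspose.OCR package is referenced, and it assumes that RecognitionAreasText and RecognitionAreasRectangles list the same areas in the same order. Only property names that appear in the table are used.

```csharp
using System;
using System.Collections.Generic;
using Aspose.OCR;

// Hypothetical helper for printing per-image details; not part of the library.
public static class ResultReport
{
    public static void Print(List<RecognitionResult> results)
    {
        foreach (RecognitionResult result in results)
        {
            Console.WriteLine($"File: {result.FileName}");
            Console.WriteLine($"Skew angle: {result.Skew}");

            // Assumption: both lists describe the same areas in the same order.
            int areaCount = Math.Min(result.RecognitionAreasText.Count,
                                     result.RecognitionAreasRectangles.Count);
            for (int i = 0; i < areaCount; i++)
            {
                Console.WriteLine($"Area {i + 1} at {result.RecognitionAreasRectangles[i]}:");
                Console.WriteLine(result.RecognitionAreasText[i]);
            }

            // Warnings describe non-critical faults; skip the section if there are none.
            if (result.Warnings != null)
            {
                foreach (var warning in result.Warnings)
                    Console.WriteLine($"Warning: {warning}");
            }
        }
    }
}
```

With the sample above, this would be called as ResultReport.Print(res) after the call to api.Recognize.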

Other Supported Tools

Using C#, one can easily run our examples.

Recognize image (GIF, PNG, JPEG, BMP, TIFF, JFIF)

Recognize PDF (Scanned PDF)

Recognize TIFF (Multipage TIFF)

Preprocess image (GIF, PNG, JPEG, BMP, TIFF, JFIF)

Recognize ZIP archive (ZIP)

Get JSON (GIF, PNG, JPEG, BMP, TIFF, JFIF)

Get XLSX (GIF, PNG, JPEG, BMP, TIFF, JFIF)

Detect angle (GIF, PNG, JPEG, BMP, TIFF, JFIF)

Recognize image from URL (URL with GIF, PNG, JPEG, BMP, TIFF, JFIF)

Text areas detection (GIF, PNG, JPEG, BMP, TIFF, JFIF)
