Useful Open Source and how we taught Zxing to speak another language

In one of our articles, we described how to use the incoming mail features of SharePoint 2010 to receive and process documents containing scanned coupons. That project raised several interesting problems, and here we want to elaborate on one of them.

One of the tasks was to recognize the numbers on scanned sheets of coupons. A sheet may contain several coupons, and they can be placed on it both vertically and horizontally.

What we saw on the coupon scans strongly resembled the Codabar barcode, which we had already encountered on other projects.

Codabar is a linear barcode. Each character is encoded by 7 elements: 4 bars and 3 spaces between them. Adjacent characters are separated by an additional space. A Codabar code begins with a start symbol and ends with a stop symbol; as a rule, these are the symbols A, B, C and D. The data characters are 0-9, -, $ and a few other punctuation marks (: / . +).
Thus, this barcode has an alphabet in which each symbol corresponds to a specific sequence of bars and spaces.

The picture shows an example of a Codabar barcode containing the value "401".

Zxing

When working with barcodes in .NET, we use a port of the Zxing library. The library can generate and recognize all kinds of 1D and 2D barcodes: QR-Code, PDF 417, EAN, UPC, Aztec, Data Matrix. Most importantly, it knows how to work with Codabar. Using Zxing usually causes no problems, and we have used it on different platforms. But Zxing flatly failed to recognize our barcode. On closer examination, it turned out that the customer's codes, although very similar to Codabar, differ from it in the following ways:

they have different start and stop symbols;
each data symbol consists not of the standard 7 elements (4 bars and 3 spaces) but of 9 (5 bars and 4 spaces);
the start and stop symbols consist not of 7 elements but of 3 (2 bars and 1 space).

Perhaps this format is also “standard”, although we could not find a detailed description of it. Perhaps there are libraries that recognize this code, but we were not lucky enough to find them. In the end, we decided to keep working with Zxing: take its source code and change the recognition algorithm to fit our needs.

Algorithm

In Zxing, each class that implements the recognition logic for a specific code (for example, CodabarReader.cs) provides its own implementation of the abstract decodeRow method declared in the OneDReader class (OneDReader.cs).

override public List<Result> decodeRow(int rowNumber, BitArray row, Hashtable hints)

The input is the number of the image row and an array holding that row's pixel values (dark or light).

Then the setCounters(BitArray row) method fills the int[] counters array using the following algorithm: starting from the first light pixel in the row, the first element of the array is incremented until a dark pixel is found. Counting then moves on to the second element, which is incremented until a light pixel appears, and so on until the end of the row. As a result, the counters array looks like this:

15 7 10 3 4 8 16 ...

That is: 15 light pixels, 7 dark, 10 light, 3 dark, and so on (in this implementation the first element corresponds to light pixels, as the setCounters code shows, so bars sit at odd indices).
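The counting step can be sketched on a plain bool array as follows. This is a simplified illustration modelled on the setCounters method from the full listing, which starts at the first light pixel; RunLengthSketch and CountRuns are hypothetical names, and true stands for a dark pixel.

```csharp
using System;
using System.Collections.Generic;

static class RunLengthSketch
{
    // row[i] == true means a dark pixel (assumption of this sketch).
    // Counting starts at the first light pixel, so element 0 of the
    // result is always a light run and bars land on odd indices.
    public static List<int> CountRuns(bool[] row)
    {
        var runs = new List<int>();
        int i = Array.IndexOf(row, false);   // first light pixel
        if (i < 0) return runs;              // the row is entirely dark

        bool current = false;                // colour of the run being counted
        int count = 0;
        for (; i < row.Length; i++)
        {
            if (row[i] == current)
            {
                count++;
            }
            else
            {
                runs.Add(count);             // close the finished run
                count = 1;
                current = row[i];
            }
        }
        runs.Add(count);                     // close the last run
        return runs;
    }
}
```

For the row dark, light, light, light, dark, dark, light, dark, the leading dark pixel is skipped and CountRuns returns 3, 2, 1, 1.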

Next, we look for the sequence corresponding to the start symbol (in our case the “A” symbol; in the original Codabar it is one of “A”, “B”, “C” or “D”). The search is done by the findStartPattern(out int charOffset, int globalOffset) method. Until a match is found, we advance through the counters array one bar at a time, starting from globalOffset (which marks the current position in the image row). Inside findStartPattern, the following method is called:

 int toNarrowWidePattern(int position, int offset)

It takes the index of the current element in the counters array and the length of the symbol (3 for a start or stop symbol, 9 for all other symbols). It returns -1 if no character matches; otherwise it returns the character's position in the CHARACTER_ENCODINGS array.
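The classification into narrow and wide elements can be sketched as follows. This is a simplified illustration, not the method from the listing: it takes the run widths of one data symbol as a separate array, uses the midpoint between the narrowest and widest bar as the threshold, treats every space as narrow (as this code format does), and omits the fixed threshold the listing applies to start and stop symbols. The digit encodings are the ones from the alphabet given in this article; NarrowWideSketch, ToPattern and LookupDigit are hypothetical names.

```csharp
using System;

static class NarrowWideSketch
{
    // Digit encodings copied from the article's alphabet (index = digit).
    static readonly int[] DigitEncodings =
    {
        0x014, 0x101, 0x041, 0x140, 0x011,
        0x110, 0x050, 0x005, 0x104, 0x044
    };

    // widths: run lengths of one symbol, starting with a bar
    // (bar, space, bar, space, ...). Returns the narrow/wide bit pattern
    // with the first element in the most significant bit.
    public static int ToPattern(int[] widths)
    {
        int min = int.MaxValue, max = 0;
        for (int i = 0; i < widths.Length; i += 2)        // bars only
        {
            min = Math.Min(min, widths[i]);
            max = Math.Max(max, widths[i]);
        }
        int threshold = (min + max + 1) / 2;              // midpoint, rounded up

        int pattern = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            bool isBar = i % 2 == 0;
            bool wide = isBar && widths[i] >= threshold;  // spaces never count as wide
            pattern = (pattern << 1) | (wide ? 1 : 0);
        }
        return pattern;
    }

    // Returns the digit for a pattern, or -1 if the pattern is unknown.
    public static int LookupDigit(int pattern)
    {
        return Array.IndexOf(DigitEncodings, pattern);
    }
}
```

For the widths 2, 2, 2, 2, 6, 2, 6, 2, 2 the third and fourth bars are wide, ToPattern returns 0x014, and LookupDigit maps that to the digit 0.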

Alphabet

The code alphabet is defined by the following fields:

String ALPHABET_STRING - contains all the characters used in the code.
int[] CHARACTER_ENCODINGS - contains, for each character, a number that encodes its sequence of narrow and wide elements.

A few words about the values stored in the CHARACTER_ENCODINGS array and about how Codabar is encoded in general. For example, the digit "0" is encoded by the following sequence of bars and spaces:

It is written as 101010011 (barcode encoding). A single 0 or 1 encodes a narrow space or bar, a double 00 or 11 encodes a wide space or bar. This sequence is then converted to the code 0000011 (width encoding), or 0x03 in hexadecimal. That is, single elements are written as zero and double elements as one. In our case each character consists of 9 elements rather than 7, but the logic of building the numeric code is the same.
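The conversion from barcode encoding to width encoding can be sketched as follows. This is an illustration only; WidthEncodingSketch and ToWidthCode are hypothetical names and not part of Zxing.

```csharp
using System;

static class WidthEncodingSketch
{
    // modules: a string of '0' (space) and '1' (bar) modules, e.g. "101010011".
    // Each run of equal characters is one element; a run of length 1 is
    // narrow (bit 0), a longer run is wide (bit 1). The first element ends
    // up in the most significant bit of the result.
    public static int ToWidthCode(string modules)
    {
        int code = 0;
        int i = 0;
        while (i < modules.Length)
        {
            int j = i;
            while (j < modules.Length && modules[j] == modules[i]) j++;
            int runLength = j - i;
            code = (code << 1) | (runLength > 1 ? 1 : 0);
            i = j;
        }
        return code;
    }
}
```

ToWidthCode("101010011") splits the string into the runs 1, 0, 1, 0, 1, 00, 11 and returns 0x03, matching the example above.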

We had to spend some time studying sample coupons. We examined the barcodes carefully and wrote out the sequence corresponding to each character. The result is our own alphabet:

    private const String ALPHABET_STRING = "0123456789AE";

    static int[] CHARACTER_ENCODINGS = {
        0x014, 0x101, 0x041, 0x140, 0x011, 0x110, 0x050, 0x005, 0x104, 0x044, // 0-9
        0x000, 0x004, // AE
    };

So the code is processed as follows: as soon as we find the start symbol, we read the data symbols using the same toNarrowWidePattern method. The length of the sequence is fixed, so at a known step we check whether the next symbol is the stop symbol. If it is, we form the result and move on to the next element of the counters array, continuing to search for barcodes in the row.
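Because the layout is fixed, the position of every symbol in the counters array follows from the position of the start symbol. A minimal sketch of that arithmetic, using the lengths from this article (3 elements for a start or stop symbol, 9 per data symbol, one separating space between symbols) and the value 15 for the number of data symbols, taken from DATA_LENGTH in the full listing; LayoutSketch and its members are hypothetical names:

```csharp
static class LayoutSketch
{
    public const int StartEndLength = 3;   // elements in a start or stop symbol
    public const int SymbolLength = 9;     // elements in a data symbol
    public const int DataLength = 15;      // data symbols per code

    // Index in the counters array where the n-th data symbol begins (n is 0-based).
    // The "+ 1" terms skip the space that separates neighbouring symbols.
    public static int DataSymbolOffset(int startOffset, int n)
    {
        return startOffset + (StartEndLength + 1) + n * (SymbolLength + 1);
    }

    // The stop symbol sits where a 16th data symbol would begin.
    public static int StopSymbolOffset(int startOffset)
    {
        return DataSymbolOffset(startOffset, DataLength);
    }

    // Total number of counters elements occupied by one complete code:
    // two start/stop symbols, the data symbols, and the separating spaces.
    public static int TotalElements
    {
        get { return 2 * StartEndLength + DataLength * SymbolLength + (DataLength + 1); }
    }
}
```

With the start symbol at index 1, the first data symbol begins at index 5 and the stop symbol at index 155; one complete code occupies 157 elements.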

As a result, after scanning a row we have (or do not have) one or more codes, which we save into a global array of results. Then we go on to the next row of the image.

We also added the ability to rotate the image by 90 degrees, in case the document has to be checked for codes in all four orientations. In Zxing, the image to be processed is held in the BinaryBitmap class, which has a rotateCounterClockwise() method, so rotating the image is not difficult.
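Checking all four orientations amounts to decoding, rotating by 90 degrees, and repeating. The sketch below shows the idea on a plain bool[,] pixel matrix instead of Zxing's BinaryBitmap; RotationSketch and the decodeRows callback are hypothetical stand-ins for illustration, not library API.

```csharp
using System;
using System.Collections.Generic;

static class RotationSketch
{
    // Rotates a pixel matrix 90 degrees counterclockwise.
    // The input is indexed [row, column]; a height x width input
    // becomes a width x height output.
    public static bool[,] RotateCounterClockwise(bool[,] src)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        var dst = new bool[w, h];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dst[w - 1 - x, y] = src[y, x];   // the top-right corner moves to the top-left
        return dst;
    }

    // Runs the supplied row decoder on the image in all four orientations
    // and collects everything it finds.
    public static List<string> DecodeAllRotations(
        bool[,] image, Func<bool[,], IEnumerable<string>> decodeRows)
    {
        var found = new List<string>();
        var current = image;
        for (int turn = 0; turn < 4; turn++)
        {
            found.AddRange(decodeRows(current));
            current = RotateCounterClockwise(current);
        }
        return found;
    }
}
```

Four counterclockwise turns bring the matrix back to its original orientation, so each position of the sheet is examined exactly once.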

Thus, with a little thought and work, we were able to adapt the library to a new code format. For those interested, here is the code:


    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Text;
    using BitArray = ETR.REBT.BarcodeReader.common.BitArray;

    namespace ETR.REBT.BarcodeReader.oned
    {
        public sealed class MyCodeReader : OneDReader
        {
            // These values are critical for determining how permissive the decoding
            // will be. All stripe sizes must be within the window these define, as
            // compared to the average stripe size.
            private static readonly int MAX_ACCEPTABLE = (int)(PATTERN_MATCH_RESULT_SCALE_FACTOR * 2.0f);
            private static readonly int PADDING = (int)(PATTERN_MATCH_RESULT_SCALE_FACTOR * 1.5f);

            private static readonly int STARTEND_LENGTH = 3;
            private static readonly int SYMBOL_LENGTH = 9;
            private static readonly int DATA_LENGTH = 15;
            // 15 symbols + 2 start/stop symbols
            private static readonly int All_LENGHT = (16 + DATA_LENGTH * SYMBOL_LENGTH + 2 * STARTEND_LENGTH);

            private const String ALPHABET_STRING = "0123456789AE";
            internal static readonly char[] ALPHABET = ALPHABET_STRING.ToCharArray();

            /**
             * These represent the encodings of characters, as patterns of wide and narrow bars. The 9 least-significant bits of
             * each int correspond to the pattern of wide and narrow, with 1s representing "wide" and 0s representing narrow.
             */
            internal static int[] CHARACTER_ENCODINGS = {
                0x014, 0x101, 0x041, 0x140, 0x011, 0x110, 0x050, 0x005, 0x104, 0x044, // 0-9
                0x000, 0x004, // AE
            };

            // minimal number of characters that should be present (including start and stop characters)
            // under normal circumstances this should be set to 3, but can be set higher
            // as a last-ditch attempt to reduce false positives.
            private const int MIN_CHARACTER_LENGTH = 3;

            // Start and end patterns
            private static readonly char[] START_ENCODING = { 'A' };
            private static readonly char[] END_ENCODING = { 'E' };
            private static readonly char[] DATA_ENCODING = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' };

            // some codabar generator allow the codabar string to be closed by every
            // character. This will cause lots of false positives!

            // some industries use a checksum standard but this is not part of the original codabar standard
            // for more information see : http://www.mecsw.com/specs/codabar.html

            // Keep some instance variables to avoid reallocations
            private readonly StringBuilder decodeRowResult;
            private int[] counters;
            private int counterLength;

            public MyCodeReader()
            {
                decodeRowResult = new StringBuilder(40);
                counters = new int[500];
                counterLength = 0;
            }

            override public List<Result> decodeRow(int rowNumber, BitArray row, Hashtable hints)
            {
                List<Result> returnList = null;
                if (!setCounters(row))
                    return null;

                int globalOffset = 0;
                while (globalOffset < counterLength)
                {
                    int startSymbolPos = -1;
                    int startOffset = findStartPattern(out startSymbolPos, globalOffset);
                    if (startOffset < 0)
                        return returnList; // we can't find start char in the whole row -> so, exit

                    decodeRowResult.Length = 0;
                    decodeRowResult.Append((char)startSymbolPos);

                    int nextStart = startOffset;
                    nextStart += (STARTEND_LENGTH + 1/*space between symbols*/);
                    bool findNextStart = false;
                    do
                    {
                        int charOffset = toNarrowWidePattern(nextStart, SYMBOL_LENGTH);
                        if (charOffset == -1 || !arrayContains(DATA_ENCODING, ALPHABET[charOffset]))
                        {
                            findNextStart = true;
                            break;
                        }
                        decodeRowResult.Append((char)charOffset);
                        nextStart += (SYMBOL_LENGTH + 1);

                        // Stop as soon as length of data symbols equals to corresponding number
                        if (decodeRowResult.Length == DATA_LENGTH + 1/*start symbol*/)
                        {
                            int endOffset = toNarrowWidePattern(nextStart, STARTEND_LENGTH);
                            if (endOffset == -1 || !arrayContains(END_ENCODING, ALPHABET[endOffset]))
                            {
                                findNextStart = true;
                                break;
                            }
                            globalOffset = nextStart + STARTEND_LENGTH;
                            decodeRowResult.Append((char)endOffset);
                            break;
                        }
                    } while (nextStart < counterLength); // no fixed end pattern so keep on reading while data is available

                    if (findNextStart)
                    {
                        globalOffset = ++startOffset;
                        continue;
                    }
                    if (!validatePattern())
                    {
                        globalOffset = ++startOffset;
                        continue;
                    }

                    // remove stop/start characters
                    decodeRowResult.Remove(decodeRowResult.Length - 1, 1);
                    decodeRowResult.Remove(0, 1);

                    int runningCount = 0;
                    for (int i = 0; i < startOffset; i++)
                    {
                        runningCount += counters[i];
                    }
                    float left = (float)runningCount;
                    for (int i = startOffset; i < nextStart - 1; i++)
                    {
                        runningCount += counters[i];
                    }
                    float right = (float)runningCount;

                    Result result = new Result(
                        decodeRowResult.ToString(),
                        null,
                        new ResultPoint[]
                        {
                            new ResultPoint(left, (float) rowNumber),
                            new ResultPoint(right, (float) rowNumber)
                        },
                        BarcodeFormat.CODABAR);
                    if (returnList == null)
                        returnList = new List<Result>();
                    returnList.Add(result);
                }
                return returnList;
            }

            private bool validatePattern()
            {
                if (decodeRowResult.Length != DATA_LENGTH + 2)
                {
                    return false;
                }

                // Translate character table offsets to actual characters.
                for (int i = 0; i < decodeRowResult.Length; i++)
                {
                    decodeRowResult[i] = ALPHABET[decodeRowResult[i]];
                }

                // Ensure a valid start character
                char startchar = decodeRowResult[0];
                if (!arrayContains(START_ENCODING, startchar))
                {
                    return false;
                }

                // Ensure a valid end character
                char endchar = decodeRowResult[decodeRowResult.Length - 1];
                if (!arrayContains(END_ENCODING, endchar))
                {
                    return false;
                }

                // Ensure a valid data symbols
                for (int i = 1; i < decodeRowResult.Length - 1; i++)
                {
                    if (!arrayContains(DATA_ENCODING, decodeRowResult[i]))
                    {
                        return false;
                    }
                }
                return true;
            }

            /// <summary>
            /// Records the size of all runs of white and black pixels, starting with white.
            /// This is just like recordPattern, except it records all the counters, and
            /// uses our builtin "counters" member for storage.
            /// </summary>
            /// <param name="row">row to count from</param>
            private bool setCounters(BitArray row)
            {
                counterLength = 0;
                // Start from the first white bit.
                int i = row.getNextUnset(0);
                int end = row.Size;
                if (i >= end)
                {
                    return false;
                }
                bool isWhite = true;
                int count = 0;
                for (; i < end; i++)
                {
                    if (row[i] ^ isWhite)
                    {
                        // that is, exactly one is true
                        count++;
                    }
                    else
                    {
                        counterAppend(count);
                        count = 1;
                        isWhite = !isWhite;
                    }
                }
                counterAppend(count);
                return true;
            }

            private void counterAppend(int e)
            {
                counters[counterLength] = e;
                counterLength++;
                if (counterLength >= counters.Length)
                {
                    int[] temp = new int[counterLength * 2];
                    Array.Copy(counters, 0, temp, 0, counterLength);
                    counters = temp;
                }
            }

            private int findStartPattern(out int charOffset, int globalOffset)
            {
                charOffset = -1;
                //
                // Assume that first (i = 0) set of pixels is white,
                // so we start find symbols from second set (i = 1).
                // And next we step over white set ('i += 2').
                //
                for (int i = 1 + globalOffset; i < counterLength; i += 2)
                {
                    if (counters[i - 1] < counters[i] * 5) // before start char must be a long space
                        continue;

                    charOffset = toNarrowWidePattern(i, 3);
                    if (charOffset != -1 && arrayContains(START_ENCODING, ALPHABET[charOffset]))
                    {
                        return i;
                    }
                }
                return -1;
            }

            internal static bool arrayContains(char[] array, char key)
            {
                if (array != null)
                {
                    foreach (char c in array)
                    {
                        if (c == key)
                        {
                            return true;
                        }
                    }
                }
                return false;
            }

            // Assumes that counters[position] is a bar.
            private int toNarrowWidePattern(int position, int offset)
            {
                int end = position + offset;
                if (end >= counterLength)
                    return -1;

                // First element is for bars, second is for spaces.
                int[] maxes = { 0, 0 };
                int[] mins = { Int32.MaxValue, Int32.MaxValue };
                int[] thresholds = { 0, 0 };
                for (int i = 0; i < 2; i++)
                {
                    for (int j = position + i; j < end; j += 2)
                    {
                        if (counters[j] < mins[i])
                        {
                            mins[i] = counters[j];
                        }
                        if (counters[j] > maxes[i])
                        {
                            maxes[i] = counters[j];
                        }
                    }
                    double tr = ((double)mins[i] + (double)maxes[i]) / 2;
                    thresholds[i] = (int)Math.Ceiling(tr);
                }

                // There are no big spaces in the barcode -> only small spaces
                thresholds[1] = Int32.MaxValue;

                // For start and end symbols defined empirically threshold equals to 5
                if (offset == STARTEND_LENGTH)
                    thresholds[0] = 5;

                int bitmask = 1 << offset;
                int pattern = 0;
                for (int i = 0; i < offset; i++)
                {
                    int barOrSpace = i & 1;
                    bitmask >>= 1;
                    if (counters[position + i] >= thresholds[barOrSpace])
                    {
                        pattern |= bitmask;
                    }
                }

                for (int i = 0; i < CHARACTER_ENCODINGS.Length; i++)
                {
                    if (CHARACTER_ENCODINGS[i] == pattern)
                    {
                        return i;
                    }
                }
                return -1;
            }
        }
    }

Zxing Optimization

So, we managed to recognize one or more codes on a page. But our problems did not end there. Since a page may contain several codes and each sheet has to be scanned in 4 orientations, the algorithm became noticeably slow. We had to dig deeper, and found the following.
From the image, Zxing creates an instance of the RGBLuminanceSource class. It holds a byte array with the brightness of each pixel of the original image. A black-and-white bitmap is then obtained from this data and a threshold value.
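The two steps described here, computing a brightness per pixel and comparing it with a threshold, can be sketched as follows. The weights are the ones used in the constructor code shown in this article; the single fixed threshold is a simplifying assumption made for illustration, and Zxing's own binarizers may choose thresholds differently. BinarizeSketch and its methods are hypothetical names.

```csharp
static class BinarizeSketch
{
    // Brightness of one pixel, using the same weights as the
    // RGBLuminanceSource constructor quoted in this article.
    public static byte Luminance(byte r, byte g, byte b)
    {
        return (byte)(0.3 * r + 0.59 * g + 0.11 * b + 0.01);
    }

    // Turns brightness values into bits: true = dark pixel.
    // A single global threshold is an assumption of this sketch.
    public static bool[] ToBits(byte[] luminances, byte threshold)
    {
        var bits = new bool[luminances.Length];
        for (int i = 0; i < luminances.Length; i++)
        {
            bits[i] = luminances[i] < threshold;
        }
        return bits;
    }
}
```

With a threshold of 128, a pixel of brightness 10 becomes a dark bit and a pixel of brightness 200 a light one.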

Here is part of the RGBLuminanceSource class constructor:

    Color c;
    for (int y = 0; y < height; y++)
    {
        int offset = y * width;
        for (int x = 0; x < width; x++)
        {
            c = bitmap.GetPixel(x, y);
            var r = ColorUtility.GetRValue(c);
            var g = ColorUtility.GetGValue(c);
            var b = ColorUtility.GetBValue(c);
            luminances[offset + x] = (byte)(0.3 * r + 0.59 * g + 0.11 * b + 0.01);
        }
    }

That is, the loops call the slow bitmap.GetPixel(x, y) for every pixel of the image! For small images of about 200x300 pixels this approach is quite acceptable and causes no delays (given that usually only one code is recognized). But in our case the images have a high resolution (up to 3000 x 5000 pixels), and the cost also has to be multiplied by the number of orientations and by the number of pages processed. All this leads to unacceptable delays. For example, for one page of that resolution, creating the RGBLuminanceSource object took about 8 seconds, which is, of course, far too long.

We had to modify this code as well: forget about GetPixel and work directly with scan lines.

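    // Must run in an unsafe context. Assumes a 24- or 32-bit pixel format
    // with bytes stored in B, G, R order.
    // pixelSize is the number of bytes per pixel (declared here on the
    // assumption that it is not already defined elsewhere in the class).
    int pixelSize = Image.GetPixelFormatSize(bitmap.PixelFormat) / 8;
    var bmp = bitmap.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadOnly, bitmap.PixelFormat);
    try
    {
        for (var y = 0; y < bmp.Height; y++)
        {
            // Stride is the length of one scan line in bytes, including padding.
            var row = (byte*)bmp.Scan0 + (y * bmp.Stride);
            int offset = y * width;
            for (var x = 0; x < bmp.Width; x++)
            {
                var b = row[(x * pixelSize)];
                var g = row[(x * pixelSize) + 1];
                var r = row[(x * pixelSize) + 2];
                luminances[offset + x] = (byte)(0.3 * r + 0.59 * g + 0.11 * b + 0.01);
            }
        }
    }
    finally
    {
        bitmap.UnlockBits(bmp); // always release the locked bits
    }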

This step greatly accelerated the algorithm and made it possible to obtain an acceptable processing time.

Work with PDF

As mentioned above, coupons can arrive scanned either as image files or as a PDF document. To turn PDF pages into images we used the iTextSharp library.

The main class for working with this library is PdfReader. An instance of this class can be obtained, for example, as follows:


    var reader = new PdfReader(filePath);

After that you can use it in the code:

    for (var pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
    {
        var page = reader.GetPageN(pageNumber);
        List<ImageRenderInfo> images;
        try
        {
            images = FindImageInPDFDictionary(page);
        }
        catch (Exception)
        {
            // the PDF page could not be parsed; skip it
            continue;
        }
        finally
        {
            reader.ReleasePage(pageNumber);
        }

        if (images == null)
            continue; // nothing to decode on this page

        foreach (var img in images)
        {
            var image = RenderImage(img);
            var result = ImageDecoder.Decode(image, allRotations);
            if (result != null && result.Count > 0)
            {
                // codes were found; process the result here
            }
        }
    }

Using this function, we search for images on the PDF page

    private static List<ImageRenderInfo> FindImageInPDFDictionary(PdfDictionary pg)
    {
        var result = new List<ImageRenderInfo>();
        var res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
        if (res == null)
            return result;
        var xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
        if (xobj == null)
            return result; // an empty list, so the recursive AddRange calls below never receive null

        foreach (var name in xobj.Keys)
        {
            var obj = xobj.Get(name);
            if (!obj.IsIndirect())
                continue;

            var tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
            var type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
            if (PdfName.IMAGE.Equals(type))
            {
                var width = float.Parse(tg.Get(PdfName.WIDTH).ToString());
                var height = float.Parse(tg.Get(PdfName.HEIGHT).ToString());
                if (width > ImageDecoder.MinimalSideResolution || height >= ImageDecoder.MinimalSideResolution)
                {
                    var imgRi = ImageRenderInfo.CreateForXObject(new Matrix(width, height), (PRIndirectReference)obj, tg);
                    result.Add(imgRi);
                }
            }
            // Forms and groups can contain nested images, so search them recursively.
            if (PdfName.FORM.Equals(type))
            {
                result.AddRange(FindImageInPDFDictionary(tg));
            }
            if (PdfName.GROUP.Equals(type))
            {
                result.AddRange(FindImageInPDFDictionary(tg));
            }
        }
        return result;
    }

Get an object of type Bitmap from an object of class ImageRenderInfo

    private static Bitmap RenderImage(ImageRenderInfo renderInfo)
    {
        try
        {
            var image = renderInfo.GetImage();
            using (var dotnetImg = image.GetDrawingImage())
            {
                if (dotnetImg != null)
                {
                    using (var ms = new MemoryStream())
                    {
                        dotnetImg.Save(ms, ImageFormat.Png);
                        return new Bitmap(dotnetImg);
                    }
                }
            }
        }
        catch (Exception)
        {
        }
        return null;
    }

The ImageDecoder.Decode method implements the logic of finding the code in the picture.

There are many varieties of barcodes in the world today. Recognition and generation of most of them is implemented in libraries available to developers. Sometimes, however, you stumble upon an unusual type of barcode that cannot be recognized out of the box.

In such cases, studying and adapting a well-designed open source library helps you get results quickly.

Source: https://habr.com/ru/post/214967/
