VNRecognizeTextRequest is, in essence, a description of what we hope to recognize, plus the setup of a recognition language and level of accuracy:

let request = VNRecognizeTextRequest(completionHandler: self.handleDetectedText)
request.recognitionLevel = .accurate
request.recognitionLanguages = ["en_GB"]
The completion handler is a function of the form handleDetectedText(request: VNRequest?, error: Error?). We pass it into the VNRecognizeTextRequest constructor and then set the remaining properties.

The recognition level can be .fast or .accurate. Since our card has rather small text at the bottom, I chose the higher accuracy; the fast level is likely to work better for large amounts of text.

customWords: you can supply an array of strings that will be favoured over the built-in lexicon, which is useful if your text contains unusual words. I did not use this option for this project, but if I were building a commercial recognition app for Magic: The Gathering cards, I would add some of the most awkward card names (for example, Fblthp, the Lost) to avoid problems.
minimumTextHeight: this is a Float value. It specifies the height, relative to the image height, below which text should no longer be recognized. If I had built this scanner just to grab the card name, this would be useful for discarding all the text I do not need; but since I want even the smallest pieces of text, I have ignored the property for now. Obviously, recognition is quicker when small text is ignored. (A short sketch of setting both optional properties follows the snippet below.)

Once the request is configured, it has to be performed against an image:

let requests = [textDetectionRequest]
let imageRequestHandler = VNImageRequestHandler(cgImage: cgImage, orientation: .right, options: [:])

DispatchQueue.global(qos: .userInitiated).async {
    do {
        try imageRequestHandler.perform(requests)
    } catch let error {
        print("Error: \(error)")
    }
}
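For completeness, here is a minimal sketch of how the two optional properties described above could be applied to the request created earlier; the word list and the threshold value are purely illustrative and are not used in this project:

// Illustrative tuning only; neither property is used in the scanner described here.
request.customWords = ["Fblthp", "Hexproof", "Planeswalker"] // favoured over the built-in lexicon
request.minimumTextHeight = 0.03 // skip any text shorter than 3% of the image height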
In that snippet, the UIImage is first converted to a CGImage. This is passed to the VNImageRequestHandler together with an orientation flag, which helps the handler understand the orientation of the text it should recognize; my scans come in with the .right orientation. The work is then dispatched to a background queue with .userInitiated priority, where we try to perform our requests. You may have noticed that this is an array of requests: that is because several kinds of data can be extracted in a single pass (for example, detecting faces and text in the same image; a small sketch of this follows the callback below). If there are no errors, the completion handler attached to our request is called once text has been found:

func handleDetectedText(request: VNRequest?, error: Error?) {
    if let error = error {
        print("ERROR: \(error)")
        return
    }
    guard let results = request?.results, results.count > 0 else {
        print("No text found")
        return
    }

    for result in results {
        if let observation = result as? VNRecognizedTextObservation {
            for text in observation.topCandidates(1) {
                print(text.string)
                print(text.confidence)
                print(observation.boundingBox)
                print("\n")
            }
        }
    }
}
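Since perform(_:) takes an array, other Vision requests can ride along in the same pass. Here is a minimal sketch, assuming a hypothetical handleDetectedFaces callback exists alongside handleDetectedText:

// Illustrative only: run face detection and text recognition on the same image in one pass.
let faceRequest = VNDetectFaceRectanglesRequest(completionHandler: self.handleDetectedFaces) // handleDetectedFaces is hypothetical
let textRequest = VNRecognizeTextRequest(completionHandler: self.handleDetectedText)
let combinedHandler = VNImageRequestHandler(cgImage: cgImage, orientation: .right, options: [:])

DispatchQueue.global(qos: .userInitiated).async {
    do {
        try combinedHandler.perform([faceRequest, textRequest])
    } catch let error {
        print("Error: \(error)")
    }
}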
Inside handleDetectedText, the results arrive as VNRecognizedTextObservation objects, each of which carries several possible results (hereinafter, candidates). We ask for the best one with observation.topCandidates(1) and extract both its text and its confidence. While each candidate has its own text and confidence, the .boundingBox remains the same for the observation. The .boundingBox uses a normalized coordinate system with the origin in the lower-left corner, so if it is to be used later in UIKit, it should be converted for your own convenience. Here is what the scanner printed for my card:

Carnage Tyrant
1.0
(0.2654155572255453, 0.6955686092376709, 0.18710780143737793, 0.019915008544921786)

Creature
1.0
(0.26317582130432127, 0.423814058303833, 0.09479101498921716, 0.013565015792846635)

Dinosaur
1.0
(0.3883238156636556, 0.42648010253906254, 0.10021591186523438, 0.014479541778564364)

Carnage Tyrant can't be countered.
1.0
(0.26538230578104655, 0.3742666244506836, 0.4300231456756592, 0.024643898010253906)

Trample, hexproof
0.5
(0.2610074838002523, 0.34864263534545903, 0.23053167661031088, 0.022259855270385653)

Sun Empire commanders are well versed
1.0
(0.2619712670644124, 0.31746063232421873, 0.45549616813659666, 0.022649812698364302)

in advanced martial strategy. Still, the
1.0
(0.2623249689737956, 0.29798884391784664, 0.4314465204874674, 0.021180248260498136)

correct maneuver is usually to deploy the
1.0
(0.2620727062225342, 0.2772137641906738, 0.4592740217844645, 0.02083740234375009)

giant, implacable death lizard.
1.0
(0.2610833962758382, 0.252408218383789, 0.3502468903859457, 0.023736238479614258)

7/6
0.5
(0.6693102518717448, 0.23347826004028316, 0.04697717030843107, 0.018937730789184593)

179/279 M
1.0
(0.24829587936401368, 0.21893787384033203, 0.08339192072550453, 0.011646795272827193)

XLN: EN N YEONG-HAO HAN
0.5
(0.246867307027181, 0.20903720855712893, 0.19095951716105145, 0.012227916717529319)

TN & 0 2017 Wizards of the Coast
1.0
(0.5428387324015299, 0.21133480072021482, 0.19361832936604817, 0.011657810211181618)
This is incredible! Every piece of text has been recognized and placed in its own bounding box, and most results came back with a confidence of 1.0.
I can also use the .boundingBox of each observation to determine its location, so that I can pick out the text in the lower-left and upper-left corners while ignoring anything further to the right (a small coordinate-conversion sketch follows below). The end result is an application that scans a card and returns the result to me in less than a second.
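For reference, here is a minimal sketch of the coordinate conversion mentioned earlier; the helper name and the imageSize parameter are my own additions for illustration, not part of the original project:

import UIKit

// Convert a Vision boundingBox (normalized, origin in the lower-left corner)
// into a UIKit CGRect (points, origin in the upper-left corner).
func uiKitRect(for boundingBox: CGRect, in imageSize: CGSize) -> CGRect {
    let width = boundingBox.width * imageSize.width
    let height = boundingBox.height * imageSize.height
    let x = boundingBox.minX * imageSize.width
    // Flip the y axis: Vision measures from the bottom, UIKit from the top.
    let y = (1 - boundingBox.maxY) * imageSize.height
    return CGRect(x: x, y: y, width: width, height: height)
}

The same normalized coordinates can be used directly for the corner filtering described above, for example by keeping only observations whose boundingBox.minX is below 0.5.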
Source: https://habr.com/ru/post/459668/