Hi, Habr! In one of our past articles we looked at embedding the Smart IDReader recognition core into an iOS application. It's time to discuss the same problem for Android. Due to the large number of OS versions and the huge variety of devices, it is more complicated than on iOS, but still quite a solvable task. Disclaimer: the information below is not the ultimate truth. If you know how to simplify the embedding process or the work with the camera, or would do it differently, welcome to the comments!
Suppose we want to add document recognition functionality to our application, and for this we have the Smart IDReader SDK, which consists of the following parts:

- bin - build of the libjniSmartIdEngine.so core for the 32-bit ARMv7 architecture
- bin-64 - build of the libjniSmartIdEngine.so core for the 64-bit ARMv8 architecture
- bin-x86 - build of the libjniSmartIdEngine.so core for the 32-bit x86 architecture
- bindings - JNI wrapper jniSmartIdEngineJar.jar over the libjniSmartIdEngine.so library
- data - core configuration files
- doc - SDK documentation

A few comments on the contents of the SDK.
The presence of three library builds for different platforms is the price of the wide variety of devices running Android (we do not build for MIPS because there are practically no devices with this architecture). The ARMv7 and ARMv8 builds are the basic ones; the x86 version is usually used by our clients for specific devices based on mobile Intel processors.
The JNI (Java Native Interface) wrapper jniSmartIdEngineJar.jar is required to call C++ code from a Java application. The build of the wrapper is automated using the SWIG toolkit (Simplified Wrapper and Interface Generator).
So, as the French say, revenons à nos moutons (literally "let's return to our sheep", that is, back to the point)! We have the SDK, and we need to embed it into the project with minimal effort and start using it. The steps required to do this are described below.
So that everyone can play around with the library, we have prepared and published the source code of the Smart IDReader demo for Android on GitHub. The project is made for Android Studio and shows an example of working with the camera and the recognition core in a simple application.
Let's consider this process using an application project in Android Studio as an example; for users of other IDEs the process is not much different. By default, Android Studio creates a libs folder in every project, from which Gradle picks up JAR files and adds them to the project. This is where we copy the JNI wrapper jniSmartIdEngineJar.jar.
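Newer Android Studio templates declare this dependency by default; if your build.gradle does not have it, a minimal sketch looks like this (with newer versions of the Android Gradle plugin the configuration is called implementation instead of compile):

```groovy
// app/build.gradle: pick up all JAR files from the libs folder
dependencies {
    compile fileTree(dir: 'libs', include: ['*.jar'])
}
```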
There are several ways to add the native core libraries; the easiest is to use a JAR archive. In the libs folder, create an archive named native-libs.jar (the name is important!) with the subfolders lib/armeabi-v7a and lib/arm64-v8a inside, and place the corresponding versions of the library there (for the x86 library the subfolder is lib/x86).
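The resulting archive layout looks like this (the x86 subfolder is needed only if you ship the x86 build):

```
native-libs.jar
└── lib
    ├── armeabi-v7a
    │   └── libjniSmartIdEngine.so
    ├── arm64-v8a
    │   └── libjniSmartIdEngine.so
    └── x86
        └── libjniSmartIdEngine.so
```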
With this layout, after the application is installed, Android will automatically deploy the library version required for the given device. The accompanying engine configuration files are added to the project's assets folder; if this folder is missing, you can create it manually or with the File | New | Folder | Assets Folder command. As you can see, adding the files to the project is very simple and takes little time.
So, we have added the necessary files to the application and even built it successfully. Our hands are itching to try the new functionality, but for that a little more work is needed :-) Namely, the following:
In order for the library to access the configuration files, they need to be transferred from assets to the working folder of the application. It is enough to do this once at the first launch and then update them only when a new version is released. The easiest way is to check the application's version code and update the files if it has changed.
```java
// get the versionCode of the installed application
PackageInfo packageInfo = getPackageManager().getPackageInfo(getPackageName(), 0);
int version_code = packageInfo.versionCode;

SharedPreferences sPref = PreferenceManager.getDefaultSharedPreferences(this);
// the version stored on the previous launch (-1 on the very first launch)
int version_current = sPref.getInt("version_code", -1);
// the files need to be copied if the version has changed
boolean need_copy_assets = version_code != version_current;
// remember the current version
SharedPreferences.Editor ed = sPref.edit();
ed.putInt("version_code", version_code);
ed.commit();
...
if (need_copy_assets)
    copyAssets();
```
The copying procedure itself is not complicated: it reads the data from the files located in the application's assets and writes it to files in the working directory. The code of the function that performs this copying can be seen in the example on GitHub.
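The idea can be sketched as follows (copyAssetFile here is an illustrative helper taking the name of one bundled file; the real copyAssets in the GitHub example iterates over all the configuration files):

```java
// illustrative sketch (not the exact code from the example); a method of the
// Activity, using java.io streams: copies one file from assets into the
// application's working directory
void copyAssetFile(String name) {
    try {
        InputStream in = getAssets().open(name);
        OutputStream out = new FileOutputStream(new File(getFilesDir(), name));
        byte[] buffer = new byte[4096];
        int read;
        while ((read = in.read(buffer)) != -1)
            out.write(buffer, 0, read);
        in.close();
        out.close();
    } catch (IOException e) {
        // without the configuration files the engine cannot be initialized
    }
}
```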
It remains only to load the library and initialize the engine. The whole procedure takes a noticeable amount of time, so it is reasonable to perform it in a separate thread so as not to block the main GUI thread. An example of initialization based on AsyncTask:
```java
private static RecognitionEngine engine;
private static SessionSettings sessionSettings;
private static RecognitionSession session;
...

class InitCore extends AsyncTask<Void, Void, Void> {

    @Override
    protected Void doInBackground(Void... unused) {
        if (need_copy_assets)
            copyAssets();
        // load the library and configure the engine
        configureEngine();
        return null;
    }

    @Override
    protected void onPostExecute(Void aVoid) {
        super.onPostExecute(aVoid);
        if (is_configured) {
            // enable the required document types (for example, the rus.passport.* mask)
            sessionSettings.AddEnabledDocumentTypes(document_mask);
            // the list of enabled document types
            StringVector document_types = sessionSettings.GetEnabledDocumentTypes();
            ...
        }
    }
}
...

private void configureEngine() {
    try {
        // load the native library
        System.loadLibrary("jniSmartIdEngine");
        // path to the configuration bundle in the working directory
        String bundle_path = getFilesDir().getAbsolutePath() + File.separator + bundle_name;
        // create the engine
        engine = new RecognitionEngine(bundle_path);
        // create the session settings
        sessionSettings = engine.CreateSessionSettings();
        is_configured = true;
    } catch (RuntimeException e) {
        ...
    } catch (UnsatisfiedLinkError e) {
        ...
    }
}
```
If your application already uses the camera, you can safely skip this section and go to the next one. For the rest, let's consider how to use the camera to obtain a video stream for document recognition with Smart IDReader. Let us note right away that we use the Camera class, and not Camera2, even though the former has been declared deprecated since API level 21 (Android 5.0). This is done deliberately, first of all for compatibility reasons: Camera2 is only available starting from Android 5.0, while plenty of devices in use run older versions of the system.
To add camera support to the application, add the following lines to the manifest:
```xml
<uses-permission android:name="android.permission.CAMERA" />
<uses-feature android:name="android.hardware.camera" />
```
It is also good practice to request the camera permission at runtime, a mechanism introduced in Android 6.x. Besides, on these systems users can always revoke permissions from the application in the settings, so the check is needed in any case.
```java
// request the camera permission if it has not been granted yet
if (needPermission(Manifest.permission.CAMERA))
    requestPermission(Manifest.permission.CAMERA, REQUEST_CAMERA);
...

public boolean needPermission(String permission) {
    // check whether the permission has already been granted
    int result = ContextCompat.checkSelfPermission(this, permission);
    return result != PackageManager.PERMISSION_GRANTED;
}

public void requestPermission(String permission, int request_code) {
    // show the system dialog asking for the permission
    ActivityCompat.requestPermissions(this, new String[]{permission}, request_code);
}

@Override
public void onRequestPermissionsResult(int requestCode, String permissions[], int[] grantResults) {
    switch (requestCode) {
        case REQUEST_CAMERA: {
            // check whether the user granted the permission
            boolean is_granted = false;
            for (int grantResult : grantResults) {
                if (grantResult == PackageManager.PERMISSION_GRANTED)
                    is_granted = true;
            }
            if (is_granted) {
                camera = Camera.open(); // open the camera
                ...
            } else
                toast("Enable CAMERA permission in Settings");
        }
        default:
            super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    }
}
```
An important part of working with the camera is setting its parameters, namely the focus mode and the preview resolution. Because of the wide variety of devices and their cameras, this issue deserves special attention. If the camera does not support focusing, we have to work with a focus that is fixed or set to infinity; in this case there is not much to be done, and we take the images from the camera as is. If we are lucky and focusing is available, we check whether the FOCUS_MODE_CONTINUOUS_PICTURE or FOCUS_MODE_CONTINUOUS_VIDEO modes are supported, which mean continuous refocusing on the scene, and if so, we set one of them in the camera parameters. If not, we can resort to the following trick: start a timer and call the camera's focusing function at a specified frequency.
```java
Camera.Parameters params = camera.getParameters();
// the focus modes supported by the camera
List<String> focus_modes = params.getSupportedFocusModes();
String focus_mode = Camera.Parameters.FOCUS_MODE_AUTO;
boolean isAutoFocus = focus_modes.contains(focus_mode);

if (isAutoFocus) {
    if (focus_modes.contains(Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE))
        focus_mode = Camera.Parameters.FOCUS_MODE_CONTINUOUS_PICTURE;
    else if (focus_modes.contains(Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO))
        focus_mode = Camera.Parameters.FOCUS_MODE_CONTINUOUS_VIDEO;
} else {
    // no autofocus: take the first available mode
    focus_mode = focus_modes.get(0);
}

// set the focus mode
params.setFocusMode(focus_mode);

// if only plain autofocus is available, refocus periodically using a timer
if (focus_mode.equals(Camera.Parameters.FOCUS_MODE_AUTO)) {
    timer = new Timer();
    timer.schedule(new Focus(), timer_delay, timer_period);
}
...

// timer task that triggers focusing
private class Focus extends TimerTask {
    public void run() {
        focusing();
    }
}

public void focusing() {
    try {
        Camera.Parameters cparams = camera.getParameters();
        // restart autofocus if the camera supports focus areas
        if (cparams.getMaxNumFocusAreas() > 0) {
            camera.cancelAutoFocus();
            cparams.setFocusMode(Camera.Parameters.FOCUS_MODE_AUTO);
            camera.setParameters(cparams);
        }
    } catch (RuntimeException e) {
        ...
    }
}
```
Setting the preview resolution is quite simple. The basic requirement is that the aspect ratio of the camera image matches the aspect ratio of the display area, so that the preview is not distorted; in addition, the resolution should be as high as possible, since the quality of document recognition depends on it. In our example the application shows the preview in full screen, so we choose the maximum resolution matching the aspect ratio of the screen.
```java
DisplayMetrics metrics = new DisplayMetrics();
getWindowManager().getDefaultDisplay().getMetrics(metrics);
// aspect ratio of the screen
float best_ratio = (float) metrics.heightPixels / (float) metrics.widthPixels;

List<Camera.Size> sizes = params.getSupportedPreviewSizes();
Camera.Size preview_size = sizes.get(0);
// allowed deviation from the screen aspect ratio
final float tolerance = 0.1f;
float preview_ratio_diff = Math.abs((float) preview_size.width / (float) preview_size.height - best_ratio);

// choose the maximum preview size matching the screen aspect ratio
for (int i = 1; i < sizes.size(); i++) {
    Camera.Size tmp_size = sizes.get(i);
    float tmp_ratio_diff = Math.abs((float) tmp_size.width / (float) tmp_size.height - best_ratio);
    if ((Math.abs(tmp_ratio_diff - preview_ratio_diff) < tolerance && tmp_size.width > preview_size.width)
            || tmp_ratio_diff < preview_ratio_diff) {
        preview_size = tmp_size;
        preview_ratio_diff = tmp_ratio_diff;
    }
}

// set the preview size
params.setPreviewSize(preview_size.width, preview_size.height);
```
Only a little remains: setting the camera orientation and displaying the preview on the surface of the Activity. By default, an angle of 0 degrees corresponds to the landscape orientation of the device, so when the screen rotates it must be changed accordingly. The Nexus 5X from Google deserves a special mention here: its sensor is mounted in the device upside down, so it requires a separate orientation check.
```java
private boolean is_nexus_5x = Build.MODEL.contains("Nexus 5X");

SurfaceView surface = (SurfaceView) findViewById(R.id.preview);
...
// portrait orientation (270 degrees for the upside-down Nexus 5X sensor)
camera.setDisplayOrientation(!is_nexus_5x ? 90 : 270);
// output the preview to the surface
camera.setPreviewDisplay(surface.getHolder());
// start the preview
camera.startPreview();
```
So, the camera is connected and working; all that is left is to use the engine and get the result. We start the recognition process by spawning a new session and setting a callback to receive frames from the camera in preview mode.
```java
void start_session() {
    if (is_configured && camera_ready) {
        // session timeout in seconds, after which recognition stops
        sessionSettings.SetOption("common.sessionTimeout", "5.0");
        // spawn a new recognition session
        session = engine.SpawnSession(sessionSettings);
        try {
            session_working = true;
            // semaphores synchronizing the camera callback and the worker thread
            frame_waiting = new Semaphore(1, true);
            frame_ready = new Semaphore(0, true);
            // AsyncTask that processes the frames
            new EngineTask().execute();
        } catch (RuntimeException e) {
            ...
        }
        // set the callback for receiving preview frames
        camera.setPreviewCallback(this);
    }
}
```
The onPreviewFrame() function receives the current image from the camera as an array of bytes in the YUV NV21 format. Since it can only be called in the main thread, the engine calls that process the image are moved to a separate thread using AsyncTask, and the two threads are synchronized with semaphores. After receiving an image from the camera, we signal the worker thread to start processing it, and when processing is finished, we signal that a new image can be accepted.
```java
// the current frame received from the camera
private static volatile byte[] data;
...

@Override
public void onPreviewFrame(byte[] data_, Camera camera) {
    if (frame_waiting.tryAcquire() && session_working) {
        data = data_;
        // signal the worker thread that a new frame is ready
        frame_ready.release();
    }
}
...

class EngineTask extends AsyncTask<Void, RecognitionResult, Void> {
    @Override
    protected Void doInBackground(Void... unused) {
        while (true) {
            try {
                frame_ready.acquire(); // wait for a new frame
                if (!session_working) // the session is over
                    break;
                Camera.Size size = camera.getParameters().getPreviewSize();
                // pass the YUV image to the engine for processing
                RecognitionResult result = session.ProcessYUVSnapshot(data, size.width, size.height,
                        !is_nexus_5x ? ImageOrientation.Portrait : ImageOrientation.InvertedPortrait);
                ...
                // ready to accept the next frame
                frame_waiting.release();
            } catch (RuntimeException e) {
                ...
            } catch (InterruptedException e) {
                ...
            }
        }
        return null;
    }
}
```
After processing each image, the engine returns the current recognition result. It includes the document zones found, the text fields with their values and confidence flags, and the graphic fields, such as photographs or signatures. If the data has been recognized reliably or a timeout has occurred, the IsTerminal flag is set, signaling the completion of the process. For intermediate results you can draw the found zones and fields, show the current recognition progress and much more; it all depends on your imagination.
```java
void show_result(RecognitionResult result) {
    // names of the recognized text fields
    StringVector texts = result.GetStringFieldNames();
    // names of the graphic fields (photo, signature, ...)
    StringVector images = result.GetImageFieldNames();

    for (int i = 0; i < texts.size(); i++) // text fields
    {
        StringField field = result.GetStringField(texts.get(i));
        String value = field.GetUtf8Value();
        // whether the field value is accepted as reliable
        boolean is_accepted = field.IsAccepted();
        ...
    }

    for (int i = 0; i < images.size(); i++) // graphic fields
    {
        ImageField field = result.GetImageField(images.get(i));
        Bitmap image = getBitmap(field.GetValue()); // conversion to Bitmap
        ...
    }
    ...
}
```
After that, all that remains is to stop receiving images from the camera and end the recognition session.
```java
void stop_session() {
    session_working = false;
    data = null;
    // release the semaphores so the worker thread can terminate
    frame_waiting.release();
    frame_ready.release();
    // stop receiving frames from the camera
    camera.setPreviewCallback(null);
    ...
}
```
As our example shows, connecting the Smart IDReader SDK to an Android application and working with the camera are not difficult if you follow a few rules. A number of our customers successfully use our technologies in their mobile applications, and adding the new functionality takes very little time. We hope this article has convinced you of that too!
P.S. To see what Smart IDReader looks like after embedding, you can download the free full versions of our applications from the App Store and Google Play.
Source: https://habr.com/ru/post/332670/