
How to cook up AR on Android

If you are interested in developing an AR application for Android — in particular, if you want to write your own “Masks” or any other AR app — then you are in the right place. Below I give a brief overview of how this can be done. I do not claim it is the best option (it may well be easier to use a ready-made framework), but it works and is not hard to implement. In my view there is no need for extra weight here in the form of libraries or frameworks, since that only clutters things up; the standard tools are enough. I will show how to work with the camera, how to process the image, and how to apply effects. I assume the reader already knows how to create a “Hello world” app on Android. All of this is under the cut.

Since you have decided to look into the article, I will try not to disappoint — and who knows, maybe after reading it you will want to write an application that wins the hearts of users and takes its rightful place among the best apps.

But I want to warn you right away that I will not post the complete sources; I will only give isolated pieces, because a good article should be like a miniskirt: short enough to attract interest and long enough to cover all the most important parts.

I had to break the article into several parts. In this first one we will get a working product without effects; later, if the article is well received, I will cover examples of applying algorithms and effects. The plan for the current article: get an image from the camera and display it via GLSurfaceView.

Obtaining images from the camera


So, let's start with the fact that we need to receive a stream of images from the camera. Android provides the Camera API, whose documentation is easy to find on the official site.
Unfortunately, I will show the old android.hardware.camera package; I have not yet gotten around to the newer android.hardware.camera2, but I think it is a minor task to port the code to it.

First, declare the camera permission in the manifest:

<uses-permission android:name="android.permission.CAMERA" /> 
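On Android 6.0+ the CAMERA permission also has to be granted at runtime; the article does not show this step, so here is a minimal sketch (my addition, assuming it runs inside the hosting Activity):

 private static final int REQUEST_CAMERA = 1; // arbitrary request code

 private void ensureCameraPermission() {
     // on API 23+ the manifest entry alone is not enough
     if (Build.VERSION.SDK_INT >= 23
             && checkSelfPermission(Manifest.permission.CAMERA) != PackageManager.PERMISSION_GRANTED) {
         requestPermissions(new String[]{Manifest.permission.CAMERA}, REQUEST_CAMERA);
     }
 }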

Add a preview element for the camera to our layout:

  <trolleg.CameraView android:id="@+id/fd_fase_surface_view" android:layout_width="match_parent" android:layout_height="match_parent" android:alpha="0"/> 

Note that we hide this element with alpha="0": the preview would only show the raw camera stream, while we want to draw the frames ourselves with effects applied. At the same time, the camera cannot be used without a preview at all — apparently for security reasons, so that apps cannot record video covertly.

Next we need to start the camera; comments are in the code below. The pieces of code were torn out of a project and lightly edited, so a little refactoring will be needed to get them running.

 public class CameraView extends SurfaceView implements SurfaceHolder.Callback, Camera.PreviewCallback {

     public FrameCamera frameCamera = new FrameCamera(); // shared holder for the latest frame, read by the renderer
     private static final int MAGIC_TEXTURE_ID = 10;     // id for a dummy SurfaceTexture, if one is needed
     boolean cameraFacing;                               // true when the front camera is used
     private byte mBuffer[];                             // reusable buffer for preview callbacks
     private static final String TAG = "CameraView";
     private Camera mCamera;
     private SurfaceTexture mSurfaceTexture;
     int numberOfCameras;
     int cameraIndex;
     int previewWidth;
     int previewHeight;
     int cameraWidth;
     int cameraHeight;

     public CameraView(Context context, AttributeSet attrs) {
         super(context, attrs);
         cameraIndex = 0; // fall back to the first camera
         numberOfCameras = android.hardware.Camera.getNumberOfCameras();
         android.hardware.Camera.CameraInfo cameraInfo = new android.hardware.Camera.CameraInfo();
         // prefer the front camera if there is one
         for (int i = 0; i < numberOfCameras; i++) {
             android.hardware.Camera.getCameraInfo(i, cameraInfo);
             if (cameraInfo.facing == android.hardware.Camera.CameraInfo.CAMERA_FACING_FRONT) {
                 cameraIndex = i;
                 cameraFacing = true;
             }
         }
         getHolder().addCallback(this);
     }

     @Override
     public void surfaceCreated(SurfaceHolder holder) {
     }

     @Override
     public void surfaceChanged(SurfaceHolder holder, int format, int w, int h) {
         previewHeight = h;
         previewWidth = w;
         startCameraPreview(w, h);
     }

     private void startCameraPreview(int previewWidthLocal, int previewHeightLocal) {
         releaseCamera();
         mCamera = Camera.open(cameraIndex);
         Camera.Parameters params = mCamera.getParameters();
         params.setPreviewFormat(ImageFormat.NV21); // frames will arrive in NV21
         // ... here a suitable preview size should be picked and set on params (elided in the article)
         mCamera.setParameters(params);
         // remember the preview size that is actually in effect
         Camera.Size previewSize = mCamera.getParameters().getPreviewSize();
         cameraWidth = previewSize.width;
         cameraHeight = previewSize.height;
         // allocate the callback buffer: width * height * bitsPerPixel / 8 bytes
         int size = cameraWidth * cameraHeight;
         size = size * ImageFormat.getBitsPerPixel(params.getPreviewFormat()) / 8;
         mBuffer = new byte[size];
         try {
             // hand the buffer to the camera and subscribe to preview frames
             mCamera.addCallbackBuffer(mBuffer);
             mCamera.setPreviewCallbackWithBuffer(this);
             // we do not show the raw preview anywhere; a dummy SurfaceTexture
             // (see MAGIC_TEXTURE_ID) could be set instead if a device refuses to start without one
             mCamera.setPreviewDisplay(null);
             mCamera.startPreview();
         } catch (Exception e) {
             Log.d(TAG, "Error starting camera preview: " + e.getMessage());
         }
     }

     // not listed in the original article; a straightforward implementation
     private void releaseCamera() {
         if (mCamera != null) {
             mCamera.stopPreview();
             mCamera.setPreviewCallbackWithBuffer(null);
             mCamera.release();
             mCamera = null;
         }
     }

     @Override
     public void surfaceDestroyed(SurfaceHolder surfaceHolder) {
         releaseCamera();
     }

     @Override
     public void onPreviewFrame(byte[] data, Camera camera) {
         synchronized (frameCamera) {
             // copy the fresh frame into the shared holder for the renderer
             frameCamera.cameraWidth = cameraWidth;
             frameCamera.cameraHeight = cameraHeight;
             frameCamera.facing = cameraFacing;
             if (frameCamera.bufferFromCamera == null || frameCamera.bufferFromCamera.length != data.length) {
                 frameCamera.bufferFromCamera = new byte[data.length];
             }
             System.arraycopy(data, 0, frameCamera.bufferFromCamera, 0, data.length);
             frameCamera.wereProcessed = false;
         }
         // return the buffer to the camera so it can fill the next frame
         mCamera.addCallbackBuffer(mBuffer);
     }

     public void disableView() {
         releaseCamera();
     }

     public void enableView() {
         startCameraPreview(previewWidth, previewHeight);
     }
 }
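The code above passes frames to the renderer through a small FrameCamera holder class that is not listed in the article; judging by the fields used in CameraView and in the renderer below, it is roughly this:

 public class FrameCamera {
     public byte[] bufferFromCamera; // raw NV21 bytes copied in onPreviewFrame
     public int cameraWidth;
     public int cameraHeight;
     public boolean facing;          // true for the front camera
     public boolean wereProcessed;   // set once the renderer has consumed the frame
 }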

The camera starts together with the View we created: as soon as its surface is ready and sized, surfaceChanged() kicks off the preview.
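The hosting Activity is not shown in the article; presumably it ties enableView()/disableView() (and the GLSurfaceView from the next section) to the lifecycle roughly like this — a sketch, with cameraView and gLSurfaceView assumed to be the views from the layout:

 @Override
 protected void onPause() {
     super.onPause();
     cameraView.disableView();  // release the camera while we are in the background
     gLSurfaceView.onPause();
 }

 @Override
 protected void onResume() {
     super.onResume();
     cameraView.enableView();   // reopen the camera and restart the preview
     gLSurfaceView.onResume();
 }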

Displaying the image via GLSurfaceView


Next we turn to OpenGL, which you can read about on Wikipedia. In short, it is a standard for working with graphics through shaders. Shaders are programs that run in parallel on GPU cores; we will use vertex and fragment shaders. A vertex shader is typically used to project 3D coordinates onto a plane, and a fragment shader computes the color of each pixel on that plane by mapping its position to the texture coordinates of the projected object. Enough boring theory — on to the code. Add the corresponding element to our layout:

 <android.opengl.GLSurfaceView android:id="@+id/fd_glsurface" android:layout_width="match_parent" android:layout_height="match_parent" /> 

Configure it:

 GLSurfaceView gLSurfaceView = findViewById(R.id.fd_glsurface);
 gLSurfaceView.setEGLContextClientVersion(2);                         // OpenGL ES 2.0
 gLSurfaceView.setEGLConfigChooser(8, 8, 8, 8, 16, 0);                // RGBA8888 plus a 16-bit depth buffer
 gLSurfaceView.getHolder().setFormat(PixelFormat.TRANSPARENT);
 gLSurfaceView.setRenderer(new OurRenderer());                        // our renderer, see below
 gLSurfaceView.setRenderMode(GLSurfaceView.RENDERMODE_CONTINUOUSLY);  // redraw continuously, frame after frame
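setEGLContextClientVersion(2) assumes the device supports OpenGL ES 2.0; if you want to be careful, this can be checked beforehand (a small sketch, not from the original project):

 ActivityManager am = (ActivityManager) getSystemService(Context.ACTIVITY_SERVICE);
 boolean supportsEs2 = am.getDeviceConfigurationInfo().reqGlEsVersion >= 0x20000;
 if (!supportsEs2) {
     // fall back or warn the user; practically all modern devices support ES 2.0
 }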

Next, create the renderer class; all the image-producing logic lives there.

 public class OurRenderer implements GLSurfaceView.Renderer {

     int programNv21ToRgba;                 // shader program that converts NV21 to RGBA
     int texNV21FromCamera[] = new int[2];  // texture ids: [0] - Y plane, [1] - interleaved UV plane
     // direct buffers the camera frame is split into
     ByteBuffer bufferY;
     ByteBuffer bufferUV;
     int mCameraWidth;
     int mCameraHeight;
     int widthSurf;
     int heightSurf;
     // frameCamera, facing1 and context are wired up from CameraView and the Activity
     // elsewhere in the project; that plumbing is omitted in the article's excerpt

     private void initShaders() {
         int vertexShaderId = ShaderUtils.createShader(GLES20.GL_VERTEX_SHADER,
                 FileUtils.getStringFromAsset(context.getAssets(), "shaders/vss_2d.glsl"));
         int fragmentShaderId = ShaderUtils.createShader(GLES20.GL_FRAGMENT_SHADER,
                 FileUtils.getStringFromAsset(context.getAssets(), "shaders/fss_n21_to_rgba.glsl"));
         programNv21ToRgba = ShaderUtils.createProgram(vertexShaderId, fragmentShaderId);
     }

     public void onSurfaceCreated(GL10 gl, EGLConfig config) {
         initShaders();
         // create the two textures that will hold the Y and UV planes
         GLES20.glGenTextures(2, texNV21FromCamera, 0);
         GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, texNV21FromCamera[0]);
         GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
         GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
         GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST);
         GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_NEAREST);
         GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, texNV21FromCamera[1]);
         GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_S, GLES20.GL_CLAMP_TO_EDGE);
         GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_WRAP_T, GLES20.GL_CLAMP_TO_EDGE);
         GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MIN_FILTER, GLES20.GL_NEAREST);
         GLES20.glTexParameteri(GLES20.GL_TEXTURE_2D, GLES20.GL_TEXTURE_MAG_FILTER, GLES20.GL_NEAREST);
     }

     // not in the article's excerpt, but required by the Renderer interface;
     // it simply remembers the surface size used below
     public void onSurfaceChanged(GL10 gl, int width, int height) {
         widthSurf = width;
         heightSurf = height;
     }

     // called for every frame, since we set GLSurfaceView.RENDERMODE_CONTINUOUSLY
     public void onDrawFrame(GL10 gl) {
         // grab the latest camera frame; synchronize so the camera thread
         // cannot overwrite the buffer while we are copying it
         synchronized (frameCamera) {
             if (frameCamera.bufferFromCamera == null) {
                 return; // nothing has arrived from the camera yet
             }
             mCameraWidth = frameCamera.cameraWidth;
             mCameraHeight = frameCamera.cameraHeight;
             int cameraSize = mCameraWidth * mCameraHeight;
             if (bufferY == null) {
                 bufferY = ByteBuffer.allocateDirect(cameraSize);
                 bufferUV = ByteBuffer.allocateDirect(cameraSize / 2);
             }
             // split the NV21 frame into two planes: Y and interleaved UV
             bufferY.put(frameCamera.bufferFromCamera, 0, cameraSize);
             bufferY.position(0);
             bufferUV.put(frameCamera.bufferFromCamera, cameraSize, cameraSize / 2);
             bufferUV.position(0);
             // upload the Y plane as a single-channel LUMINANCE texture
             GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, texNV21FromCamera[0]);
             GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_LUMINANCE, mCameraWidth, mCameraHeight, 0,
                     GLES20.GL_LUMINANCE, GLES20.GL_UNSIGNED_BYTE, bufferY);
             GLES20.glFlush();
             // upload the UV plane as a LUMINANCE_ALPHA texture (two bytes per texel, half resolution):
             // in the shader V is read from .r and U from .a
             GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, texNV21FromCamera[1]);
             GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_LUMINANCE_ALPHA, mCameraWidth / 2, mCameraHeight / 2, 0,
                     GLES20.GL_LUMINANCE_ALPHA, GLES20.GL_UNSIGNED_BYTE, bufferUV);
             GLES20.glFlush();
         }
         // now run the NV21-to-RGBA shader straight into the default framebuffer;
         // framebuffer 0 is the GLSurfaceView itself - effects would first render
         // into intermediate framebuffers and only the final result would go here
         GLES20.glBindFramebuffer(GLES20.GL_FRAMEBUFFER, 0);
         GLES20.glViewport(0, 0, widthSurf, heightSurf);
         GLES20.glUseProgram(programNv21ToRgba);
         // look up the attributes and pass the uniforms the shader needs
         int vPos = GLES20.glGetAttribLocation(programNv21ToRgba, "vPosition");
         int vTex = GLES20.glGetAttribLocation(programNv21ToRgba, "vTexCoord");
         GLES20.glEnableVertexAttribArray(vPos);
         GLES20.glEnableVertexAttribArray(vTex);
         int ufacing = GLES20.glGetUniformLocation(programNv21ToRgba, "u_facing");
         GLES20.glUniform1i(ufacing, facing1 ? 1 : 0);
         GLES20.glUniform1f(GLES20.glGetUniformLocation(programNv21ToRgba, "cameraWidth"), mCameraWidth);
         GLES20.glUniform1f(GLES20.glGetUniformLocation(programNv21ToRgba, "cameraHeight"), mCameraHeight);
         GLES20.glUniform1f(GLES20.glGetUniformLocation(programNv21ToRgba, "previewWidth"), widthSurf);
         GLES20.glUniform1f(GLES20.glGetUniformLocation(programNv21ToRgba, "previewHeight"), heightSurf);
         ShaderEffectHelper.shaderEffect2dWholeScreen(new Point(0, 0), new Point(widthSurf, heightSurf),
                 texNV21FromCamera[0], programNv21ToRgba, vPos, vTex, texNV21FromCamera[1]);
     }
 }
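The renderer relies on two helpers, ShaderUtils and FileUtils, which are not listed in the article. They presumably just compile and link GLSL and read a text asset; a minimal sketch of what they might look like:

 public class ShaderUtils {
     public static int createShader(int type, String source) {
         int shaderId = GLES20.glCreateShader(type);  // GL_VERTEX_SHADER or GL_FRAGMENT_SHADER
         GLES20.glShaderSource(shaderId, source);
         GLES20.glCompileShader(shaderId);
         int[] status = new int[1];
         GLES20.glGetShaderiv(shaderId, GLES20.GL_COMPILE_STATUS, status, 0);
         if (status[0] == 0) {
             Log.e("ShaderUtils", GLES20.glGetShaderInfoLog(shaderId)); // compilation error
         }
         return shaderId;
     }

     public static int createProgram(int vertexShaderId, int fragmentShaderId) {
         int programId = GLES20.glCreateProgram();
         GLES20.glAttachShader(programId, vertexShaderId);
         GLES20.glAttachShader(programId, fragmentShaderId);
         GLES20.glLinkProgram(programId);
         return programId;
     }
 }

 public class FileUtils {
     public static String getStringFromAsset(AssetManager assets, String path) {
         try (BufferedReader reader = new BufferedReader(new InputStreamReader(assets.open(path)))) {
             StringBuilder sb = new StringBuilder();
             String line;
             while ((line = reader.readLine()) != null) {
                 sb.append(line).append('\n');
             }
             return sb.toString();
         } catch (IOException e) {
             throw new RuntimeException("Cannot read asset " + path, e);
         }
     }
 }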

Below is our helper method that invokes a shader: put simply, it covers the whole screen with two triangles and lets the shader map the texture onto them with an affine transformation.

 public class ShaderEffectHelper {
     ...
     public static void shaderEffect2dWholeScreen(Point center, Point center2, int texIn, int programId,
                                                  int poss, int texx, Integer texIn2) {
         GLES20.glUseProgram(programId);
         int uColorLocation = GLES20.glGetUniformLocation(programId, "u_Color");
         GLES20.glUniform4f(uColorLocation, 0.0f, 0.0f, 1.0f, 1.0f);
         int uCenter = GLES20.glGetUniformLocation(programId, "uCenter");
         GLES20.glUniform2f(uCenter, (float) center.x, (float) center.y);
         int uCenter2 = GLES20.glGetUniformLocation(programId, "uCenter2");
         GLES20.glUniform2f(uCenter2, (float) center2.x, (float) center2.y);
         // four vertices covering the whole screen in normalized device coordinates
         FloatBuffer vertexData = convertArray(new float[]{
                 -1, -1,
                 -1, 1,
                 1, -1,
                 1, 1
         });
         // the matching texture coordinates for those vertices
         FloatBuffer texData = convertArray(new float[]{
                 0, 0,
                 0, 1,
                 1, 0,
                 1, 1
         });
         GLES20.glVertexAttribPointer(poss, 2, GLES20.GL_FLOAT, false, 0, vertexData);
         GLES20.glVertexAttribPointer(texx, 2, GLES20.GL_FLOAT, false, 0, texData);
         GLES20.glActiveTexture(GLES20.GL_TEXTURE0);
         GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, texIn);
         GLES20.glUniform1i(GLES20.glGetUniformLocation(programId, "sTexture"), 0);
         if (texIn2 != null) {
             GLES20.glActiveTexture(GLES20.GL_TEXTURE1);
             GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, texIn2);
             GLES20.glUniform1i(GLES20.glGetUniformLocation(programId, "sTexture2"), 1);
         }
         // draw: the 4 vertices form a triangle strip of two triangles covering the screen
         GLES20.glDrawArrays(GLES20.GL_TRIANGLE_STRIP, 0, 4);
         GLES20.glFlush();
     }
     ...
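The convertArray() helper is elided above; most likely it just wraps a float[] into a direct, native-order FloatBuffer that glVertexAttribPointer can read — roughly:

 public static FloatBuffer convertArray(float[] data) {
     FloatBuffer buffer = ByteBuffer
             .allocateDirect(data.length * 4)   // 4 bytes per float
             .order(ByteOrder.nativeOrder())
             .asFloatBuffer();
     buffer.put(data);
     buffer.position(0);
     return buffer;
 }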

Descriptions of the NV21 format are easy to find online. In short, an NV21 frame is a full-resolution plane of luminance bytes (essentially the black-and-white image), followed by the color-difference data: one V byte and one U byte for every 2x2 block of pixels. So 4 pixels take 4 luminance bytes plus 2 chroma bytes, i.e. 1.5 bytes per pixel.
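This is exactly the arithmetic behind the callback-buffer size in startCameraPreview(): ImageFormat.getBitsPerPixel(ImageFormat.NV21) returns 12, so a frame takes width * height * 12 / 8 bytes. A small illustration with an assumed 1280x720 preview:

 int width = 1280, height = 720;           // example preview size
 int lumaBytes = width * height;           // 921600: one Y byte per pixel
 int chromaBytes = width * height / 2;     // 460800: one V and one U byte per 2x2 block
 int frameBytes = lumaBytes + chromaBytes; // 1382400 = width * height * 12 / 8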

Here are our shaders that convert NV21 to RGBA. The vertex shader (vss_2d.glsl):

 attribute vec2 vPosition;
 attribute vec2 vTexCoord;
 varying vec2 texCoord;
 uniform mat4 uMVP;

 // for 2d triangles
 varying vec2 v_TexCoordinate;
 varying vec2 v_TexOrigCoordinate;

 // simple common 2d shader
 void main() {
     texCoord = vTexCoord;
     v_TexCoordinate = vTexCoord;
     v_TexOrigCoordinate = vec2(vPosition.x / 2.0 + 0.5, vPosition.y / 2.0 + 0.5);
     gl_Position = vec4(vPosition.x, vPosition.y, 0.0, 1.0);
 }

And the fragment shader (fss_n21_to_rgba.glsl); it centers and scales the picture and converts the color space:

 precision mediump float;

 uniform sampler2D sTexture;   // Y (luminance) texture
 uniform sampler2D sTexture2;  // UV texture
 varying vec2 texCoord;
 uniform int u_facing;
 uniform float cameraWidth;
 uniform float cameraHeight;   // remember, the camera image is rotated 90 degrees
 uniform float previewWidth;
 uniform float previewHeight;

 const mat3 yuv2rgb = mat3(
     1, 0, 1.2802,
     1, -0.214821, -0.380589,
     1, 2.127982, 0
 );

 // shader that converts NV21 to RGBA
 void main() {
     vec2 coord = vec2(texCoord.y, texCoord.x);
     if (u_facing == 0)
         coord.x = 1.0 - coord.x;
     // center the picture at maximum size
     coord.y = 1.0 - coord.y;
     if (previewWidth / previewHeight > cameraHeight / cameraWidth) {
         coord.x = 0.5 - (0.5 - coord.x) * previewHeight * (cameraHeight / previewWidth) / cameraWidth;
     } else if (previewWidth / previewHeight < cameraHeight / cameraWidth) {
         coord.y = 0.5 - (0.5 - coord.y) * previewWidth * (cameraWidth / previewHeight) / cameraHeight;
     }
     float y = texture2D(sTexture, coord).r;
     float u = texture2D(sTexture2, coord).a;
     float v = texture2D(sTexture2, coord).r;
     vec4 color;
     // first variant of the conversion, slightly lighter
     // TODO find the correct transformation
     color.r = (1.164 * (y - 0.0625)) + (1.596 * (v - 0.5));
     color.g = (1.164 * (y - 0.0625)) - (0.391 * (u - 0.5)) - (0.813 * (v - 0.5));
     color.b = (1.164 * (y - 0.0625)) + (2.018 * (u - 0.5));
     color.a = 1.0;
     // second variant via a matrix; this one overwrites the first
     vec3 yuv = vec3(1.1643 * y - 0.0627, u - 0.5, v - 0.5);
     vec3 rgb = yuv * yuv2rgb;
     color = vec4(rgb, 1.0);
     gl_FragColor = color;
 }
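For debugging, the same conversion can be done on the CPU for a single pixel and compared with what the shader produces. This helper is my own addition (not from the original project) and uses the same coefficients as the first variant in the shader:

 static int nv21PixelToArgb(byte[] nv21, int width, int height, int x, int y) {
     int yVal = nv21[y * width + x] & 0xFF;
     int uvOffset = width * height + (y / 2) * width + (x / 2) * 2; // chroma plane: V first, then U
     int v = nv21[uvOffset] & 0xFF;
     int u = nv21[uvOffset + 1] & 0xFF;
     float yf = 1.164f * (yVal / 255f - 0.0625f);
     float uf = u / 255f - 0.5f;
     float vf = v / 255f - 0.5f;
     int r = clamp((int) (255f * (yf + 1.596f * vf)));
     int g = clamp((int) (255f * (yf - 0.391f * uf - 0.813f * vf)));
     int b = clamp((int) (255f * (yf + 2.018f * uf)));
     return 0xFF000000 | (r << 16) | (g << 8) | b;
 }

 static int clamp(int c) {
     return Math.max(0, Math.min(255, c));
 }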

I feel there is already more than enough source code here, so I suggest we stop at this point.

Finally

Congratulations — you either honestly made it to the end or quickly scrolled down; either way, let's sum up. Putting it all together, we have learned how to start the camera, get an image from it, turn it into a texture for a shader and show it on the phone's screen. In the next article, if you like this one, I will show how to plug in an algorithm — for example, detecting a face in the frame with OpenCV — and apply the simplest effect.

upd.
This work is supported by the Foundation for Assistance to the Development of Small Enterprises in the Scientific and Technical Sphere under the “UMNIK” program, on the topic “Developing a computer program for overlaying augmented-reality objects ...”, as part of agreement No. 10858GU2016 dated December 29, 2016.

Source: https://habr.com/ru/post/347140/

