Making our own remote desktop client for a smartphone. Part 1: Server

I have always wanted a portable remote desktop on my phone so that, for example, when someone pings me on ICQ while I am out on the balcony having a smoke, I can check on the phone who it is without leaving the balcony. Or, say, switch tracks while taking a bath. Yes, I know that all sorts of VNC clients have already been written, but I decided to write such a program myself.

In the first part of the article I will limit myself to a simple remote desktop application in which both the server and the client run on ordinary desktop computers. In the second and third parts I will cover image compression and programming the phone itself.

I picture the final functionality as follows: a resident program (the server) sits on the desktop and, upon receiving a UDP packet, starts sending image fragments to the return address. The phone (the client) displays the fragments it receives. The user can move the viewing window or click inside it. Information about shifts and clicks is sent to the server the same way, via UDP.

I apologize in advance for writing C# as if it were JavaScript. Firstly, I want to get by with short listings in the article; secondly, the program really is simple and there is no reason to pile up complex data structures here; and thirdly, C# is no longer a pure OOP language anyway.
Since the program is not complex, I chose the simplest quick-and-dirty development process I know: a sequence of small, simple complications.

Let's start with the simplest program that just shows us a fragment of our own desktop, in motion:

    using System;
    using System.Collections.Generic;
    using System.Windows.Forms;
    using System.Drawing;

    namespace rd2
    {
        class Program
        {
            static void Main(string[] args)
            {
                Form f = new Form();
                var timer = new System.Windows.Forms.Timer() { Interval = 40 };
                timer.Tick += (s, e) =>
                {
                    Graphics g = f.CreateGraphics();
                    g.CopyFromScreen(0, 0, 0, 0, f.Size);
                    g.Dispose();
                };
                timer.Start();
                Application.Run(f);
            }
        }
    }

Everything is simple: we create a window, create a timer whose ticks capture the image and copy it into the window, and start it. Let's make sure everything works.

[Screenshot: everything is working]

Now let's add dragging of the desktop inside the window. Immediately after creating the form, insert:

    Point window_topleft = new Point();
    Size mouse_prev_loc = new Size();
    bool mouse_lbdown = false;
    f.MouseDown += (s, e) => { mouse_lbdown = true; };
    f.MouseUp += (s, e) => { mouse_lbdown = false; };
    f.MouseMove += (s, e) =>
    {
        if (mouse_lbdown)
            window_topleft += mouse_prev_loc - (Size)(e.Location);
        mouse_prev_loc = (Size)e.Location;
    };

The variable window_topleft holds the coordinates of the upper left corner of the area displayed in the window. Adjust the CopyFromScreen call:

 g.CopyFromScreen(window_topleft.X, window_topleft.Y, 0, 0, f.Size);

Great! It drags.

Now let's add handling of left mouse button clicks, so that a click inside the window is translated into a click on whatever is displayed in the window. To distinguish a click from a drag, I will remember the coordinates where the mouse button was pressed, and if the mouse has not moved away from that point by the time the button is released (the listing tolerates less than one pixel of movement), I will generate a mouse click. Like this:

    Point mouse_down_loc = new Point();
    f.MouseDown += (s, e) =>
    {
        mouse_lbdown = true;
        mouse_down_loc = e.Location;
    };
    f.MouseUp += (s, e) =>
    {
        mouse_lbdown = false;
        if (Math.Abs(e.Location.X - mouse_down_loc.X) < 1 &&
            Math.Abs(e.Location.Y - mouse_down_loc.Y) < 1)
        {
            int click_to_x = (window_topleft.X + mouse_down_loc.X) * 65536 / Screen.PrimaryScreen.Bounds.Width;
            int click_to_y = (window_topleft.Y + mouse_down_loc.Y) * 65536 / Screen.PrimaryScreen.Bounds.Height;
            mouse_event((uint)(MOUSEEVENTF_LEFTDOWN | MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_MOVE),
                        (uint)click_to_x, (uint)click_to_y, 0, 0);
            mouse_event((uint)(MOUSEEVENTF_LEFTUP | MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_MOVE),
                        (uint)click_to_x, (uint)click_to_y, 0, 0);
        }
    };

A mouse click is produced by two consecutive calls to the WinAPI function mouse_event. The first call presses the button (MOUSEEVENTF_LEFTDOWN), the second releases it (MOUSEEVENTF_LEFTUP). Together with each button event we pass a mouse movement (MOUSEEVENTF_MOVE) to the desired coordinates, which are given as absolute values (MOUSEEVENTF_ABSOLUTE). The absolute mouse coordinates (0, 0) are in the upper left corner of the primary screen (PrimaryScreen), and the point (65535, 65535) is in the lower right corner of the same screen. All other screens, if the system has any, adjoin this square.
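The click test and the coordinate conversion described above can be pulled out into two small pure functions, which makes them easy to check without touching the real mouse. This is only a sketch: the class name ClickMapping, the method names, and the tolerance parameter are hypothetical and do not appear in the listings; the conversion reuses the same pixel * 65536 / extent formula as the article.

```csharp
using System;
using System.Drawing;

static class ClickMapping
{
    // Returns true when the button was released within `tolerance` pixels
    // (per axis) of the point where it was pressed.
    // A tolerance of 0 matches the "< 1" test used in the article's listing.
    public static bool IsClick(Point down, Point up, int tolerance)
    {
        return Math.Abs(up.X - down.X) <= tolerance
            && Math.Abs(up.Y - down.Y) <= tolerance;
    }

    // Converts a pixel coordinate on the primary screen into the absolute
    // range expected by mouse_event with MOUSEEVENTF_ABSOLUTE.
    // screenExtent is the screen width (for X) or height (for Y) in pixels.
    public static int ToAbsolute(int pixel, int screenExtent)
    {
        return pixel * 65536 / screenExtent;
    }
}
```

For example, on a screen 1920 pixels wide, ClickMapping.ToAbsolute(960, 1920) gives the midpoint of the absolute range. A small non-zero tolerance may make clicks more forgiving on a touch screen, where the finger rarely stays perfectly still.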

And of course we need to import mouse_event ourselves. This declaration goes inside the class declaration and requires using System.Runtime.InteropServices:

    [DllImport("user32.dll", CharSet = CharSet.Auto, CallingConvention = CallingConvention.StdCall)]
    public static extern void mouse_event(uint dwFlags, uint dx, uint dy, uint cButtons, uint dwExtraInfo);

    private const int MOUSEEVENTF_MOVE = 0x01;
    private const int MOUSEEVENTF_LEFTDOWN = 0x02;
    private const int MOUSEEVENTF_LEFTUP = 0x04;
    private const int MOUSEEVENTF_ABSOLUTE = 0x8000;

What you should think about before moving on

UDP loses, delays, and reorders packets. For our task, resending a packet is pointless: by the time we determine that a packet has been lost, it is already time to refresh the screen again. That is why I chose UDP rather than TCP. Losses cannot be fought, but they must be adapted to: the protocol should have no long-lived state, and a packet loss should neither be fatal nor spoil the picture for long.

Transferring a screen fragment the size of a smartphone screen, even at 10 hertz, takes 3 * 800 * 480 * 10 = 11520000 bytes per second. That is about 92 megabits per second, almost 100. We cannot do without compression.

There is no need to resend parts of the screen that have not changed, and there can be quite a lot of them. But we cannot stop sending unchanged parts altogether: the channel is unreliable, and we do not actually know what is displayed on the client.

The window size may change, for example when the phone is rotated from portrait to landscape.

However, it is impossible to take all of these observations into account at once; overthinking only leads to paralysis. So, to begin with, let's simply ignore everything we can for the sake of simplicity.
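The bandwidth estimate given above is easy to reproduce and to vary for other screen sizes or frame rates. The sketch below uses a hypothetical BandwidthEstimate helper (not part of the article's program) and assumes, as the estimate does, 3 bytes per pixel with no compression and no protocol overhead.

```csharp
using System;

static class BandwidthEstimate
{
    // Raw, uncompressed payload per second for a full-frame stream.
    public static long BytesPerSecond(int width, int height, int bytesPerPixel, int framesPerSecond)
    {
        return (long)width * height * bytesPerPixel * framesPerSecond;
    }

    // Converts bytes per second into megabits per second (1 megabit = 1,000,000 bits).
    public static double MegabitsPerSecond(long bytesPerSecond)
    {
        return bytesPerSecond * 8 / 1000000.0;
    }
}
```

With the figures from the text, BandwidthEstimate.BytesPerSecond(800, 480, 3, 10) yields 11520000, and MegabitsPerSecond of that value is 92.16.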

Splitting in two

Now let's start dividing the existing program into two parts, the server and the client. Let them stay within the same process for now, but be written as if they ran in different threads: they must not depend on each other through shared data.

At this stage we could make the program send datagrams to itself, but that would be too big a step. To begin with, I chose ConcurrentQueue as the channel between the two parts; it is a thread-safe queue designed for producer-consumer interaction. Over the forward channel the server will deliver image fragments to the client, and over the return channel the client will deliver information about viewing window shifts and mouse clicks.
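For readers who have not used this kind of queue before, here is a minimal producer-consumer sketch built on BlockingCollection over a ConcurrentQueue. The class QueueChannelDemo and its RunOnce method are hypothetical names invented for this illustration; the producer runs on a thread-pool task while the consumer runs on the calling thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class QueueChannelDemo
{
    // Sends messageCount integers through a queue-backed channel
    // and returns how many of them the consumer received.
    public static int RunOnce(int messageCount)
    {
        using (var channel = new BlockingCollection<int>(new ConcurrentQueue<int>()))
        {
            // Producer: adds messages, then signals that no more will follow.
            Task producer = Task.Run(() =>
            {
                for (int i = 0; i < messageCount; i++)
                    channel.Add(i);
                channel.CompleteAdding();
            });

            // Consumer: blocks until a message is available and stops
            // once the producer has completed and the queue is drained.
            int received = 0;
            foreach (int message in channel.GetConsumingEnumerable())
                received++;

            producer.Wait();
            return received;
        }
    }
}
```

Unlike UDP, such a queue never drops or reorders messages, so code debugged against it still has to be checked for tolerance to losses once real datagrams are used.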

In many respects ConcurrentQueue resembles UDP, and once the interaction through ConcurrentQueue is debugged, I hope to simply replace the queue operations with sending and receiving datagrams. For that replacement to be simple, the program must be brought to a state where short byte sequences are passed through the queues.

But first, I will use typed queues.

We divide

First, we define the data structures that will be used to send messages from the server to the client and back:

    struct ImageChunk
    {
        public Rectangle place;
        public Bitmap img;
    };

    struct ControlData
    {
        public enum Action : byte { Shift, Click };
        public Action action;
        public Point point;
    }

So, the server sends the client an image together with its location on the server's monitor. In the other direction go shifts (Shift) and mouse clicks (Click). Let's agree to use server coordinates everywhere.
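Since the queues are eventually supposed to carry short byte sequences, a control message like the one above could be packed into a fixed 9-byte buffer: one byte for the action and two big-endian 32-bit integers for the point. This layout, the ControlDataCodec class, and its method names are assumptions made for illustration; the article does not define a wire format at this stage.

```csharp
using System;

static class ControlDataCodec
{
    // 1 byte action + 4 bytes X + 4 bytes Y.
    public const int EncodedSize = 9;

    public static byte[] Encode(byte action, int x, int y)
    {
        var buffer = new byte[EncodedSize];
        buffer[0] = action;
        WriteInt32(buffer, 1, x);
        WriteInt32(buffer, 5, y);
        return buffer;
    }

    // Returns false for buffers of unexpected size instead of throwing,
    // because a damaged or foreign datagram should simply be ignored.
    public static bool TryDecode(byte[] buffer, out byte action, out int x, out int y)
    {
        action = 0; x = 0; y = 0;
        if (buffer == null || buffer.Length != EncodedSize)
            return false;
        action = buffer[0];
        x = ReadInt32(buffer, 1);
        y = ReadInt32(buffer, 5);
        return true;
    }

    // Big-endian (network order) write of a 32-bit integer.
    static void WriteInt32(byte[] b, int offset, int value)
    {
        b[offset] = (byte)(value >> 24);
        b[offset + 1] = (byte)(value >> 16);
        b[offset + 2] = (byte)(value >> 8);
        b[offset + 3] = (byte)value;
    }

    // Big-endian read of a 32-bit integer.
    static int ReadInt32(byte[] b, int offset)
    {
        return (b[offset] << 24) | (b[offset + 1] << 16) | (b[offset + 2] << 8) | b[offset + 3];
    }
}
```

Each message is self-contained and carries absolute server coordinates, so losing one of them does not corrupt the interpretation of the next, which fits the requirement that the protocol have no long-lived state.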

Now let's make two copies of the existing Main function, rename them to Server and Client, and write a new Main (BlockingCollection and ConcurrentQueue come from the System.Collections.Concurrent namespace):

    static void Main(string[] args)
    {
        var img_channel = new BlockingCollection<ImageChunk>(new ConcurrentQueue<ImageChunk>());
        var control_channel = new BlockingCollection<ControlData>(new ConcurrentQueue<ControlData>());
        Server(control_channel, img_channel);
        Client(img_channel, control_channel);
    }

It reads like a textual description of a simple block diagram, doesn't it?

Remove from Server everything unrelated to capturing the screen image, and add the queue handling. I also decided to fix the window size at 400x300 on both the client and the server, so that the listing does not grow by another couple of paragraphs.

    static void Server(BlockingCollection<ControlData> input, BlockingCollection<ImageChunk> output)
    {
        Point window_topleft = new Point();
        Size window_size = new Size(400, 300);
        var timer = new System.Windows.Forms.Timer() { Interval = 40 };
        timer.Tick += (s, e) =>
        {
            // process incoming control messages
            ControlData incoming;
            while (input.TryTake(out incoming))
            {
                switch (incoming.action)
                {
                    case ControlData.Action.Click:
                        mouse_event((uint)(MOUSEEVENTF_LEFTDOWN | MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_MOVE),
                                    (uint)incoming.point.X, (uint)incoming.point.Y, 0, 0);
                        mouse_event((uint)(MOUSEEVENTF_LEFTUP | MOUSEEVENTF_ABSOLUTE | MOUSEEVENTF_MOVE),
                                    (uint)incoming.point.X, (uint)incoming.point.Y, 0, 0);
                        break;
                    case ControlData.Action.Shift:
                        window_topleft = incoming.point;
                        break;
                }
            }
            // capture the screen fragment and send it to the client
            var b = new Bitmap(window_size.Width, window_size.Height);
            var g = Graphics.FromImage(b);
            g.CopyFromScreen(window_topleft.X, window_topleft.Y, 0, 0, window_size);
            g.Dispose();
            output.Add(new ImageChunk() { img = b, place = new Rectangle(window_topleft, window_size) });
        };
        timer.Start();
    }

From Client, we remove everything that Server now does for it:

    static void Client(BlockingCollection<ImageChunk> input, BlockingCollection<ControlData> output)
    {
        Form f = new Form() { ClientSize = new Size(400, 300) };
        Point window_topleft = new Point();
        Size mouse_prev_loc = new Size();
        bool mouse_lbdown = false;
        Point mouse_down_loc = new Point();

        // mouse handling
        f.MouseDown += (s, e) =>
        {
            mouse_lbdown = true;
            mouse_down_loc = e.Location;
        };
        f.MouseUp += (s, e) =>
        {
            mouse_lbdown = false;
            if (Math.Abs(e.Location.X - mouse_down_loc.X) < 1 &&
                Math.Abs(e.Location.Y - mouse_down_loc.Y) < 1)
            {
                int click_to_x = (window_topleft.X + mouse_down_loc.X) * 65536 / Screen.PrimaryScreen.Bounds.Width;
                int click_to_y = (window_topleft.Y + mouse_down_loc.Y) * 65536 / Screen.PrimaryScreen.Bounds.Height;
                output.Add(new ControlData() { action = ControlData.Action.Click, point = new Point(click_to_x, click_to_y) });
            }
        };
        f.MouseMove += (s, e) =>
        {
            if (mouse_lbdown)
            {
                window_topleft += mouse_prev_loc - (Size)(e.Location);
                output.Add(new ControlData() { action = ControlData.Action.Shift, point = window_topleft });
            }
            mouse_prev_loc = (Size)e.Location;
        };

        // drawing of received fragments
        var timer = new System.Windows.Forms.Timer() { Interval = 40 };
        timer.Tick += (s, e) =>
        {
            ImageChunk incoming;
            // wait up to 5 ms for a fragment; skip this tick if none arrives
            if (!input.TryTake(out incoming, 5))
                return;
            Graphics g = f.CreateGraphics();
            g.DrawImageUnscaled(incoming.img, incoming.place.X - window_topleft.X, incoming.place.Y - window_topleft.Y);
            g.Dispose();
            incoming.img.Dispose();
        };
        timer.Start();
        Application.Run(f);
    }
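The offset passed to DrawImageUnscaled in the client deserves a closer look: the fragment arrives tagged with server coordinates, while the client draws in window coordinates, so the current viewing window origin is subtracted. A small sketch with a hypothetical ViewportMath helper (not part of the listing) isolates that arithmetic.

```csharp
using System.Drawing;

static class ViewportMath
{
    // Converts a fragment's position in server (screen) coordinates into the
    // position where it should be drawn inside the client window, given the
    // server coordinates currently shown at the window's top left corner.
    public static Point ChunkToClient(Rectangle chunkPlace, Point windowTopLeft)
    {
        return new Point(chunkPlace.X - windowTopLeft.X, chunkPlace.Y - windowTopLeft.Y);
    }
}
```

If the user has dragged the view since the fragment was captured, the result is non-zero and the fragment is drawn shifted, which is why a stale image slides along with the drag until a fresh fragment arrives.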

The appearance of the application has not changed, so there will be no screenshot.

NB: by the way, you can fork here ..

... and build yourself a remote desktop over pipes, HTTP, RS232, and so on. All it takes is writing serialization, compression, and transport for the objects that travel through the queues.

In the next part of the article I will describe compression tailored to subsequent transfer over UDP. The peculiarities of UDP are the small size of atomically transmitted data units (packets) and the loss and reordering of packets.

Source: https://habr.com/ru/post/128467/
