
An autonomous FTP client with resumable downloads

I want to share my experience developing an autonomous FTP client.



There is an FTP server on which data periodically appears in the form of graphic images and text files; their size varies from tens of kilobytes to a couple of gigabytes. Internet access may be over a wire, over a GSM dongle, or even via satellite, that is, stable in the first case and unstable in the others. In the unstable case, the probability of losing the connection due to weather conditions, geographical location, and so on rises dramatically.



So, the client requirements are as follows:

  1. Poll the FTP server for new files and download them.
  2. If a download stops suddenly (whether the connection drops or the machine running the client goes down), it should resume as soon as possible.
  3. Limit the download speed (this is due to the cost of traffic over GSM).


If my way of solving the problem sounds interesting, welcome under the cut!



For convenience, the article is split into the key stages of the client's work, with code examples and a more detailed description of the subtleties.



Formulation of the problem



Having thought it over, I decided to write a client that works as follows:



  1. Query the server and get the list of files.
  2. Check the download history; if a file is not in the history, add it to the download queue.
  3. If a file could not be downloaded for some reason, it is sent to the end of the download queue.
  4. A successfully downloaded file is added to the history.


And now the details of each stage:





Periodic polling of the server and getting a list of files



The decision to poll the server periodically comes to mind almost immediately: start a timer whose handler fetches the list of files. However, the server has a slightly peculiar directory structure. In short, there are two folders on the server, notify and files. The files folder contains the data to be downloaded; all files have unique names of the form FILE_ID_xxx, where each x is a digit. The notify folder contains XML files describing the files in the files folder, including their real names, the date they were placed on the server, and their size.
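For illustration, a notify entry might look something like this. This is a hypothetical example; the attribute names mirror the FileItem class shown below, but the actual schema on the server may differ:

```xml
<!-- Hypothetical notify file describing FILE_ID_042 from the files folder -->
<FileItem RemoteUri="ftp://server/files/FILE_ID_042"
          SavePath="C:\Downloads\report.txt"
          Date="2016-04-25"
          RefId="42"
          Name="report.txt"
          Extention="txt"
          Size="1048576" />
```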



After reading all the XML files from the notify folder, I build a collection of simple FileItem objects:



public class FileItem
{
    [XmlAttribute(AttributeName = "RemoteUri")]
    public string RemoteUri;

    [XmlAttribute(AttributeName = "SavePath")]
    public string SavePath;

    [XmlAttribute(AttributeName = "Date")]
    public string Date;

    [XmlAttribute(AttributeName = "RefId")]
    public string RefId;

    [XmlAttribute(AttributeName = "Name")]
    public string Name;

    [XmlAttribute(AttributeName = "Extention")]
    public string Extention;

    [XmlAttribute(AttributeName = "Size")]
    public long Size;
}
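As a quick sketch of how such a description can be read back, here is a small self-contained demo. This is my own illustration, not the article's code: the Parse helper, the abbreviated FileItem, and the sample XML string are made up, and I assume the notify XML follows the shape XmlSerializer produces for a List&lt;FileItem&gt;.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class FileItem
{
    // Abbreviated: only the attributes used in this demo
    [XmlAttribute(AttributeName = "RemoteUri")]
    public string RemoteUri;
    [XmlAttribute(AttributeName = "Name")]
    public string Name;
    [XmlAttribute(AttributeName = "Size")]
    public long Size;
}

public static class NotifyDemo
{
    // Deserialize a list of FileItem entries from an XML string
    public static List<FileItem> Parse(string xml)
    {
        var serializer = new XmlSerializer(typeof(List<FileItem>));
        using (var reader = new StringReader(xml))
        {
            return (List<FileItem>)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        var xml = "<ArrayOfFileItem>" +
                  "<FileItem RemoteUri=\"ftp://server/files/FILE_ID_042\" Name=\"report.txt\" Size=\"1048576\" />" +
                  "</ArrayOfFileItem>";
        var items = Parse(xml);
        Console.WriteLine(items[0].Name + " " + items[0].Size); // report.txt 1048576
    }
}
```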


Then, iterating over the collection, we check whether each file is already in the download history or is currently being downloaded:



foreach (var df in dataFiles)
{
    if (!FileHistory.FileExists(df) && !client.AlreadyInProgress(df))
    {
        client.DownloadFile(df);
    }
}


That's all for polling the server and finding new files. I will explain below what FileHistory and client are.



Downloading files in multiple threads



The client in the code above is an instance of the FTPClient class, which deals only with downloading files from the server. In fact, FTPClient is my wrapper around FtpWebRequest.



FTPClient has a thread-safe queue called a “download queue”:



 private ConcurrentQueue<FileItem> downloadQueue; 


So, what happens when you call the DownloadFile method:



public void DownloadFile(FileItem file)
{
    downloadQueue.Enqueue(file);
    StartDownloadTask();
}


Everything is quite simple: the file is added to the download queue, after which the method that creates a file-download task using the TPL is called. Here's what it looks like:



private void StartDownloadTask()
{
    // Start another task only while we are below the thread limit
    // (the original used <=, which allowed one task too many)
    if (currentActiveDownloads < Settings.MaximumDownloadThreads)
    {
        FileItem file;
        if (!downloadQueue.IsEmpty && downloadQueue.TryDequeue(out file))
        {
            Task t;
            if (File.Exists(file.SavePath))
            {
                // Part of the file is already on disk: resume from its current size
                FileInfo info = new FileInfo(file.SavePath);
                var currentSize = info.Length;
                t = new Task(() => DownloadTask(file, currentSize));
            }
            else
            {
                t = new Task(() => DownloadTask(file, 0));
            }
            t.ContinueWith(OnTaskComplete);
            t.Start();
            Interlocked.Increment(ref currentActiveDownloads);
            lock (inProgressLock)
            {
                inProgress.Add(file);
            }
        }
    }
}


In plain words: first we check how many tasks are already busy downloading files and whether there is room for another one. Then we try to dequeue a FileItem from the download queue if it is not empty. Next we determine whether the file is already present locally. A file may be present locally if a previous download was interrupted unexpectedly; everything that was downloaded remains on disk, so in that case we simply resume the download from where we stopped.



The OnTaskComplete method that will be called upon completion of DownloadTask :



private void OnTaskComplete(Task t)
{
    Interlocked.Decrement(ref currentActiveDownloads);
    StartDownloadTask();
}


That is, we decrement the active download count and try to start a new download task. So a new download task is created both when a new file is added to the download queue and when a current download task finishes.



Now the method directly involved in downloading the file from the server:



private void DownloadTask(FileItem file, long offset)
{
    // Pause before starting: a file that has just appeared on the server
    // may still be in the process of being written
    Thread.Sleep(10 * 1000);
    Log.Info(string.Format("Downloading {0}", file.Name));
    try
    {
        if (offset == file.Size)
        {
            // The local copy is already complete
            Log.Info(string.Format("File {0} is already downloaded.", file.Name));
            FileHistory.AddToDownloadHistory(file);
            return;
        }
        using (var readStream = GetResponseStreamFromServer(file.RemoteUri, WebRequestMethods.Ftp.DownloadFile, offset))
        using (var writeStream = new FileStream(file.SavePath, FileMode.Append, FileAccess.Write))
        {
            var bufferSize = 1024;
            var buffer = new byte[bufferSize];
            int second = 1000;
            int timePassed = 0;
            var stopWatch = new Stopwatch();
            var readCount = readStream.Read(buffer, 0, bufferSize);
            int downloadedBytes = readCount;
            while (readCount > 0)
            {
                // Measure how long each write/read cycle takes
                stopWatch.Start();
                writeStream.Write(buffer, 0, readCount);
                readCount = readStream.Read(buffer, 0, bufferSize);
                stopWatch.Stop();
                // Speed limiting (0 means no limit)
                if (Settings.MaximumDownloadSpeed > 0)
                {
                    var downloadLimit = (Settings.MaximumDownloadSpeed * 1024 / 8) / currentActiveDownloads;
                    downloadedBytes += readCount;
                    timePassed += (int)stopWatch.ElapsedMilliseconds;
                    if (downloadedBytes >= downloadLimit)
                    {
                        // Limit reached: sleep for the rest of the second
                        var pause = second - timePassed;
                        if (pause > 0)
                            Thread.Sleep(pause);
                        timePassed = 0;
                        downloadedBytes = 0;
                        stopWatch.Reset();
                    }
                    if (timePassed > second)
                    {
                        stopWatch.Reset();
                        timePassed = 0;
                        downloadedBytes = 0;
                    }
                }
            }
        }
        lock (inProgressLock)
        {
            inProgress.Remove(file);
        }
        FileHistory.AddToDownloadHistory(file);
        Log.Info(string.Format("Download complete: {0}", file.Name));
        Interlocked.Add(ref currentLoadedSize, -file.Size);
    }
    catch (WebException e)
    {
        // Connection problem: put the file back in the queue for a retry
        Log.Error(e);
        downloadQueue.Enqueue(file);
    }
    catch (Exception e)
    {
        Log.Error(e);
    }
}


And the method that builds a request to the server and returns the response stream:



private Stream GetResponseStreamFromServer(string uri, string method, long offset)
{
    var request = (FtpWebRequest)WebRequest.Create(uri);
    request.UseBinary = true;
    request.Credentials = new NetworkCredential(Settings.Login, Settings.Password);
    request.Method = method;
    request.Proxy = null;
    request.KeepAlive = false;
    request.ContentOffset = offset;
    var response = request.GetResponse();
    return response.GetResponseStream();
}


To start reading the stream from somewhere other than the beginning, the following line is used when building the request:



 request.ContentOffset = offset; 


The speed limit works as follows: first we calculate downloadLimit, the number of bytes the current thread may download per second, taking the total speed limit and the number of active download threads into account. Then we read the stream in 1024-byte chunks, note how long that took (timePassed), and record the total number of bytes read in downloadedBytes.



If the limit is exceeded, we pause the thread for the time remaining until the end of the second:



var pause = second - timePassed;
if (pause > 0)
    Thread.Sleep(pause);


After a second the counters are reset.
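As a sanity check of the arithmetic, here is the limit expression pulled out into a standalone helper (my own illustration; the method name is made up, but the formula mirrors the one in DownloadTask). With a total limit of 512 kbit/s and two active downloads, each thread gets 512 * 1024 / 8 / 2 = 32768 bytes per second.

```csharp
using System;

public static class SpeedLimitDemo
{
    // Mirrors the expression from DownloadTask: the total limit in kbit/s is
    // converted to bytes per second and split evenly among active downloads
    public static long BytesPerSecondPerThread(int maxKbitPerSec, int activeDownloads)
    {
        return (maxKbitPerSec * 1024L / 8) / activeDownloads;
    }

    public static void Main()
    {
        Console.WriteLine(BytesPerSecondPerThread(512, 2)); // 32768
        Console.WriteLine(BytesPerSecondPerThread(512, 1)); // 65536
    }
}
```

Note that because the divisor is the live currentActiveDownloads counter, each thread's share grows automatically as other downloads finish.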



In the case of a WebException, the file is added to the download queue again. A file gets into the history only after a successful download.



Download history



Storing the download history in a file is useful in case the application suddenly restarts and the in-memory history is lost.



Inside, the FileHistory class keeps a collection of the FileItem entries that have already been downloaded successfully:



 private static List<FileItem> downloadHistory; 


Adding a file is very simple: we add it to the collection and immediately write the changes to XML:



public static void AddToDownloadHistory(FileItem file)
{
    lock (historyLock)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(List<FileItem>));
        using (var writer = GetXml())
        {
            downloadHistory.Add(file);
            serializer.Serialize(writer, downloadHistory);
        }
    }
}


And this is what happens when we check whether a file is in the history:



public static bool FileExists(FileItem file)
{
    lock (historyLock)
    {
        if (downloadHistory.Count == 0)
        {
            if (!TryRestoreHistoryFromXml())
            {
                return false;
            }
        }
        return downloadHistory.Any(f => f.RefId == file.RefId);
    }
}


Let me explain: this is the check method, and the collection turns out to be empty. Most likely the application crashed and the in-memory history was lost. In that case we try to restore the history from the XML file. If that fails (the file is missing or corrupted), we assume the file has not been downloaded yet.
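The article does not show TryRestoreHistoryFromXml itself, but based on AddToDownloadHistory above it could look roughly like this. This is only a sketch: historyFilePath is a made-up field standing in for wherever GetXml writes the history.

```csharp
// Sketch only: belongs inside FileHistory; historyFilePath is hypothetical
private static bool TryRestoreHistoryFromXml()
{
    try
    {
        var serializer = new XmlSerializer(typeof(List<FileItem>));
        using (var reader = new StreamReader(historyFilePath))
        {
            downloadHistory = (List<FileItem>)serializer.Deserialize(reader);
        }
        return downloadHistory.Count > 0;
    }
    catch (Exception)
    {
        // File missing or corrupted: treat the history as empty
        return false;
    }
}
```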



Conclusion



I hope this article helps those who, like me, have to write their own FTP client for the first time. I do not claim the solution is perfect. This is also my first article on Habr, so I am open to criticism and comments.

Source: https://habr.com/ru/post/282600/


