In this article I will talk about the internal structure of the popular cloud storage service Dropbox. In particular, I will cover how the Dropbox protocol works and show statistics on its use in several European countries. I will also compare it with other services, such as iCloud, Google Drive and SkyDrive.
The article is purely technical. There will be no summary tables of cost per GB and no analysis of how much extra space you can earn by inviting "friends."
The text is based on the scientific article “Inside Dropbox: Understanding Personal Cloud Storage Services”.
In the past few years cloud storage services have surged in popularity. All the major players and several young startups are taking part in the arms race. For the most part, information about the internal structure of these services and their real usage figures is kept under lock and key. We are fed only data that has passed through the marketing department, which, of course, differs somewhat from reality. So let's dig deeper together with Idilio Drago, Anna Sperotto, Marco Mellia, Ramin Sadre, Maurizio M. Munafò and Aiko Pras, the authors of the study.
Introduction
The Dropbox client is written primarily in Python and uses third-party libraries such as librsync. It supports all major operating systems: Windows, Mac, Linux. The choice of Python suggests that the client was designed to be easy to port across platforms.
The basic unit of the system is a block (chunk) of up to 4 MB. A larger file is split into several blocks, and the system treats each block independently of the others. A SHA256 hash is computed for each block and stored as part of the file's metadata. Dropbox reduces the amount of data transferred by sending only the differences for modified blocks. In addition, the client keeps all file metadata locally, synchronizes it with the server, and sends only the changes since the previous version (incremental updates).
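The block scheme described above can be illustrated with a minimal sketch. It is not the Dropbox client's code: the function names are hypothetical, and the comparison is a simple position-by-position check of hashes rather than the librsync-based delta encoding the client uses.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # upper bound per block, taken from the text (4 MB)


def split_into_blocks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split a byte string into consecutive blocks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def block_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Return the SHA-256 hex digest of every block, in file order."""
    return [hashlib.sha256(block).hexdigest()
            for block in split_into_blocks(data, chunk_size)]


def changed_blocks(old: list[str], new: list[str]) -> list[int]:
    """Indexes of blocks in `new` that are absent from `old` or differ at the same position."""
    return [i for i, digest in enumerate(new)
            if i >= len(old) or old[i] != digest]
```

Comparing the hash lists of two versions of a file shows which blocks would have to be considered for upload; unchanged blocks are identified by their hash alone.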
Dropbox uses two types of servers: control servers and data storage servers. The control servers are operated by Dropbox itself; the data servers are hosted at Amazon (Amazon S3, EC2). HTTPS is used for communication with the servers, except for the notification service, which, as described below, is not encrypted.
The domain names used by Dropbox always end with dropbox.com. The table below lists the subdomains for management and data servers.
Subdomain | Hosting | Description |
---|---|---|
client-lb / clientX | Dropbox | Metadata |
notifyX | Dropbox | Notifications |
api | Dropbox | API control |
www | Dropbox | Web servers |
d | Dropbox | Event logs |
dl | Amazon | Direct links |
dl-clientX | Amazon | Client storage |
dl-debugX | Amazon | Back traces |
dl-web | Amazon | Web storage |
api-content | Amazon | API storage |
Dropbox: inside
Since Dropbox uses HTTPS to encrypt the traffic between the client and the servers, simply intercepting it will not yield any useful information. For the research, Squid was installed and all traffic from a Linux computer was routed through this proxy. SSL-bump was also enabled on the proxy so that SSL could be decrypted. The final step was to install a self-signed certificate on Squid and to replace the certificate inside the running Dropbox application. This configuration makes it possible to decrypt and inspect Dropbox traffic.

The illustration shows the protocol Dropbox uses to upload locally modified blocks to its servers. After the client registers with the clientX.dropbox.com control servers, the list command retrieves the metadata changes, that is, the difference between the local copy and what is on the server. As soon as a local file changes, Dropbox issues the commit_batch command (to client-lb.dropbox.com) and sends the modified metadata to the server. The server then replies with need_blocks, listing the blocks it is missing, and the client sends those blocks to Amazon (dl-clientX.dropbox.com). The storing of each block is confirmed with an OK response.
After that, the client sends commit_batch to the server once more and receives confirmation that all blocks have been received. Data storage transactions can be executed in parallel.
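The exchange described above can be sketched as a toy in-memory simulation. The command names commit_batch and need_blocks and the OK reply come from the text; the classes, the `store` method name, and the `upload` helper are hypothetical and only model the order of the messages, not the real wire format.

```python
import hashlib


class ToyStorageServer:
    """Stand-in for the Amazon-hosted storage (dl-clientX): keeps blocks by hash."""

    def __init__(self):
        self.blocks = {}

    def store(self, block_hash: str, data: bytes) -> str:
        self.blocks[block_hash] = data
        return "OK"  # each stored block is acknowledged individually


class ToyMetadataServer:
    """Stand-in for the control servers (client-lb / clientX)."""

    def __init__(self, storage: ToyStorageServer):
        self.storage = storage
        self.files = {}

    def commit_batch(self, path: str, hashes: list[str]) -> list[str]:
        # Reply with the hashes that are not stored yet (the need_blocks answer).
        need_blocks = list(dict.fromkeys(
            h for h in hashes if h not in self.storage.blocks))
        if not need_blocks:
            self.files[path] = list(hashes)  # commit only once every block exists
        return need_blocks


def upload(path, blocks, meta, storage):
    """Run one commit_batch / need_blocks / store / commit_batch cycle."""
    hashes = [hashlib.sha256(b).hexdigest() for b in blocks]
    by_hash = dict(zip(hashes, blocks))
    sent = 0
    for block_hash in meta.commit_batch(path, hashes):
        if storage.store(block_hash, by_hash[block_hash]) != "OK":
            raise RuntimeError("block was not acknowledged")
        sent += 1
    still_missing = meta.commit_batch(path, hashes)  # second commit confirms
    return sent, still_missing
```

Uploading a second file that shares a block with the first sends only the block the server does not have yet, which is the point of answering commit_batch with need_blocks.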
Control protocol
Dropbox uses the following management server groups:
- Notifications
Dropbox keeps a TCP connection to the notification servers (notifyX.dropbox.com) open at all times. It is needed to learn about file changes made on other clients. Unlike the rest of the traffic, this information is not encrypted. To notify clients quickly, a delayed HTTP response is used (a push mechanism, also known as long polling): the client sends a request, and the server holds the response for about 60 seconds. If a change occurs earlier, the server responds immediately; otherwise it answers when the 60 seconds expire, and the client immediately sends the next request.
- Metadata management (meta-data administration)
The metadata servers are responsible not only for reporting changes in blocks and files, but also for authenticating the client. The following domain names are used for these servers: client-lb.dropbox.com, clientX.dropbox.com. In addition, these servers can control client behavior. During the experiment it was observed that the server can tell the client the maximum number of blocks it may send. This is used to limit the traffic the client generates.
- System messages (system logs)
Back traces are sent to servers hosted by Amazon (dl-debugX.dropbox.com); the remaining messages go directly to Dropbox at d.dropbox.com.
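The delayed-response mechanism of the notification servers can be sketched with a small in-process model. This is an assumption-laden illustration: the class and method names are hypothetical, there is no real HTTP involved, and the 60-second hold time is simply the value quoted in the text.

```python
import threading


class ToyNotifyServer:
    """Models a server that holds each poll until a change occurs or a timeout passes."""

    def __init__(self, hold_seconds: float = 60.0):
        self.hold_seconds = hold_seconds
        self._changed = threading.Event()

    def publish_change(self) -> None:
        """Called when another client modifies a file."""
        self._changed.set()

    def poll(self) -> str:
        """Block until a change is published or the hold time elapses."""
        if self._changed.wait(timeout=self.hold_seconds):
            self._changed.clear()
            return "changed"
        return "no_change"
```

A client would call `poll()` in a loop and start the next request as soon as the previous one returns, so a change reaches it without waiting for a fixed polling interval.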
Data set and customer popularity
Dropbox was monitored passively. Traffic was collected with the open source tool Tstat, which records a wide range of information about TCP connections, more than a hundred parameters per connection. A few extra steps were needed to analyze Dropbox:
- Since Dropbox uses HTTPS, we established that the name in all certificates used by Dropbox is *.dropbox.com. This was important for classifying the traffic correctly.
- We supplemented the flow data with records of the DNS queries made by the clients. This made it possible to link IP addresses to server names.
- Tstat exported the unencrypted information about the device and the directories that is exchanged between the client and the notification server.
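Once a server name is known, mapping it to a role is a lookup against the subdomain table given earlier. The sketch below shows one way to do it; the function name `classify` and the rule of stripping a trailing number (the "X" in clientX or notifyX) are assumptions for illustration, not the procedure used in the study.

```python
import re

# Subdomain roles copied from the table of Dropbox subdomains.
SUBDOMAINS = {
    "client-lb": ("Dropbox", "Metadata"),
    "client": ("Dropbox", "Metadata"),
    "notify": ("Dropbox", "Notifications"),
    "api": ("Dropbox", "API control"),
    "www": ("Dropbox", "Web servers"),
    "d": ("Dropbox", "Event logs"),
    "dl": ("Amazon", "Direct links"),
    "dl-client": ("Amazon", "Client storage"),
    "dl-debug": ("Amazon", "Back traces"),
    "dl-web": ("Amazon", "Web storage"),
    "api-content": ("Amazon", "API storage"),
}

SUFFIX = ".dropbox.com"


def classify(hostname: str):
    """Return (hosting, description) for a Dropbox server name, or None if unknown."""
    hostname = hostname.lower().rstrip(".")
    if not hostname.endswith(SUFFIX):
        return None
    label = hostname[:-len(SUFFIX)]
    # Drop the numeric suffix that stands for "X" in names such as clientX.
    base = re.sub(r"\d+$", "", label)
    return SUBDOMAINS.get(base)
```

For example, `classify("dl-client7.dropbox.com")` yields the Amazon client storage role, while a name outside dropbox.com yields `None`.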
Data was collected with Tstat installed at 4 vantage points in Europe. The records labeled Home 1 and Home 2 come from customers of a well-known Internet service provider (ISP) that offers access over ADSL and fiber (FTTH). The data labeled Campus 1 and Campus 2 was collected at universities. The measurements ran from March 24, 2012 to May 5, 2012.
Name | Access type | Number of IP addresses | Data volume (GB) |
---|---|---|---|
Campus 1 | Wired | 400 | 5,320 |
Campus 2 | Wired / wireless | 2,528 | 55,054 |
Home 1 | FTTH / ADSL | 18,785 | 509,909 |
Home 2 | ADSL | 13,723 | 301,448 |
Below is a graph that shows how many distinct IP addresses contacted each cloud storage service at least once a day.

The second graph shows how much data was exchanged with each cloud storage service per day.

I would like to draw attention to the following:
- Despite the large number of devices using iCloud, the amount of data transmitted to this service is comparable with other services.
- When Google Drive was launched, the traffic to this service jumped sharply and approached that of iCloud; at the same time, the number of installations of the program remained minimal.
For comparison, here is the data on the use of YouTube and Dropbox in Campus 2.

The table shows the total Dropbox traffic that we monitored during our measurements.
| Campus 1 | Campus 2 | Home 1 | Home 2 | Total |
---|---|---|---|---|---|
Requests | 167,189 | 1,902,824 | 1,438,369 | 693,086 | 4,204,666 |
Volume (GB) | 146 | 1,814 | 1,153 | 506 | 3,624 |
Devices | 283 | 6,609 | 3,350 | 1,313 | 11,561 |
Traffic analysis
The graphs show the cumulative distribution function (CDF) of the number of blocks transferred.

It turned out that in more than 80% of cases the number of blocks does not exceed 10 when storing data. The analysis of the data shows that the main use case of Dropbox is continuous work with small, frequently changing files.
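For readers who want to reproduce this kind of plot from their own measurements, an empirical CDF can be computed as in the sketch below. The function name `ecdf` is hypothetical, and no data from the study is included here.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return a function giving the fraction of samples that are <= x."""
    ordered = sorted(samples)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one sample is required")

    def cdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect_right(ordered, x) / n

    return cdf
```

Given a list of per-transfer block counts, `ecdf(counts)(10)` is the share of transfers with at most 10 blocks, which is the quantity the 80% figure above refers to.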
As we discussed above, Dropbox uses central data storage servers. This immediately leads to a question about the speed of the service for users who are geographically far from the servers.
The maximum speed observed was close to 10 Mbit/s, and it was reached only on files larger than 1 MB. The average speed for Campus 2 was 462 kbit/s for writes and 797 kbit/s for reads. For Campus 1 it was 359 kbit/s for writes and 783 kbit/s for reads.


The graphs also show that the speed depends significantly on the number of blocks: the more blocks, the lower the speed.
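To put the average speeds quoted above into perspective, the time needed to move one full-size block can be estimated with simple arithmetic. The helper below is an illustration only; it assumes 1 kbit = 1000 bits and a block of 4,000,000 bytes, and it ignores protocol overhead.

```python
def transfer_seconds(size_bytes: int, rate_kbit_s: float) -> float:
    """Time to transfer size_bytes at a constant rate given in kbit/s (1 kbit = 1000 bits)."""
    return size_bytes * 8 / (rate_kbit_s * 1000)


BLOCK_BYTES = 4_000_000  # assumed size of a full block

upload_s = transfer_seconds(BLOCK_BYTES, 462)    # Campus 2 average write speed
download_s = transfer_seconds(BLOCK_BYTES, 797)  # Campus 2 average read speed
```

Under these assumptions a full block takes roughly 69 seconds to upload and roughly 40 seconds to download at the Campus 2 averages.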
Changes in Dropbox 1.4.0
Starting with version 1.4.0, Dropbox added two new commands, store_batch and retrieve_batch, which allow the client to work with several blocks at the same time. This change should significantly improve service throughput.
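Why batching should help can be illustrated with a simple cost model. The model is an assumption, not a measurement: it supposes that every acknowledged transfer costs one extra round trip, so sending blocks one by one pays that cost per block, while a batch pays it once per batch. All function and parameter names are hypothetical.

```python
def sequential_seconds(n_blocks: int, block_bytes: int,
                       rate_bit_s: float, rtt_s: float) -> float:
    """Each block is sent and acknowledged before the next one starts."""
    return n_blocks * (block_bytes * 8 / rate_bit_s + rtt_s)


def batched_seconds(n_blocks: int, block_bytes: int,
                    rate_bit_s: float, rtt_s: float, batch_size: int) -> float:
    """Blocks are grouped; only one acknowledgment round trip is paid per batch."""
    batches = -(-n_blocks // batch_size)  # ceiling division
    return n_blocks * block_bytes * 8 / rate_bit_s + batches * rtt_s
```

In this model the saving grows with the number of blocks and with the round-trip time, which matches the observation that transfers with many small blocks are the slow ones.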
Number of devices
The graph shows the number of devices with Dropbox installed per home user. About 60% of home users have only one device with Dropbox, and 25% have two.

Average usage time
The graph shows how long Dropbox is typically in use. To estimate usage time, we measured how long the client stayed in contact with the notification server. Since the client always keeps this connection open, or reopens it right away, this is a good way to estimate usage time.

The graph shows that in most cases Dropbox is in use for less than 4 hours. The exception is Campus 1, where there are many workstations and computers that run around the clock.
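The estimate based on notification connections can be sketched as interval merging: connections from the same device that follow each other closely are treated as one session. The function name and the `max_gap` threshold are assumptions for illustration; the text does not state which threshold was used in the study.

```python
def session_durations(connections, max_gap: float = 60.0):
    """Merge (start, end) connection times into sessions and return their lengths.

    Connections separated by at most max_gap seconds count as one session.
    """
    sessions = []
    for start, end in sorted(connections):
        if sessions and start - sessions[-1][1] <= max_gap:
            # Close enough to the previous connection: extend that session.
            sessions[-1][1] = max(sessions[-1][1], end)
        else:
            sessions.append([start, end])
    return [end - start for start, end in sessions]
```

With timestamps in seconds, two connections that are 30 seconds apart are merged into one session, while a connection that starts many minutes later begins a new one.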
Initial data
You can download the source data that was used in this article for further analysis (Baseline).
I want to note that the original article contains more information. It may contain answers to questions that you may have after reading.