TL;DR. This article describes reverse engineering the Dropbox client: breaking its obfuscation, decompiling its Python code, and modifying the program to enable debugging features that are hidden in normal mode. If you are only interested in the relevant code and instructions, scroll to the end. At the time of writing, the code is compatible with the latest versions of Dropbox, which are based on the CPython 3.6 interpreter.
Introduction
Dropbox fascinated me right from the moment it appeared. The concept is still deceptively simple. Here is the folder. Put files there. It is synchronized. Go to another device. It is synchronized again. The folder and files are now there!
The amount of hidden background work is really amazing. First, all the problems that come with building and maintaining a cross-platform application for the major desktop operating systems (OS X, Linux, Windows) do not go away. Add to this support for various web browsers and mobile operating systems. And that is only the client side. I am also interested in the Dropbox backend, which lets the company achieve such scalability and low latency under the insanely heavy workload that half a billion users create.
It is for these reasons that I have always liked to see what Dropbox is doing under the hood and how it has evolved over the years. About eight years ago, I first tried to figure out how the Dropbox client actually works when I noticed unknown broadcast traffic while in a hotel. The investigation revealed that it was part of a Dropbox feature called LanSync, which speeds up synchronization when Dropbox nodes on the same local network have access to the same files. However, the protocol was not documented, and I wanted to know more. So I decided to take a closer look, and as a result I ended up reverse engineering almost the entire program. This research was never published, although I occasionally shared notes with a few people.
When we opened Anvil Ventures, Chris and I evaluated a number of tools for document storage, sharing and collaboration. One of them, obviously, was Dropbox, which gave me another reason to dig out the old research and check it against the current version of the client.
Decryption and deobfuscation
First, I downloaded a client for Linux and quickly found out that it was written in Python. Since the Python license is quite permissive, people can easily modify and distribute the Python interpreter along with other dependencies as commercial software. Then I started reverse engineering to understand how the client works.
At that time, the bytecode files were stored in a ZIP archive concatenated to the executable binary. The main binary was simply a modified Python interpreter that loaded its code by hijacking Python's import mechanism. Each subsequent import call was redirected to this binary, which parsed the ZIP archive. Of course, it is easy to extract this ZIP from the binary. For example, the useful binwalk tool retrieves it together with all the byte-compiled .pyc files.
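As a rough illustration of why this extraction is easy, here is a minimal sketch using only Python's standard zipfile module, which can open an archive that has other data in front of it. It assumes the archive is simply concatenated to the end of the executable; the in-memory "binary" and its member names are made up for the demo and are not taken from a real Dropbox build.

```python
import io
import zipfile


def list_embedded_pyc(path_or_file):
    """Return the names of .pyc members in a ZIP appended to another file."""
    with zipfile.ZipFile(path_or_file) as archive:
        return [name for name in archive.namelist() if name.endswith(".pyc")]


def make_demo_binary():
    """Build an in-memory stand-in: fake executable bytes followed by a ZIP."""
    payload = io.BytesIO()
    with zipfile.ZipFile(payload, "w") as archive:
        archive.writestr("hashlib.pyc", b"not real bytecode")
        archive.writestr("README.txt", b"ignored")
    # Hypothetical executable header and padding in front of the archive.
    return io.BytesIO(b"\x7fELF" + b"\x00" * 64 + payload.getvalue())


demo_names = list_embedded_pyc(make_demo_binary())
```

On a real client you would pass the path of the binary instead of the in-memory stand-in. The extracted .pyc files are still encrypted at this point.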
Back then I could not break the encryption of the .pyc files, but in the end I took a shared object from the standard Python library and recompiled it with a backdoor inside. Once the Dropbox client loaded this object, I could easily execute arbitrary Python code in the running interpreter. Although I discovered this on my own, Florian Ledoux and Nicolas Ruff used the same method in a presentation at Hack.lu in 2012.
The ability to explore and manipulate running code in Dropbox made it possible to figure out a lot. The code used several defensive tricks to make it difficult to dump code objects. For example, in a stock CPython interpreter, it is easy to recover the compiled bytecode that represents a function. A simple example:
>>> def f(i=0):
...     return i * i
...
>>> f.__code__
<code object f at 0x109deb540, file "<stdin>", line 1>
>>> f.__code__.co_code
b'|\x00|\x00\x14\x00S\x00'
>>> import dis
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (i)
              2 LOAD_FAST                0 (i)
              4 BINARY_MULTIPLY
              6 RETURN_VALUE
>>>
But in the compiled version of Objects/codeobject.c, the co_code property has been removed from the list of exposed members. This member list usually looks like this:
static PyMemberDef code_memberlist[] = {
    ...
    {"co_flags",   T_INT,    OFF(co_flags),   READONLY},
    {"co_code",    T_OBJECT, OFF(co_code),    READONLY},
    {"co_consts",  T_OBJECT, OFF(co_consts),  READONLY},
    ...
};
Removing the co_code property makes it impossible to dump these code objects directly.
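For contrast, here is a minimal sketch of what dumping looks like on a stock CPython interpreter, where co_code is exposed and the standard marshal module can serialize a code object. The helper names are made up for illustration; none of this works inside the Dropbox interpreter described here.

```python
import marshal
import types


def f(i=0):
    return i * i


def dump_code(func):
    """Serialize a function's code object with the standard marshal module."""
    return marshal.dumps(func.__code__)


def rebuild_function(blob):
    """Load a marshaled code object and wrap it in a new function."""
    code = marshal.loads(blob)
    return types.FunctionType(code, {})


blob = dump_code(f)
g = rebuild_function(blob)
# The raw bytecode survives the round trip unchanged.
same_bytecode = g.__code__.co_code == f.__code__.co_code
```

Default argument values live on the function object rather than on the code object, so the rebuilt function has to be called with an explicit argument.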
In addition, other libraries have been removed, such as the standard Python disassembler. In the end I still managed to dump the code objects to files, but I could not decompile them. It took some time before I realized that the opcodes used by the Dropbox interpreter do not match the standard Python opcodes. The new opcodes therefore had to be understood before the code objects could be rewritten back into standard Python bytecode.
One option is opcode remapping. As far as I know, this technique was developed by Rich Smith and presented at Defcon 18. In that talk, he also showed the pyREtic tool for reverse engineering Python bytecode in memory. The pyREtic code seems to be poorly maintained, and the tool targets "old" Python 2.x binaries. To get acquainted with the techniques Rich came up with, watching his talk is highly recommended.
The opcode remapping method takes all the code objects of the standard Python library and compares them with the objects extracted from the Dropbox binary, for example the code objects from hashlib.pyc or socket.pyc, which are both in the standard library. If, for example, the opcode 0x43 always lines up with the unobfuscated opcode 0x21, you can gradually build a translation table for rewriting code objects. These code objects can then be passed through a Python decompiler. To make the dumps, you still need a patched interpreter that exposes the co_code member.
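The table-building idea can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: it assumes the Python 3.6 "wordcode" layout (one opcode byte followed by one argument byte), assumes that only the opcode bytes are remapped, and uses a made-up scramble table to fake an obfuscated function.

```python
def build_opcode_map(pairs):
    """Derive {obfuscated_opcode: standard_opcode} from aligned bytecode pairs.

    Each pair is (obfuscated_co_code, standard_co_code) for the same function.
    """
    mapping = {}
    for obfuscated, standard in pairs:
        if len(obfuscated) != len(standard):
            continue  # not the same function body; skip rather than guess
        for offset in range(0, len(obfuscated), 2):
            seen = mapping.setdefault(obfuscated[offset], standard[offset])
            if seen != standard[offset]:
                raise ValueError(
                    "inconsistent mapping for opcode 0x%02x" % obfuscated[offset]
                )
    return mapping


def translate(co_code, mapping):
    """Rewrite every opcode byte of co_code using the translation table."""
    out = bytearray(co_code)
    for offset in range(0, len(out), 2):
        out[offset] = mapping[out[offset]]
    return bytes(out)


# Bytecode of `def f(i=0): return i * i` under standard CPython 3.6.
STANDARD = b"|\x00|\x00\x14\x00S\x00"
# Hypothetical scramble, invented for this demo (standard -> obfuscated).
HYPOTHETICAL_SCRAMBLE = {0x7C: 0x21, 0x14: 0x43, 0x53: 0x05}
OBFUSCATED = bytes(
    HYPOTHETICAL_SCRAMBLE[b] if i % 2 == 0 else b for i, b in enumerate(STANDARD)
)

opcode_map = build_opcode_map([(OBFUSCATED, STANDARD)])
recovered = translate(OBFUSCATED, opcode_map)
```

In practice the table only becomes complete after many standard-library functions have been compared, since a single function exercises just a handful of opcodes.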
Another option is to break the serialization format. In Python, serialization is called marshaling. Unmarshaling the obfuscated files in the usual way did not work. When reversing the binary in IDA Pro, I discovered a decryption step. As far as I know, the first person to publish anything on this topic was Hagen Fritsch, on his blog. There, he refers to changes in newer versions of Dropbox (when Dropbox switched from Python 2.5 to Python 2.7). The algorithm works as follows:
- When unpacking the pyc file, the header is read to determine the marshaling version. This format is not documented, except by the CPython implementation itself.
- The format defines a list of types that are encoded in it. The types are True, False, floats, etc., but the most important one is the type for the Python code object mentioned above.
- When loading a code object, two additional values are first read from the input file.
- The first one is a 32-bit rand value.
- The second is a 32-bit length value indicating the size of the serialized code object.
- Then the rand and length values are fed to a simple RNG function that generates a seed.
- This seed is supplied to the Mersenne Twister, which generates four 32-bit values.
- Combined together, these four values give the encryption key for the serialized data. The data is then decrypted using the Tiny Encryption Algorithm.
In my code, I wrote an unmarshaling procedure in Python from scratch. The part that decodes code objects looks something like the fragment below. Note that this method has to be called recursively. The top-level object of a pyc file is a code object that itself contains code objects, which can be classes, functions, or lambdas. In turn, they can also contain methods, functions, or lambdas. It is code objects all the way down the hierarchy!
def load_code(self):
    rand = self.r_long()
    length = self.r_long()

    seed = rng(rand, length)
    mt = MT19937(seed)
    key = []
    for i in range(0, 4):
        key.append(mt.extract_number())
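For readers unfamiliar with the two building blocks, here is a minimal sketch of the textbook Mersenne Twister (MT19937) and the textbook Tiny Encryption Algorithm operating on one 64-bit block. These are the published reference algorithms; whether Dropbox uses them unmodified, and how the rng(rand, length) seed function works, is not shown in this article, so treat the sketch as an illustration of the key derivation and decryption steps rather than a drop-in decryptor.

```python
MASK = 0xFFFFFFFF
DELTA = 0x9E3779B9  # TEA round constant


class MT19937:
    """Textbook 32-bit Mersenne Twister."""

    def __init__(self, seed):
        self.mt = [0] * 624
        self.index = 624
        self.mt[0] = seed & MASK
        for i in range(1, 624):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & MASK

    def _twist(self):
        for i in range(624):
            y = (self.mt[i] & 0x80000000) | (self.mt[(i + 1) % 624] & 0x7FFFFFFF)
            self.mt[i] = self.mt[(i + 397) % 624] ^ (y >> 1)
            if y & 1:
                self.mt[i] ^= 0x9908B0DF
        self.index = 0

    def extract_number(self):
        if self.index >= 624:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y & MASK


def tea_encrypt_block(v0, v1, key):
    """Encrypt two 32-bit words with a key of four 32-bit words."""
    s = 0
    for _ in range(32):
        s = (s + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + key[0]) ^ (v1 + s) ^ ((v1 >> 5) + key[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + key[2]) ^ (v0 + s) ^ ((v0 >> 5) + key[3]))) & MASK
    return v0, v1


def tea_decrypt_block(v0, v1, key):
    """Invert tea_encrypt_block by running the rounds backwards."""
    s = (DELTA * 32) & MASK
    for _ in range(32):
        v1 = (v1 - (((v0 << 4) + key[2]) ^ (v0 + s) ^ ((v0 >> 5) + key[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + key[0]) ^ (v1 + s) ^ ((v1 >> 5) + key[1]))) & MASK
        s = (s - DELTA) & MASK
    return v0, v1


# Demo with an arbitrary seed (5489 is the reference default, not a Dropbox value).
mt = MT19937(5489)
first_output = mt.extract_number()
key = [mt.extract_number() for _ in range(4)]
block = tea_encrypt_block(0x01234567, 0x89ABCDEF, key)
roundtrip = tea_decrypt_block(block[0], block[1], key)
```

The four extracted numbers play the role of the key list built in the fragment above; the serialized code object would then be decrypted block by block.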
Being able to decrypt code objects means that, once the routines have been unmarshaled, the actual bytecode still needs to be rewritten. Code objects contain line number information, constants, and other metadata. The actual bytecode is in the co_code member. Once we have built the opcode translation table, we can simply replace the obfuscated Dropbox values with their standard Python 3.6 equivalents.
Now the code objects are in the standard Python 3.6 format and can be passed to a decompiler. The quality of Python decompilers has improved significantly thanks to R. Bernstein's uncompyle6 project. Decompilation gave a pretty good result, and I was able to put everything together in a tool that decompiles the current version of the Dropbox client as well as it can.
If you clone this repository and follow the instructions, the result will be something like this:
...
__main__ - INFO - Successfully decompiled dropbox/client/features/browse_search/__init__.pyc
__main__ - INFO - Decrypting, patching and decompiling _bootstrap_overrides.pyc
__main__ - INFO - Successfully decompiled _bootstrap_overrides.pyc
__main__ - INFO - Processed 3713 files (3591 succesfully decompiled, 122 failed)
opcodemap - WARNING - NOT writing opcodemap as force overwrite not set
This means that you now have an out/ directory with a decompiled version of the Dropbox source code.
Enable Dropbox Tracing
In the decompiled source, I started looking for something interesting, and my attention was drawn to the following fragment. The trace handlers in out/dropbox/client/high_trace.py are installed only if the build is not frozen or if, on line 1430, the "magic key" or the limited-support cookie is set.
1424  def install_global_trace_handlers(flags=None, args=None):
1425      global _tracing_initialized
1426      if _tracing_initialized:
1427          TRACE('!! Already enabled tracing system')
1428          return
1429      _tracing_initialized = True
1430      if not build_number.is_frozen() or magic_trace_key_is_set() or limited_support_cookie_is_set():
1431          if not os.getenv('DBNOLOCALTRACE'):
1432              add_trace_handler(db_thread(LtraceThread)().trace)
1433          if os.getenv('DBTRACEFILE'):
1434              pass
The mention of frozen builds refers to Dropbox internal debug builds. A little higher in the same file you can find the following lines:
272  def is_valid_time_limited_cookie(cookie):
273      try:
274          try:
275              t_when = int(cookie[:8], 16) ^ 1686035233
276          except ValueError:
277              return False
278          else:
279              if abs(time.time() - t_when) < SECONDS_PER_DAY * 2 and md5(make_bytes(cookie[:8]) + b'traceme').hexdigest()[:6] == cookie[8:]:
280                  return True
281      except Exception:
282          report_exception()
283
284      return False
285
286
287  def limited_support_cookie_is_set():
288      dbdev = os.getenv('DBDEV')
289      return dbdev is not None and is_valid_time_limited_cookie(dbdev)
As can be seen from the limited_support_cookie_is_set function on line 287, tracing is enabled only if the environment variable DBDEV is set to a valid cookie with a limited lifetime. Well, this is interesting! And now we know how to generate such time-limited cookies. Judging by the name, Dropbox engineers can generate such cookies to temporarily enable tracing in specific cases where customer support needs it. The cookie expires automatically: once it is more than two days old, the check fails the next time Dropbox or the computer is restarted, even if the cookie is still set in the environment. I assume that this is meant to prevent, for example, performance degradation from continuous tracing. It also makes reverse engineering Dropbox harder.
However, a small script can simply keep generating and setting these cookies.
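A minimal sketch of such a script, derived directly from the is_valid_time_limited_cookie check shown earlier: the cookie is the current Unix time XORed with 1686035233 as eight hex digits, followed by the first six hex digits of the MD5 of those eight characters plus b'traceme'. The function names generate_time_cookie and output_env match the setenv.py excerpt shown later in the article; the bodies, the optional now parameter, and the assumption that make_bytes is a plain ASCII encode are mine.

```python
import time
from hashlib import md5

XOR_MASK = 1686035233   # constant from is_valid_time_limited_cookie
SUFFIX = b"traceme"     # suffix hashed together with the timestamp part


def generate_time_cookie(now=None):
    """Build a cookie that the time-limited check accepts for about two days."""
    timestamp = int(time.time() if now is None else now)
    prefix = "%08x" % (timestamp ^ XOR_MASK)            # cookie[:8]
    digest = md5(prefix.encode("ascii") + SUFFIX).hexdigest()
    return prefix + digest[:6]                          # cookie[8:] is 6 hex digits


def output_env(name, value):
    """Print a shell snippet that sets and exports one environment variable."""
    print("%s=%s; export %s;" % (name, value, name))


if __name__ == "__main__":
    output_env("DBDEV", generate_time_cookie())
```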
Running such a script, here called setenv.py, prints a time-based cookie:
$ python3 setenv.py
DBDEV=38b28b3f349714; export DBDEV;
Then load the output of this script into the environment and run the Dropbox client:
$ eval `python3 setenv.py`
$ ~/.dropbox-dist/dropbox-lnx_64-71.4.108/dropbox
This enables trace output, complete with colored formatting, even for an unregistered client.

Injecting new code
All this is slightly amusing. Studying the decompiled code further, we find out/build_number/environment.pyc. It contains a function that checks whether a certain magic key is set. The key itself is not hard-coded; instead, its SHA-256 hash is compared against a fixed value. Here is the corresponding fragment.
import hashlib, os
from typing import Optional, Text
_MAGIC_TRACE_KEY_IS_SET = None

def magic_trace_key_is_set():
    global _MAGIC_TRACE_KEY_IS_SET
    if _MAGIC_TRACE_KEY_IS_SET is None:
        dbdev = os.getenv('DBDEV') or ''
        if isinstance(dbdev, Text):
            bytes_dbdev = dbdev.encode('ascii')
        else:
            bytes_dbdev = dbdev
        dbdev_hash = hashlib.sha256(bytes_dbdev).hexdigest()
        _MAGIC_TRACE_KEY_IS_SET = dbdev_hash == 'e27eae61e774b19f4053361e523c771a92e838026da42c60e6b097d9cb2bc825'
    return _MAGIC_TRACE_KEY_IS_SET
This function is called repeatedly from different places in the code to check whether the magic key for tracing is set. I tried to crack the SHA-256 hash by brute force with John the Ripper, but that took too long, and I could not narrow down the search space because I had no guess about the content. Dropbox developers may have a specific hard-coded development key that they set when necessary to put the client into this "magic key" tracing mode.
This annoyed me, because I wanted a quick and easy way to launch Dropbox with this tracing key set. So I wrote a marshaling procedure that generates encrypted pyc files in the way Dropbox encrypts them. With it, I could inject my own code or simply replace the hash above. This code is in the patchzip.py file in the repository on GitHub. As a result, the hash is replaced by the SHA-256 hash of ANVILVENTURES. The code object is then re-encrypted and placed back in the zip where all the obfuscated code is stored. This allows you to do the following:
$ DBDEV=ANVILVENTURES; export DBDEV;
$ ~/.dropbox-dist/dropbox-lnx_64-71.4.108/dropbox
Now all debugging functions are displayed when you right-click on the Dropbox icon in the system tray.
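The hash replacement itself can be sketched on an ordinary CPython code object. This is not the patchzip.py code: it skips the Dropbox-specific decryption and re-encryption, works on a small stand-in function compiled for the demo, and relies on CodeType.replace, which requires Python 3.8 or later.

```python
import hashlib
import types


def replace_const(code, old, new):
    """Return a copy of `code` with the string constant `old` swapped for `new`,
    descending into nested code objects (functions, classes, lambdas)."""
    consts = []
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            const = replace_const(const, old, new)
        elif isinstance(const, str) and const == old:
            const = new
        consts.append(const)
    return code.replace(co_consts=tuple(consts))  # CodeType.replace: Python 3.8+


# Hash hard-coded in the decompiled magic_trace_key_is_set().
OLD_HASH = "e27eae61e774b19f4053361e523c771a92e838026da42c60e6b097d9cb2bc825"
# Hash of a key we actually know.
NEW_HASH = hashlib.sha256(b"ANVILVENTURES").hexdigest()

# Stand-in module for the demo; the real target is the decrypted Dropbox code object.
DEMO_SOURCE = (
    "import hashlib\n"
    "def key_ok(key):\n"
    "    return hashlib.sha256(key.encode('ascii')).hexdigest() == %r\n" % OLD_HASH
)

patched = replace_const(compile(DEMO_SOURCE, "<demo>", "exec"), OLD_HASH, NEW_HASH)
namespace = {}
exec(patched, namespace)
key_accepted = namespace["key_ok"]("ANVILVENTURES")
```

Because the constant sits inside a nested function's code object, the recursion matters: patching only the top-level constants would leave the original hash in place.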
Studying the decompiled sources further, I found a method called is_enabled in dropbox/webdebugger/server.py. It appears to decide whether the built-in web debugger should be enabled. First of all, it checks the aforementioned magic key. Since we replaced the SHA-256 hash, we can simply set DBDEV to ANVILVENTURES. The second part, on lines 201 and 202, checks whether there is an environment variable named DB<x>, where x hashes to the given SHA-256 value. The value of that variable must be a time-limited cookie, as we have already seen.
191  @classmethod
192  def is_enabled(cls):
193      if cls._magic_key_set:
194          return cls._magic_key_set
195      else:
196          cls._magic_key_set = False
197          if not magic_trace_key_is_set():
198              return False
199          for var in os.environ:
200              if var.startswith('DB'):
201                  var_hash = hashlib.sha256(make_bytes(var[2:])).hexdigest()
202                  if var_hash == '5df50a9c69f00ac71f873d02ff14f3b86e39600312c0b603cbb76b8b8a433d3ff0757214287b25fb01' and is_valid_time_limited_cookie(os.environ[var]):
203                      cls._magic_key_set = True
204                      return True
205
206          return False
Using the exact same technique, we replace this hash with the SHA-256 hash used before, and can then change the previously written setenv script to something like this:
$ cat setenv.py
…
c = generate_time_cookie()
output_env("DBDEV", "ANVILVENTURES")
output_env("DBANVILVENTURES", c)
$ python3 setenv.py
DBDEV=ANVILVENTURES; export DBDEV;
DBANVILVENTURES=38b285c4034a67; export DBANVILVENTURES
$ eval `python3 setenv.py`
$ ~/.dropbox-dist/dropbox-lnx_64-71.4.108/dropbox
As you can see, after the client starts, a new TCP port is opened for listening. It will not open if the environment variables are not set correctly.
$ netstat --tcp -lnp | grep dropbox
tcp        0      0 127.0.0.1:4242          0.0.0.0:*               LISTEN      1517/dropbox
Further in the code, you can find a WebSocket interface in the webpdb.pyc file. It is a wrapper around the standard Python pdb utility. The interface is exposed through an HTTP server on this port. Let's install a websocket client and test it:
$ websocat -t ws://127.0.0.1:4242/pdb
--Return--
> /home/gvb/dropbox/webdebugger/webpdb.pyc(101)run()->None
>
(Pdb) from build_number.environment import magic_trace_key_is_set as ms
(Pdb) ms()
True
Thus, we now have a full-fledged debugger in the client, which in all other respects works as before. We can execute arbitrary Python code, we managed to turn on the internal debugging menu and trace functions. All this will greatly help in the further analysis of the Dropbox client.
Conclusion
We managed to reverse engineer Dropbox and to write decryption and code injection tools that work with current Dropbox clients based on Python 3.6. We reverse engineered individual hidden features and activated them. The debugger will obviously help a lot with further hacking, especially for the files that cannot be decompiled successfully because of shortcomings in uncompyle6.
Code
The code can be found on GitHub, together with instructions for use. The repository also contains my old code, written in 2011. It should work with only a few modifications, provided that someone still has older versions of Dropbox based on Python 2.7.
The repository also contains scripts for remapping opcodes, instructions for setting the Dropbox environment variables, and everything you need to modify the zip file.
Acknowledgments
Thanks to Brian from Anvil Ventures for reviewing my code. Work on this code went on for several years: from time to time I updated it, introduced new methods, and rewrote fragments to get it working again on new versions of Dropbox.
As mentioned earlier, an excellent starting point for reverse engineering Python applications is the work of Rich Smith, Florian Ledoux and Nicolas Ruff, as well as Hagen Fritsch. Their work is especially relevant to reverse engineering one of the largest Python applications in the world, the Dropbox client.
It should be noted that the decompilation of Python code has advanced greatly thanks to the uncompyle6 project led by R. Bernstein. This decompiler builds on and improves many different Python decompilers.
Thanks also to colleagues Brian, Austin, Stefan and Chris for reviewing this article.