
Reverse engineering the Dropbox client

TL;DR: This article describes reverse engineering the Dropbox client: breaking its obfuscation, decompiling the client to Python, and modifying the program to activate debugging features that are hidden in normal operation. If you are only interested in the code and instructions, scroll to the end. At the time of writing, the code is compatible with the latest versions of Dropbox, which are based on the CPython 3.6 interpreter.

Introduction


Dropbox has fascinated me from the moment it appeared. The concept is still deceptively simple. Here is a folder. Put files in it. It syncs. Go to another device. It syncs again. The folder and files are now there too!

The amount of hidden background work is truly impressive. First, none of the problems of building and maintaining a cross-platform application for the major desktop operating systems (OS X, Linux, Windows) go away. Add to that support for various web browsers and mobile operating systems. And that is only the client side. I am also interested in the Dropbox backend, which achieves such scalability and low latency under the insanely heavy workload that half a billion users create.

For these reasons I have always liked to look at what Dropbox does under the hood and how it has evolved over the years. About eight years ago, I first tried to figure out how the Dropbox client actually works, after noticing unknown broadcast traffic while staying at a hotel. The investigation revealed that it was part of a Dropbox feature called LanSync, which speeds up synchronization when Dropbox nodes on the same local network have access to the same files. However, the protocol was not documented, and I wanted to know more. So I decided to take a closer look, and ended up reverse engineering almost the entire program. That research was never published, although I occasionally shared notes with a few people.

When we started Anvil Ventures, Chris and I evaluated a number of tools for document storage, sharing and collaboration. One of them, obviously, was Dropbox, and for me that was one more reason to dig up the old research and check it against the current version of the client.

Decryption and deobfuscation


First, I downloaded the Linux client and quickly found out that it was written in Python. Since the Python license is quite permissive, it is easy to modify the Python interpreter and distribute it, along with other dependencies, as part of commercial software. Then I started reverse engineering to understand how the client works.

At that time, the bytecode files were stored in a ZIP file concatenated with the executable binary. The main binary was simply a modified Python interpreter that loaded its code by hooking Python's import mechanism. Each subsequent import call was redirected to this binary, which parsed the ZIP file. Of course, it is easy to extract this ZIP from the binary. For example, the useful binwalk tool extracts it along with all the byte-compiled .pyc files.
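As an alternative to binwalk, Python's own zipfile module can usually open an archive that sits behind other data in the same file, because it locates the ZIP directory from the end of the file. The following is a minimal sketch under that assumption; list_embedded_pyc is a hypothetical helper name, and the path to the binary is whatever the reader supplies.

```python
import zipfile


def list_embedded_pyc(binary_path):
    """Return the names of .pyc members in a ZIP appended to a binary.

    zipfile finds the central directory by scanning from the end of the
    file, so data placed in front of the archive is tolerated.
    (Hypothetical helper, not part of any Dropbox tooling.)
    """
    with zipfile.ZipFile(binary_path) as archive:
        return sorted(n for n in archive.namelist() if n.endswith(".pyc"))
```

Calling archive.extractall() in the same way would write the members to disk; their contents would of course still be encrypted.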

Back then I could not break the encryption of the .pyc files, but in the end I took a shared object from the standard Python library and recompiled it with a backdoor inside. Once the Dropbox client loaded this object, I could easily execute arbitrary Python code in the running interpreter. Although I discovered this on my own, Florian Ledoux and Nicolas Ruff used the same method in a presentation at Hack.lu in 2012.

The ability to explore and manipulate running code in Dropbox made it possible to figure out a lot. The code used several defensive tricks to make it difficult to dump code objects. For example, in a stock CPython interpreter it is easy to retrieve the compiled bytecode of a function. A simple example:

    >>> def f(i=0):
    ...     return i * i
    ...
    >>> f.__code__
    <code object f at 0x109deb540, file "<stdin>", line 1>
    >>> f.__code__.co_code
    b'|\x00|\x00\x14\x00S\x00'
    >>> import dis
    >>> dis.dis(f)
      2           0 LOAD_FAST                0 (i)
                  2 LOAD_FAST                0 (i)
                  4 BINARY_MULTIPLY
                  6 RETURN_VALUE
    >>>

But in Dropbox's compiled version of Objects/codeobject.c, the co_code property had been removed from the list of exposed members. This member list usually looks like this:

    static PyMemberDef code_memberlist[] = {
        ...
        {"co_flags",  T_INT,    OFF(co_flags),  READONLY},
        {"co_code",   T_OBJECT, OFF(co_code),   READONLY},
        {"co_consts", T_OBJECT, OFF(co_consts), READONLY},
        ...
    };

With the co_code property gone, these code objects can no longer be dumped in the usual way.

In addition, other libraries had been removed, such as the standard Python disassembler. In the end I managed to dump the code objects to files, but I still could not decompile them. It took some time before I realized that the opcodes used by the Dropbox interpreter do not match the standard Python opcodes. So I had to work out the new opcodes in order to rewrite the code objects back into standard Python bytecode.

One option is opcode remapping. As far as I know, this technique was developed by Rich Smith and presented at Defcon 18. In that talk he also showed pyREtic, a tool for reverse engineering Python bytecode in memory. The pyREtic code seems to be poorly maintained, and the tool targets "old" Python 2.x binaries. To get acquainted with the techniques Rich came up with, I highly recommend watching his talk.

The opcode remapping method takes all the code objects of the standard Python library and compares them with the corresponding objects extracted from the Dropbox binary, for example the code objects from hashlib.pyc or socket.pyc, which are part of the standard library. If, say, opcode 0x43 always corresponds to the unobfuscated opcode 0x21, you can gradually build a translation table for rewriting code objects. The rewritten code objects can then be run through a Python decompiler. To make the dumps, you still need a patched interpreter that exposes co_code properly.
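The table-building step can be sketched roughly as follows. This is an illustration rather than the code from the repository: build_opcode_map is a hypothetical name, and the sketch assumes Python 3.6 "wordcode" (two bytes per instruction, opcode first), a one-to-one opcode substitution, and unchanged instruction arguments.

```python
def build_opcode_map(pairs):
    """Derive {obfuscated_opcode: standard_opcode} from matching functions.

    `pairs` yields (obfuscated_co_code, standard_co_code) byte strings for
    the same function. Assumes two bytes per instruction with the opcode
    in the first byte, and arguments left untouched by the obfuscation.
    """
    mapping = {}
    for obfuscated, standard in pairs:
        if len(obfuscated) != len(standard):
            continue  # layouts differ, so this pair cannot be aligned
        for offset in range(0, len(standard), 2):
            obf_op, std_op = obfuscated[offset], standard[offset]
            # setdefault records the first sighting and returns the stored
            # value afterwards, which lets us detect contradictions
            if mapping.setdefault(obf_op, std_op) != std_op:
                raise ValueError("opcode %#x maps inconsistently" % obf_op)
    return mapping
```

Feeding it many function pairs from modules such as hashlib or socket fills in the table gradually; opcodes that never occur in the compared functions stay unmapped.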

Another option is to break the serialization format. In Python, serialization is called marshaling. Deserializing the obfuscated files in the usual way did not work. While reversing the binary in IDA Pro, I discovered a decryption step. As far as I know, the first person to publish anything on this topic was Hagen Fritsch, on his blog. There he refers to changes in newer versions of Dropbox (when Dropbox switched from Python 2.5 to Python 2.7). The algorithm works as follows:


In my code, I wrote an unmarshaling routine in Python from scratch. The part that decodes code objects looks something like the fragment below. Note that this method has to be called recursively. The top-level object of a pyc file is a code object that itself contains code objects, which can be classes, functions, or lambdas. Those, in turn, can contain methods, functions, or lambdas. It is code objects all the way down the hierarchy!
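The nesting described above is easy to observe on a stock interpreter. The sketch below is only an illustration of the hierarchy (iter_code_objects is a hypothetical helper, not part of the author's tool): nested code objects live in the co_consts tuple of their parent, so a recursive walk visits all of them.

```python
import types


def iter_code_objects(code):
    """Yield `code` and every code object nested inside it, depth first."""
    yield code
    for const in code.co_consts:
        # classes, functions and lambdas are stored as constants of
        # the enclosing code object
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


source = "class A:\n    def m(self):\n        return lambda x: x\n"
module_code = compile(source, "<demo>", "exec")
names = [c.co_name for c in iter_code_objects(module_code)]
```

Here names lists the module, the class body, the method and the lambda, in that order.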

    # needs `import struct`; rng, MT19937 and tea are helpers defined elsewhere
    def load_code(self):
        rand = self.r_long()
        length = self.r_long()

        # derive the 128-bit key from the two header values
        seed = rng(rand, length)
        mt = MT19937(seed)
        key = []
        for i in range(0, 4):
            key.append(mt.extract_number())

        # take care of padding for size calculation
        sz = (length + 15) & ~0xf
        words = sz // 4  # integer division: number of 32-bit words

        # convert data to list of dwords
        buf = self._read(sz)
        data = list(struct.unpack("<%dL" % words, buf))

        # decrypt and convert back to stream of bytes
        data = tea.tea_decipher(data, key)
        data = struct.pack("<%dL" % words, *data)
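The tea.tea_decipher helper is not shown in the article. Purely as a point of reference, textbook TEA operates on a 64-bit block (two 32-bit words) with a 128-bit key (four 32-bit words), which matches the four key words drawn from the Mersenne Twister above. The sketch below implements that textbook block cipher; whether Dropbox uses exactly these constants, this round count, or some chaining between blocks is an assumption that the article does not confirm.

```python
MASK = 0xFFFFFFFF      # keep values within 32 bits
DELTA = 0x9E3779B9     # standard TEA round constant


def tea_encrypt_block(v0, v1, key, rounds=32):
    """Textbook TEA encryption of one 64-bit block (two 32-bit words)."""
    total = 0
    for _ in range(rounds):
        total = (total + DELTA) & MASK
        v0 = (v0 + (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        v1 = (v1 + (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
    return v0, v1


def tea_decrypt_block(v0, v1, key, rounds=32):
    """Inverse of tea_encrypt_block: undo the rounds in reverse order."""
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - (((v0 << 4) + key[2]) ^ (v0 + total) ^ ((v0 >> 5) + key[3]))) & MASK
        v0 = (v0 - (((v1 << 4) + key[0]) ^ (v1 + total) ^ ((v1 >> 5) + key[1]))) & MASK
        total = (total - DELTA) & MASK
    return v0, v1
```

A quick self-check is that tea_decrypt_block(*tea_encrypt_block(a, b, key), key) returns (a, b) for any 32-bit a and b.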

Being able to decrypt code objects means that, once they have been deserialized, the actual bytecode still needs to be rewritten. Code objects contain line number information, constants, and other metadata. The actual bytecode is in the co_code attribute. Once the opcode translation table has been built, we can simply replace the obfuscated Dropbox opcodes with their standard Python 3.6 equivalents.

Now the code objects are in the standard Python 3.6 format and can be passed to a decompiler. The quality of Python decompilers has improved significantly thanks to R. Bernstein's uncompyle6 project. Decompilation gave a pretty good result, and I was able to put everything together into a tool that decompiles the current version of the Dropbox client as well as it can.
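Applying a translation table to a bytecode string can be sketched in a few lines. As before this is an illustration, not the repository's implementation: remap_bytecode is a hypothetical name, the sample table is made up, and the sketch assumes Python 3.6 wordcode in which only the opcode byte of each two-byte instruction was changed.

```python
def remap_bytecode(co_code, opcode_map):
    """Rewrite every opcode byte of `co_code` through `opcode_map`.

    Instruction arguments (the odd-numbered bytes) are copied unchanged.
    An opcode missing from the table raises KeyError, which is a useful
    signal that the table is still incomplete.
    """
    rewritten = bytearray(co_code)
    for offset in range(0, len(rewritten), 2):
        rewritten[offset] = opcode_map[rewritten[offset]]
    return bytes(rewritten)


# made-up obfuscated opcodes mapped to the standard ones of f(i) = i * i
demo_map = {0x10: 0x7C, 0x43: 0x14, 0x05: 0x53}
restored = remap_bytecode(bytes([0x10, 0, 0x10, 0, 0x43, 0, 0x05, 0]), demo_map)
```

The result then has to be placed back into a code object together with the untouched constants, names and line number table before it is handed to a decompiler.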

If you clone this repository and follow the instructions, the result will be something like this:

    ...
    __main__ - INFO - Successfully decompiled dropbox/client/features/browse_search/__init__.pyc
    __main__ - INFO - Decrypting, patching and decompiling _bootstrap_overrides.pyc
    __main__ - INFO - Successfully decompiled _bootstrap_overrides.pyc
    __main__ - INFO - Processed 3713 files (3591 succesfully decompiled, 122 failed)
    opcodemap - WARNING - NOT writing opcodemap as force overwrite not set

This means that you now have an out/ directory with a decompiled version of the Dropbox source code.

Enable Dropbox Tracing


With the source now readable, I started looking for something interesting, and the following fragment caught my attention. The trace handlers in out/dropbox/client/high_trace.py are installed only if the build is not frozen, or if, per the check on line 1430, the "magic key" is set or the limited support cookie is set.

    1424  def install_global_trace_handlers(flags=None, args=None):
    1425      global _tracing_initialized
    1426      if _tracing_initialized:
    1427          TRACE('!! Already enabled tracing system')
    1428          return
    1429      _tracing_initialized = True
    1430      if not build_number.is_frozen() or magic_trace_key_is_set() or limited_support_cookie_is_set():
    1431          if not os.getenv('DBNOLOCALTRACE'):
    1432              add_trace_handler(db_thread(LtraceThread)().trace)
    1433          if os.getenv('DBTRACEFILE'):
    1434              pass

The frozen-build check relates to Dropbox's internal debug builds. A little higher up in the same file you can find the following lines:

    272  def is_valid_time_limited_cookie(cookie):
    273      try:
    274          try:
    275              t_when = int(cookie[:8], 16) ^ 1686035233
    276          except ValueError:
    277              return False
    278          else:
    279              if abs(time.time() - t_when) < SECONDS_PER_DAY * 2 and md5(make_bytes(cookie[:8]) + b'traceme').hexdigest()[:6] == cookie[8:]:
    280                  return True
    281      except Exception:
    282          report_exception()
    283
    284      return False
    285
    286
    287  def limited_support_cookie_is_set():
    288      dbdev = os.getenv('DBDEV')
    289      return dbdev is not None and is_valid_time_limited_cookie(dbdev)

As can be seen from the limited_support_cookie_is_set function on line 287, tracing is enabled only if the environment variable DBDEV is set to a valid cookie with a limited lifetime. Well, this is interesting! And now we know how to generate such time-limited cookies. Judging by the name, Dropbox engineers can generate these cookies and temporarily enable tracing in specific cases where customer support requires it. After Dropbox or the computer is restarted, the cookie automatically expires, even if it is still set. I assume this is meant to prevent, for example, performance degradation caused by continuous tracing. It also makes reverse engineering Dropbox harder.

However, a small script can simply keep generating and setting these cookies. Something like this:

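    #!/usr/bin/env python3
    import time
    from hashlib import md5


    def output_env(name, value):
        # emit a shell snippet that sets and exports the variable
        print("%s=%s; export %s" % (name, value, name))


    def generate_time_cookie():
        t = int(time.time())
        c = 1686035233
        # 8 hex digits of the obfuscated timestamp, then 6 digits of its MD5
        s = "%.8x" % (t ^ c)
        h = md5(s.encode("utf-8") + b"traceme").hexdigest()
        ret = "%s%s" % (s, h[:6])
        return ret


    c = generate_time_cookie()
    output_env("DBDEV", c)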

Running it produces a time-based cookie:

    $ python3 setenv.py
    DBDEV=38b28b3f349714; export DBDEV;
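To sanity-check a cookie without starting the client, the validation logic from the decompiled fragment can be mirrored offline. The sketch below is an approximation: make_cookie and check_cookie are hypothetical names, the current time is passed in explicitly so the result is reproducible, and SECONDS_PER_DAY is assumed to be 86400 because the decompiled code does not show its value.

```python
from hashlib import md5

XOR_CONSTANT = 1686035233   # constant from the decompiled high_trace.py
SECONDS_PER_DAY = 86400     # assumption: not shown in the decompiled code


def make_cookie(now):
    """Build a cookie for Unix time `now`, like generate_time_cookie."""
    prefix = "%.8x" % (int(now) ^ XOR_CONSTANT)
    return prefix + md5(prefix.encode("ascii") + b"traceme").hexdigest()[:6]


def check_cookie(cookie, now):
    """Mirror of is_valid_time_limited_cookie with an explicit clock."""
    try:
        t_when = int(cookie[:8], 16) ^ XOR_CONSTANT
    except ValueError:
        return False
    fresh = abs(now - t_when) < SECONDS_PER_DAY * 2
    digest = md5(cookie[:8].encode("ascii") + b"traceme").hexdigest()[:6]
    return fresh and digest == cookie[8:]
```

For example, XOR-ing the first eight hex digits of the sample cookie above (38b28b3f) with the constant gives the Unix timestamp 1556893726, and a cookie generated for that moment stops validating once the clock moves more than two days away from it.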

Then load the output of setenv.py into the environment and run the Dropbox client.

    $ eval `python3 setenv.py`
    $ ~/.dropbox-dist/dropbox-lnx_64-71.4.108/dropbox

This enables tracing output, with colored formatting and all. For an unregistered client it looks something like this:



Injecting new code


All of this is mildly amusing. Digging further into the decompiled code, we find out/build_number/environment.pyc. It contains a function that checks whether a certain magic key is set. The key itself is not hard-coded; instead, its SHA-256 hash is compared against a fixed value. Here is the relevant fragment.

    import hashlib, os
    from typing import Optional, Text
    _MAGIC_TRACE_KEY_IS_SET = None

    def magic_trace_key_is_set():
        global _MAGIC_TRACE_KEY_IS_SET
        if _MAGIC_TRACE_KEY_IS_SET is None:
            dbdev = os.getenv('DBDEV') or ''
            if isinstance(dbdev, Text):
                bytes_dbdev = dbdev.encode('ascii')
            else:
                bytes_dbdev = dbdev
            dbdev_hash = hashlib.sha256(bytes_dbdev).hexdigest()
            _MAGIC_TRACE_KEY_IS_SET = dbdev_hash == 'e27eae61e774b19f4053361e523c771a92e838026da42c60e6b097d9cb2bc825'
        return _MAGIC_TRACE_KEY_IS_SET

This function is called from many places in the code to check whether the magic tracing key is set. I tried to crack the SHA-256 hash by brute force with John the Ripper, but it took too long, and I could not narrow the search space because I had no clue about the key's content. Presumably Dropbox developers have a specific hard-coded development key that they set when needed to put the client into "magic key" tracing mode.

That annoyed me, because I wanted a quick and easy way to launch Dropbox with this tracing key set. So I wrote a marshaling routine that generates encrypted pyc files the way Dropbox's encryption expects them. This let me inject my own code, or simply replace the hash above. The code for this is in the patchzip.py file in the GitHub repository. As a result, the hash is replaced by the SHA-256 hash of ANVILVENTURES. The code object is then re-encrypted and placed back into the zip where all the obfuscated code is stored. This allows you to do the following:

    $ DBDEV=ANVILVENTURES; export DBDEV;
    $ ~/.dropbox-dist/dropbox-lnx_64-71.4.108/dropbox

Now all the debugging features show up when you right-click the Dropbox icon in the system tray.
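The hash replacement itself can be illustrated on a stock interpreter. This is a sketch of the idea only, not what patchzip.py does internally: swap_constant is a hypothetical helper, it relies on CodeType.replace (available from Python 3.8, so it runs on a regular interpreter rather than inside Dropbox's 3.6 build), and it leaves out the re-marshaling and re-encryption steps entirely.

```python
import hashlib
import types

OLD_HASH = "e27eae61e774b19f4053361e523c771a92e838026da42c60e6b097d9cb2bc825"
NEW_HASH = hashlib.sha256(b"ANVILVENTURES").hexdigest()


def swap_constant(code, old, new):
    """Return a copy of `code` with the string constant `old` replaced by
    `new`, descending into nested code objects (hypothetical helper)."""
    consts = []
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            const = swap_constant(const, old, new)
        elif isinstance(const, str) and const == old:
            const = new
        consts.append(const)
    # CodeType.replace requires Python 3.8 or newer
    return code.replace(co_consts=tuple(consts))
```

Applied to the code object of environment.pyc, a replacement like this makes magic_trace_key_is_set compare against a hash whose preimage is known, which is why DBDEV=ANVILVENTURES is then accepted.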



Digging further through the decompiled sources, I found a method called is_enabled in dropbox/webdebugger/server.py. It appears to decide whether the built-in web debugger should be enabled. First, it checks the magic key mentioned above. Since we replaced the SHA-256 hash, we can simply set the value to ANVILVENTURES. The second part, on lines 201 and 202, checks whether there is an environment variable named DB<x>, where x hashes to the given SHA-256 value. The value of that environment variable must be a time-limited cookie, as we have already seen.

    191  @classmethod
    192  def is_enabled(cls):
    193      if cls._magic_key_set:
    194          return cls._magic_key_set
    195      else:
    196          cls._magic_key_set = False
    197          if not magic_trace_key_is_set():
    198              return False
    199          for var in os.environ:
    200              if var.startswith('DB'):
    201                  var_hash = hashlib.sha256(make_bytes(var[2:])).hexdigest()
    202                  if var_hash == '5df50a9c69f00ac71f873d02ff14f3b86e39600312c0b603cbb76b8b8a433d3ff0757214287b25fb01' and is_valid_time_limited_cookie(os.environ[var]):
    203                      cls._magic_key_set = True
    204                      return True
    205
    206          return False

Using exactly the same technique, and replacing this hash with the SHA-256 hash used before, we can change the setenv script written earlier to something like this:

    $ cat setenv.py
    …
    c = generate_time_cookie()
    output_env("DBDEV", "ANVILVENTURES")
    output_env("DBANVILVENTURES", c)
    $ python3 setenv.py
    DBDEV=ANVILVENTURES; export DBDEV;
    DBANVILVENTURES=38b285c4034a67; export DBANVILVENTURES
    $ eval `python3 setenv.py`
    $ ~/.dropbox-dist/dropbox-lnx_64-71.4.108/dropbox

As you can see, after the client starts, a new TCP port is opened for listening. It will not open if the environment variables are not set correctly.

    $ netstat --tcp -lnp | grep dropbox
    tcp 0 0 127.0.0.1:4242 0.0.0.0:* LISTEN 1517/dropbox

Further on in the code, you can find a WebSocket interface in the webpdb.pyc file. It is a wrapper around pdb, the standard Python debugger. The interface is reachable through an HTTP server on this port. Let's install a WebSocket client and test it:

    $ websocat -t ws://127.0.0.1:4242/pdb
    --Return--

    > /home/gvb/dropbox/webdebugger/webpdb.pyc(101)run()->None
    >
    (Pdb) from build_number.environment import magic_trace_key_is_set as ms
    (Pdb) ms()
    True

Thus, we now have a full-fledged debugger inside a client that otherwise works exactly as before. We can execute arbitrary Python code, and we managed to turn on the internal debugging menu and the tracing features. All of this will help a great deal in further analysis of the Dropbox client.

Conclusion


We managed to reverse engineer Dropbox and to write decryption and code injection tools that work with current Dropbox clients based on Python 3.6. We reverse engineered several hidden features and activated them. Obviously, the debugger will be a real help in further hacking, especially for the files that cannot be decompiled successfully because of shortcomings in uncompyle6.

Code


The code can be found on GitHub, along with instructions for use. The repository also contains my old code, written in 2011. It should work with only a few modifications, provided that someone still has older versions of Dropbox based on Python 2.7.

The repository also contains scripts for opcode remapping, instructions for setting the Dropbox environment variables, and everything you need to modify the zip file.

Acknowledgments


Thanks to Brian from Anvil Ventures for reviewing my code. Work on this code went on for several years: from time to time I updated it, introduced new methods, and rewrote fragments to get it working again on new versions of Dropbox.

As mentioned earlier, an excellent starting point for reverse engineering Python applications is the work of Rich Smith, Florian Ledoux and Nicolas Ruff, as well as Hagen Fritsch. Their work is especially relevant to reverse engineering one of the largest Python applications in the world, the Dropbox client.

It should be noted that Python decompilation has advanced greatly thanks to the uncompyle6 project led by R. Bernstein. This decompiler builds on and improves many different Python decompilers.

Thanks also to colleagues Brian, Austin, Stefan and Chris for reviewing this article.

Source: https://habr.com/ru/post/452276/

