MIT course "Computer Systems Security". Lecture 21: "Tracking data", part 1

Massachusetts Institute of Technology. Course 6.858, "Computer Systems Security". Nickolai Zeldovich, James Mickens. 2014

Computer Systems Security is a course on the design and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and techniques for achieving security, based on recent research papers. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware security and security in web applications.

Lecture 1: "Introduction: threat models" Part 1 / Part 2 / Part 3
Lecture 2: "Control hijacking attacks" Part 1 / Part 2 / Part 3
Lecture 3: "Buffer overflows: exploits and defenses" Part 1 / Part 2 / Part 3
Lecture 4: "Privilege separation" Part 1 / Part 2 / Part 3
Lecture 5: "Where security bugs come from" Part 1 / Part 2
Lecture 6: "Capabilities" Part 1 / Part 2 / Part 3
Lecture 7: "Native Client sandbox" Part 1 / Part 2 / Part 3
Lecture 8: "Web security model" Part 1 / Part 2 / Part 3
Lecture 9: "Web application security" Part 1 / Part 2 / Part 3
Lecture 10: "Symbolic execution" Part 1 / Part 2 / Part 3
Lecture 11: "The Ur/Web programming language" Part 1 / Part 2 / Part 3
Lecture 12: "Network security" Part 1 / Part 2 / Part 3
Lecture 13: "Network protocols" Part 1 / Part 2 / Part 3
Lecture 14: "SSL and HTTPS" Part 1 / Part 2 / Part 3
Lecture 15: "Medical software" Part 1 / Part 2 / Part 3
Lecture 16: "Side-channel attacks" Part 1 / Part 2 / Part 3
Lecture 17: "User authentication" Part 1 / Part 2 / Part 3
Lecture 18: "Private browsing" Part 1 / Part 2 / Part 3
Lecture 19: "Anonymous networks" Part 1 / Part 2 / Part 3
Lecture 20: "Mobile phone security" Part 1 / Part 2 / Part 3
Lecture 21: "Data tracking" Part 1 / Part 2 / Part 3

James Mickens: Great, let's get started. Thank you for coming to lecture on this special day before Thanksgiving. I'm glad you are all so committed to computer security, and I'm sure you will be in demand on the job market. Feel free to list me as a reference. Today we will talk about taint tracking, and in particular about a system called TaintDroid, which performs this kind of information flow analysis on Android smartphones.

The main problem raised in the paper for this lecture is that applications can exfiltrate data. The idea is that your phone contains a lot of sensitive information: your contact list, your phone number, your email address and so on. If the operating system and the phone itself are not careful, a malicious application may be able to extract some of this information and send it back to its home server, and the server can then use it for all kinds of nefarious purposes that we will talk about later.

At a high level, the TaintDroid paper offers a solution: track sensitive data as it flows through the system, and stop it before it is transmitted over the network. In other words, we must prevent sensitive data from being passed as an argument to networking system calls. Presumably, if we can do this, we can stop the leak right at the moment it is about to happen.

You might ask why Android's traditional permissions are not enough to prevent this kind of data exfiltration. The reason is that these permissions do not have the right grammar to describe the type of attack we are trying to prevent. Android permissions usually deal with whether an application may read from or write to a particular device. But here we are talking about something at a different semantic level. Even if an application has been granted the right to read from or write to a device such as the network, it can still be dangerous to let it read or write certain sensitive data through that device.

In other words, with traditional Android security policies it is difficult to talk about specific types of data. It is much easier to talk about whether an application may access a device or not. Perhaps we can solve this problem with an alternative solution, which I will mark with an asterisk.

This alternative solution is to never install applications that can both read sensitive data and access the network. At first glance, the problem seems to be fixed: if an application cannot do both of these things at once, then either it cannot access sensitive data, or it can read the data but cannot send it over the network. What do you think the catch is?

Everyone is already thinking about the holiday turkey, I can see it in your eyes. Well, the main reason this is a bad idea is that it would break many legitimate applications. There are many programs, email clients for example, that really do need to read some sensitive data and send it over the network.

If we simply say we are going to forbid this kind of activity, we effectively prohibit many applications from working on the phone, which users probably will not like.
There is another problem here: even if we implemented this solution, it would not prevent data leaks through a host of side-channel mechanisms. For example, in previous lectures we saw that the browser cache can leak whether a user has visited a particular site. So even with such a security policy in place, we cannot control all side channels. We will talk about side channels a little later.

The proposed solution also does not stop collusion, where two applications work together to defeat the security policy. For example, what if one application has no network access but can talk to a second application that does? The first application could use Android's IPC mechanisms to hand sensitive data to the application with network permission, and that second application could upload it to the server. And even if applications are not colluding, one application may be able to trick another into accidentally revealing sensitive data.

Perhaps there is a flaw in the email program that makes it accept too many arbitrary messages from other components of the system. Then we could craft a special intent to trick the email program, forcing, say, the Gmail application to email something important off the phone. So this alternative solution does not work well enough.
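To make the asterisk policy and its weaknesses concrete, here is a minimal sketch in Java. The Perm enum, its permission names and the InstallPolicy class are hypothetical and are not Android APIs; the sketch only models the rule "never install an application that can both read sensitive data and reach the network".

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical permission names for illustration; not real Android permission constants.
enum Perm { READ_CONTACTS, READ_LOCATION, INTERNET }

class InstallPolicy {
    // Permissions that this sketch treats as giving access to sensitive data.
    static final Set<Perm> SENSITIVE = EnumSet.of(Perm.READ_CONTACTS, Perm.READ_LOCATION);

    // The "asterisk" policy: refuse any app that can both read sensitive data and use the network.
    static boolean allowed(Set<Perm> requested) {
        boolean readsSensitive = false;
        for (Perm p : requested) {
            if (SENSITIVE.contains(p)) {
                readsSensitive = true;
            }
        }
        boolean usesNetwork = requested.contains(Perm.INTERNET);
        return !(readsSensitive && usesNetwork);
    }

    public static void main(String[] args) {
        // A legitimate email client needs both rights, so the policy breaks it.
        System.out.println(allowed(EnumSet.of(Perm.READ_CONTACTS, Perm.INTERNET)));
        // Two colluding apps each pass the check on their own.
        System.out.println(allowed(EnumSet.of(Perm.READ_CONTACTS)));
        System.out.println(allowed(EnumSet.of(Perm.INTERNET)));
    }
}
```

The email client is rejected while the two single-purpose applications are both accepted, which is exactly the pair that could collude over IPC. A per-application check of this kind says nothing about where the data goes after it has been read.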

So, we are very concerned about sensitive data leaving the phone. Let's consider what malicious Android applications do in practice. Are there real-world attacks that taint tracking can prevent? The answer is yes. Malware is a growing problem on mobile phones. The first thing a malicious application can do is use your location or IMEI for advertising or to push services at you.

Malware can determine your physical location. For example: you are near the MIT campus, so you are a hungry student, so why not visit my food truck, which happens to be very close by?

The IMEI is an integer that uniquely identifies your phone. It can be used to track you across different places, especially ones where you would rather not be seen. Malware that does such things exists in the wild.

The second thing malware does is steal your personal data. It may try to steal your phone number or contact list and upload them to a remote server. This information can be used to impersonate you, for example in a message that is later used to send spam.

Perhaps the worst thing that malware can do, at least for me, is turn your phone into a bot.

This, of course, is a problem our parents never had to face. Modern phones are powerful enough to be used for sending spam. A lot of malware, some of it targeting specific corporate environments, does exactly that. Once on your phone, it starts using the phone as part of a spam network.

Student: Is this malware that specifically subverts the Android OS, or is it just a typical application? If it is a typical application, perhaps we could secure things with permissions?

Professor: That is a very good question. There are both types of malware. As it turns out, it is pretty easy to get users to click on all sorts of buttons. Let me give you an example that is less about malware and more about careless behavior.
There is a popular game, Angry Birds. You go to the app store and search for it. The first result is the original Angry Birds, and the second may be an application called Angry Birdss, with two s's at the end. Many people will prefer to download the second one, because it may cost less than the original. Then, during installation, the application says that installing it will allow it to do this and that, and you say, "Sure, no problem!", because you got the Angry Birds you wanted for next to nothing. And then, boom, you are on the hacker's hook!

But you are absolutely right that, if the Android security model is correct, installing malware depends entirely on the foolishness or naivety of users who grant it network access when, for example, a Tic-Tac-Toe game has no business accessing the network.

So your phone can be turned into a bot. This is terrible for many reasons: not only is your phone sending spam, but you may also be paying for the data used by all the emails sent from it. In addition, the battery drains quickly because the phone is constantly busy sending spam.

There are applications that will use your personal information to cause harm. What is especially bad about such a bot is that it can look at your contact list and send spam in your name to people who know you, which makes them far more likely to click on something malicious in the message.

So, preventing exfiltration is a good thing, but it does not prevent the compromise itself. There are mechanisms we should look at first, because they keep an attacker from taking over your smartphone in the first place, such as educating users about what they should and should not click on.

Thus, taint tracking by itself is not a sufficient solution for keeping your phone from being taken over.

Let's take a look at how TaintDroid works. As I mentioned, TaintDroid tracks all of your sensitive information as it propagates through the system. TaintDroid distinguishes between what it calls information sources and information sinks. Sources generate sensitive data. Usually these are sensors: GPS, the accelerometer and the like. A source can also be your contact list, your IMEI, anything that can tie you, a particular user, to your actual phone. These things that generate tainted data are called taint sources.

Information sinks, in turn, are places where tainted data should not leak. In TaintDroid, the main sink is the network. Later we will see that you can imagine other places where information flows out, but the network has a special place in TaintDroid. A more general-purpose system than a phone may have other information sinks, but TaintDroid is designed to prevent leaks to the network.
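The source and sink idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: Labeled, readImei, constant and sendToNetwork are hypothetical names, the IMEI is a placeholder string, and a single boolean stands in for the taint label.

```java
// A value paired with a flag recording whether it came from a sensitive source.
final class Labeled {
    final String value;
    final boolean tainted;

    Labeled(String value, boolean tainted) {
        this.value = value;
        this.tainted = tainted;
    }
}

class SourceSinkDemo {
    // Taint source: hypothetical stand-in for reading the phone's IMEI (placeholder digits).
    static Labeled readImei() {
        return new Labeled("000000000000000", true);
    }

    // Ordinary data that did not come from any sensitive source.
    static Labeled constant(String s) {
        return new Labeled(s, false);
    }

    // Taint sink: hypothetical stand-in for a network send.
    // Returns true if the data would be allowed out, false if it is stopped.
    static boolean sendToNetwork(Labeled data) {
        return !data.tainted;
    }

    public static void main(String[] args) {
        System.out.println(sendToNetwork(constant("hello")));
        System.out.println(sendToNetwork(readImei()));
    }
}
```

The check lives at the sink, not at the source: reading the IMEI is allowed, and only the attempt to hand it to the network is refused.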

TaintDroid uses a 32-bit bitvector to represent taint. This means you can have at most 32 distinct taint sources.

So each piece of sensitive data has a 1 in a particular bit position if it has been tainted by a particular taint source: for example, if it was derived from GPS data, from something in your contact list, and so on.

Interestingly, 32 taint sources is actually not that many. The question is whether this number is large enough for this particular system, and whether it is large enough for information-leak tracking systems in general. In the particular case of TaintDroid, 32 taint sources is reasonable, because the problem involves a fairly constrained set of information flows.

Considering all the sensors on your phone, the sensitive databases and the like, 32 seems about right for storing these taint flags. As we will see from the implementation, 32 is also a very convenient number, because the flags fit in a single 32-bit integer, which makes them efficient to manipulate.
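A 32-bit taint tag of this kind maps naturally onto a Java int with one bit per source. The sketch below is illustrative only: the TaintTag class and the particular bit assignments for GPS, CONTACTS and IMEI are assumptions, not TaintDroid's actual constants.

```java
class TaintTag {
    // Hypothetical bit assignments: one bit per taint source, at most 32 in an int.
    static final int CLEAR = 0;
    static final int GPS = 1 << 0;
    static final int CONTACTS = 1 << 1;
    static final int IMEI = 1 << 2;

    // Combining two tags is a bitwise OR of their bits.
    static int union(int a, int b) {
        return a | b;
    }

    // True if the tag carries the bit for the given source.
    static boolean has(int tag, int source) {
        return (tag & source) != 0;
    }

    public static void main(String[] args) {
        int tag = union(GPS, CONTACTS);
        System.out.println(Integer.toBinaryString(tag)); // bits 0 and 1 are set
        System.out.println(has(tag, GPS));
        System.out.println(has(tag, IMEI));
    }
}
```

Because the whole tag is one machine word, merging and testing taint costs a single OR or AND, which is what makes 32 such a convenient size.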

However, as we will discuss later, if you want to let programmers control information leaks themselves, that is, specify their own taint sources and their own kinds of sinks, then 32 bits may not be enough. In that case you would need more complex runtime support to represent a larger label space.

Roughly speaking, when you look at how taint flows through the system, in general it flows from right to left. Here is a simple example. Suppose you have a statement that declares an integer variable equal to the latitude of your location: int lat = gps.getLat(). The thing to the right of the equals sign produces a value that has some taint associated with it.

So a specific flag is set that says: "Hey, this value I am returning comes from a sensitive source!" The taint originates here, on the right-hand side, and flows over here, to the left, to taint lat. This is how it looks to a human developer writing source code. However, the Dalvik virtual machine works at a lower level, on a register-based instruction format, and that is where the taint semantics are actually implemented.

A table in the paper for this lecture lists a large set of instructions and describes how taint propagates through each type. For example, imagine a move-op instruction that names a destination dst and a source src. In the Dalvik virtual machine, an abstract computation engine, you can think of these as registers. As I said, taint flows from the right side to the left, so when the Dalvik interpreter executes this instruction, it takes the taint label of src and assigns it to dst.

Suppose we have another instruction, a binary-op, which performs something like addition. It has one destination dst and two sources, src0 and src1. When the Dalvik interpreter processes this instruction, it takes the taint of both sources, unions them, and assigns that union to dst.
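The two propagation rules just described can be sketched as a toy register machine in Java. This is not Dalvik's interpreter: the TaintedRegisters class, its shadow taint array and the load helper are assumptions made for illustration, and addition stands in for any binary-op.

```java
class TaintedRegisters {
    final int[] value; // register contents
    final int[] taint; // one 32-bit taint tag per register

    TaintedRegisters(int count) {
        value = new int[count];
        taint = new int[count];
    }

    // Load a constant with an explicit taint tag (stands in for reading from a source).
    void load(int dst, int v, int tag) {
        value[dst] = v;
        taint[dst] = tag;
    }

    // move-op dst, src: dst takes the value and the taint of src.
    void moveOp(int dst, int src) {
        value[dst] = value[src];
        taint[dst] = taint[src];
    }

    // binary-op (addition) dst, src0, src1: dst takes the union of both source taints.
    void addOp(int dst, int src0, int src1) {
        value[dst] = value[src0] + value[src1];
        taint[dst] = taint[src0] | taint[src1];
    }

    public static void main(String[] args) {
        TaintedRegisters r = new TaintedRegisters(4);
        r.load(0, 42, 0b01); // hypothetical tag bit 0, e.g. GPS
        r.load(1, 1, 0b10);  // hypothetical tag bit 1, e.g. contacts
        r.moveOp(2, 0);
        r.addOp(3, 0, 1);
        System.out.println(r.taint[2] + " " + r.taint[3]);
    }
}
```

After the move, register 2 carries only the first source's bit; after the addition, register 3 carries both bits, so the result remembers every source that contributed to it.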

This is quite simple. The table shows the different types of instructions you will see, but to a first approximation these are the most common ways that taint propagates through the system. Let's look at some particularly interesting cases mentioned in the paper. One such special case involves arrays.

Suppose you have a variable char c that is assigned some value. The program also declares an array char upper[] holding the capital letters "A", "B", "C": char upper[] = {'A', 'B', 'C'}

A very common thing in code is to index into such an array using c directly because, as we all know, Kernighan and Ritchie teach us that characters are basically integers. So you can imagine code declaring char upperC, which says that the capital version of the character is found at a specific index in this table: char upperC = upper[c]

This raises the question of what taint upperC should get in this case. The previous cases were simple, but here there is a lot going on. We have the array upper[], which may carry one kind of taint, and we have the character c, which may carry its own taint. Dalvik handles this much like a binary-op: upperC receives the union of the taint of the array upper[] and the taint of the index c.
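For the lookup upperC = upper[c], the rule in the TaintDroid paper is that the result carries the union of the array's taint and the index's taint. A minimal Java sketch of that rule follows; the TaintedArrayDemo class, the lookupTaint helper and the tag values are hypothetical.

```java
class TaintedArrayDemo {
    // Taint of the result of an array read: union of the array's tag and the index's tag.
    static int lookupTaint(int arrayTag, int indexTag) {
        return arrayTag | indexTag;
    }

    public static void main(String[] args) {
        char[] upper = {'A', 'B', 'C'};
        int upperTag = 0;  // the lookup table itself holds nothing sensitive
        int c = 2;         // index derived from sensitive data
        int cTag = 1 << 0; // hypothetical taint bit for that source

        char upperC = upper[c];
        int upperCTag = lookupTaint(upperTag, cTag);
        System.out.println(upperC + " " + upperCTag);
    }
}
```

Including the index in the union matters: if only the array's tag were used, a tainted value could be laundered by looking it up in an untainted table, since the result reveals which index was chosen.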

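Student: Could you explain again how taint works for move-op and binary-op?

Professor: In a move-op there is a single source, src, so dst simply receives a copy of src's taint. In a binary-op there are two sources, src0 and src1, each with its own taint, and dst receives the union of the two. Because each taint is a 32-bit bitvector, that union is just a bitwise OR of the two sets of taints.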
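So, as I said, the array lookup is treated like a binary-op: upperC gets the taint of the index [c] together with the taint of the array ["A", "B", "C"]. TaintDroid associates a single taint tag with each array rather than a separate 32-bit tag with every element such as "A", which keeps the storage overhead down.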


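Another special case is what are called native methods. For example, Dalvik has methods such as System.arraycopy() that are implemented in C or C++. That is one kind of native method.

The other kind is reached through JNI. JNI, the Java Native Interface, lets Java code call code written in C or C++, which is compiled to x86 or ARM machine code rather than to Java bytecode.

The problem for taint tracking is that such code is not executed by the Dalvik interpreter. It is not Java code, it is C or C++. So for native methods, TaintDroid cannot propagate taint automatically the way it does for Java code.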

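Instead, TaintDroid relies on manually written descriptions of how a native method propagates taint. For a method built into Dalvik such as System.arraycopy(), the description says, in effect, that the taint of the source array is passed to the destination array.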



MIT course "Computer Systems Security". Lecture 21: "Tracking data", part 2


Source: https://habr.com/ru/post/433376/
