Distributed Information Flow Control

Recently, at my supervisor's request, I put together a survey for my group of some current topics in OS and system security: automated trust negotiation (the automatic “negotiation” of access rights; I am not sure how best to translate the term) and information flow control. I wanted to post that survey, but to my surprise I found what seems to me unreasonably little information about DIFC (distributed information flow control) on the Russian-language internet, so I decided to write this short article on DIFC instead.

Motivation
Traditionally, the only way to ensure the security and privacy of data in a system is considered to be authentication (which answers the question “Who said that?”) and authorization (“What is he allowed to do with this data?”). That is, if a program needs access to some data, we really have only two options: refuse or trust. If we do not trust the program, we cannot work with it and may lose important functionality. If we decide to trust the program (and/or its developers), the program effectively becomes the “absolute master” of that information (or of its copy). In the literature this principle is called all-or-nothing.

Naturally, this principle is not flexible enough, and it is also the main reason behind many vulnerabilities in systems, such as buffer overflows. More generally, it rules out more interesting applications in which access rights are not limited to the traditional “no access”, “read only” and “read/write”. It turns out that there are systems that allow a much more flexible separation of access rights to data: systems that support information flow control. Their most important feature is that they track data throughout its life cycle in the system. Recall that a traditional system is responsible only for the initial access to the data, for example checking whether a program may open a file; what the program does with the data afterwards is of no interest to the system.

A classic example. Suppose the system has two users, Alice and Bob. They want to schedule a meeting without revealing too much about their schedules for the week. Is it possible, in a multi-user Linux/Unix/Windows system, to write a program that has simultaneous access to both Alice's and Bob's calendars and still guarantees the confidentiality of both users' data?

The easiest way is to ask the “superuser” to write such a program, or at least to assign the correct rights to an existing one. But this path creates at least two problems:

1. There is no guarantee that the program contains no logic errors and does not, for example, copy Alice's data somewhere else (or that the admin will not assign the rights incorrectly).
2. You have to trust the “superuser” 100%, and the process is also non-interactive: you must wait for the admin to write the program or set the rights.

The first problem is solved by systems that support information flow control.

In general, systems that support information flow control are conventionally divided into two categories: centralized and distributed (decentralized). The centralized ones include the well-known SELinux and AppArmor. In this article I will try to describe decentralized systems using the example of Flume, a research OS (and therefore, unfortunately, completely unsuitable for real use), since I have had some experience working with it. Decentralized systems can also eliminate the second problem, the dependence on the superuser.

(Distributed) Information Flow Control
In short, the idea of information flow control is simple: track how data “flows” through the system from sender to recipient. The system's main task is to prevent unauthorized leakage of data. In general, no program (except a “privileged” one) may, within its lifetime, have access both to private data and to any information sink, such as a monitor, a printer or a socket (AF_INET). That is, once a program has read my personal files, the system will not let that program access the network.
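
The rule can be illustrated with a minimal sketch. This is a toy model, not the API of any real system: the `Process` class, its `tainted` flag and the method names are hypothetical and exist only to show that reading private data permanently closes access to sinks.

```python
class Process:
    """Hypothetical process model: remembers whether private data was read."""

    def __init__(self, name):
        self.name = name
        self.tainted = False  # becomes True after the first private read

    def read(self, data, private):
        if private:
            self.tainted = True  # the taint lasts for the process lifetime
        return data

    def open_sink(self, sink):
        # Sinks (socket, printer, monitor) are closed to tainted processes.
        if self.tainted:
            raise PermissionError(f"{self.name} may not open {sink}")
        return f"{self.name} -> {sink}"


editor = Process("editor")
print(editor.open_sink("socket"))          # allowed: nothing private read yet
editor.read("my passwords", private=True)
try:
    editor.open_sink("socket")
except PermissionError as err:
    print("denied:", err)
```

A single boolean is the crudest possible form of tracking; the tags described next refine it so that different kinds of private data can be treated separately.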

To make data private, you must mark it explicitly, for example with special flags or tags. Who is allowed to do so is the main difference between centralized and distributed systems. In the first case there is a special user, the “security manager”, who is responsible for tagging data correctly and for defining which programs may access it. For example, you can assign a “top secret” tag to the files with your passwords or personal income information and allow only Vim/Emacs to access them, without the right to (1) export this data anywhere or (2) remove the tag. Thus, even if your text editors are compromised, the system (assuming the system itself is secure and bug-free) will not let them save these files elsewhere in the system (e.g. in /tmp) under more permissive tags or send them to the Internet in any way. I have not worked with SELinux, so I refer you to the official manuals for further information.

In distributed systems, any program or entity can create its own tags, assign rights to them, and give other programs access to its data.

In Flume, you can create a tag to protect some personal data, and you have a choice: you can make the right to add this tag and/or the right to remove it publicly available. Suppose we create a tag tag1 and publish the capability {tag1+}; then any program can add this tag to its own tag set. If we create a file F and attach tag1 to it, any process p1 can add tag1 to its tag set and then read all data tagged with tag1. However, since {tag1-} is not public, the process cannot remove tag1 from its tag set, and from then on it can send data only to processes whose tag sets are supersets of p1's.
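
The tag1 example can be written down as a small check. This is only a sketch under assumptions: representing a capability as a `(tag, sign)` pair and the names `PUBLIC_CAPS` and `can_change_label` are my own illustration, not Flume's actual interface.

```python
# Hypothetical model: a capability is a (tag, sign) pair, e.g. ("tag1", "+").
PUBLIC_CAPS = {("tag1", "+")}   # {tag1+} is public, {tag1-} is not


def can_change_label(old, new, own_caps=frozenset()):
    """Return True if a process may move from tag set `old` to tag set `new`."""
    caps = PUBLIC_CAPS | set(own_caps)
    added = new - old      # every added tag needs its "+" capability
    removed = old - new    # every removed tag needs its "-" capability
    return (all((t, "+") in caps for t in added)
            and all((t, "-") in caps for t in removed))


print(can_change_label(set(), {"tag1"}))                    # True
print(can_change_label({"tag1"}, set()))                    # False
print(can_change_label({"tag1"}, set(), {("tag1", "-")}))   # True
```

The third call models the tag's creator, who is assumed here to hold {tag1-} privately and can therefore remove the tag again.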

In essence, the system must ensure that a process can send a message to another process only if the recipient's tag set contains at least the sender's tags (and possibly more), and that no process with a non-empty tag set has access to an information sink (it can be proved by mathematical induction that a system satisfying these conditions is secure). Disclaimer: the original paper gives a more formal definition and, besides secrecy tags, also introduces integrity tags, which I do not cover in this article.
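
Both conditions reduce to simple set operations. The sketch below uses hypothetical function names (`can_send`, `can_use_sink`) and deliberately ignores integrity tags, just as the text does.

```python
def can_send(sender_tags, receiver_tags):
    """A message is safe only if the receiver carries every sender tag."""
    return set(sender_tags) <= set(receiver_tags)


def can_use_sink(tags):
    """Only a process with an empty tag set may write to a sink."""
    return len(set(tags)) == 0


p1 = {"tag1"}
print(can_send(p1, {"tag1", "tag2"}))  # True: receiver has a superset
print(can_send(p1, set()))             # False: tag1 would be lost
print(can_use_sink(p1))                # False: p1 holds a secrecy tag
print(can_use_sink(set()))             # True
```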

Flume is one of the systems developed to enforce the “correctness” of information flow. At the system level, Flume is Linux with a modified LSM (Linux Security Modules) layer that intercepts the main system calls, stores information about tags and labels, and checks that data flows correctly from one process to another.

Now back to the calendar example with Alice and Bob. In Flume, Alice assigns tag A to her calendar and Bob assigns tag B to his. Alice makes {A+} publicly available, and Bob does the same with {B-}. Bob runs the program with the label {A, B}, i.e. with access to both calendars. The program finds several convenient time slots in which neither Alice nor Bob is busy, drops tag B ({B-} is public) and writes the result to a file F, which automatically receives tag A (since {A-} is not public). Alice can open F because she owns tag A, and she picks a specific date from the list “suggested by Bob”. As a reminder, {B+} is not publicly available, so Alice cannot read Bob's calendar.
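
The whole scenario can be replayed as a toy model to confirm that each step is permitted by the rules. All names here (`PUBLIC`, `OWNED`, `may_relabel`, `may_read`) are hypothetical, and the sketch assumes that the creator of a tag holds both its "+" and "-" capabilities.

```python
# Hypothetical model of the scenario; capabilities are (tag, sign) pairs.
PUBLIC = {("A", "+"), ("B", "-")}        # Alice publishes A+, Bob publishes B-
OWNED = {
    "alice": {("A", "+"), ("A", "-")},   # assumption: owners hold both rights
    "bob": {("B", "+"), ("B", "-")},
}


def may_relabel(user, old, new):
    """May a process run by `user` change its label from `old` to `new`?"""
    caps = PUBLIC | OWNED[user]
    return (all((t, "+") in caps for t in new - old)
            and all((t, "-") in caps for t in old - new))


def may_read(process_label, file_label):
    # Reading is a flow from file to process: the process needs every file tag.
    return file_label <= process_label


alice_cal, bob_cal = {"A"}, {"B"}

# Bob's scheduler raises its label to {A, B} and reads both calendars.
label = {"A", "B"}
assert may_relabel("bob", set(), label)
assert may_read(label, alice_cal) and may_read(label, bob_cal)

# It may drop B (B- is public) but not A, so the result file F is tagged A.
assert may_relabel("bob", label, {"A"})
assert not may_relabel("bob", label, set())
file_f = {"A"}

# Alice can raise her label to {A} and read F, but can never add B.
assert may_relabel("alice", set(), {"A"}) and may_read({"A"}, file_f)
assert not may_relabel("alice", set(), {"B"})
print("scenario is consistent with the rules")
```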

Conclusion
Unfortunately, I cannot cover all the areas where DIFC ideas apply (or even those that motivated the problem). There are quite a few excellent papers on the topic, from the classic Jif to the fairly recent HiStar/DStar and Resin. If you are interested, I can describe, for example, the MIT Resin framework in more detail and more formally. I once had the good fortune to talk with Barbara Liskov (probably one of the main authorities in this field) about information flow control and the applicability of these principles to other problems, and I simply got hooked on the topic. There are several interesting “visions” of where this idea could go: W5 (World Wide Web Without Walls) or Fabric. But that is a completely different story...

Source: https://habr.com/ru/post/86799/
