MIT course "Computer Systems Security". Lecture 11: "Ur / Web programming language", part 2

Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems". Nikolai Zeldovich, James Mykens. year 2014

Computer Systems Security is a course on the development and implementation of secure computer systems. Lectures cover threat models, attacks that compromise security, and security methods based on the latest scientific work. Topics include operating system (OS) security, capabilities, information flow control, language security, network protocols, hardware protection and security in web applications.

Lecture 1: "Introduction: threat models" Part 1 / Part 2 / Part 3
Lecture 2: "Control of hacker attacks" Part 1 / Part 2 / Part 3
Lecture 3: "Buffer overflow: exploits and protection" Part 1 / Part 2 / Part 3
Lecture 4: "Separation of privileges" Part 1 / Part 2 / Part 3
Lecture 5: "Where Security Errors Come From" Part 1 / Part 2
Lecture 6: "Opportunities" Part 1 / Part 2 / Part 3
Lecture 7: "Sandbox Native Client" Part 1 / Part 2 / Part 3
Lecture 8: "Model of network security" Part 1 / Part 2 / Part 3
Lecture 9: "Web Application Security" Part 1 / Part 2 / Part 3
Lecture 10: "Symbolic execution" Part 1 / Part 2 / Part 3
Lecture 11: "Ur / Web programming language" Part 1 / Part 2 / Part 3

A little later, I will tell you about the fake cross-site requests. I think the lecture summary explains why, in our case, cross-site scripting does not work. The reason is that whenever you create a “piece” of syntax, this object, tree, and different parts of this tree are not just strings.
')

You cannot accidentally turn a user's string into a structure tree, this does not happen automatically, because it is difficult to write such a translator. But you can try writing a translator for Ur / Web. Soon I will give an example that will help reduce your concerns about this. I want to show you what this syntax actually turns into in the compiler.

It would seem that we could just add double quotes around the HTML to return to the normal world. One may wonder why it is so important to skip double quotes, posting XML instead.

You can take my word for it that this is equivalent code for what it does. Here is the built-in function tag that creates the tree node of the HTML document. Next, I place arguments that express the CSS style on this node. Here, in fact, nothing happens, so there are many different ways to say "None", it does not require any attributes.

Next, I put the body tag, this is another thing from the standard library. All standard tags are in the library functions of the first class.

Next, we need to put the text “Hello World” in the body, so we call the cdata function, where cdata is the XML word for character data or just a string constant, and we can place text here. This should give us the same result as before. See if it works.

Now I will be back on the page. We see the same thing as before, so this is the same thing that the function did at the beginning.

This is not just building a string. This triggers a series of operations that are designed so that only they allow you to create valid HTML, and they can never implicitly interpret the line as code instead of just placing content on the page that should be there.

Now I will try to do something less complicated that could potentially cause concern. Let's decide that we are really happy to see the world, so select the word “hello” in bold and compile it again.

You see what happened on the page - the word did not become bold, because the compiler shows how the text is interpreted instead of markup. This is a representation of HTML syntax as a function that builds the syntax without having the usual coding syntax conventions embedded in it. This function interprets everything the way you wanted to write, without thinking anything of it.

Thus, the cdata implementation does what is commonly called escaping, or "escape." But the programmer does not need to know that there is such a thing as escaping. You can think of this as a set of convenient functions for creating a tree of an object that describes a page.

I hear that you want to see the HTML code that is generated here. Well, it will not be the most exciting thing. I will try to increase it on the screen, but then it will not fit on one line.

Student: Considering that you are using XHTML, could you just use the path for the cdata character data instead of doing it manually?

Professor: probably, but it would require more XML knowledge from me than the ones I have. There was another good question about the javascript url. If we enable the JavaScript URL, we create a backdoor to automatically interpret strings as programs at run time. And it causes all sorts of problems.

Let's try to avoid this. I will switch back primarily to the shortened version, and I will do a few lines inside the body. And let's also put a link that tries to do something appropriate. Here we leave room for error messages.

Next, run the compiler and see how it works.

Invalid URL, then JavaScript entry and the phrase "passed to bless", or "passed the blessing." Bless is built into the function that is the gatekeeper allowing the URL. By default, URLs are not allowed, so of course this option is not allowed. And in general, it’s a bad idea to write your own URL policy so you can create values representing JavaScript URLs. Because then any guarantees will be required, because these addresses may be invalid.

To make it a little clearer how this works, let me decompose this code into a separate function that calls the linker, which accepts a URL. Thus, the URL is a type, not a string. This is the type that denotes a URL that is explicitly permitted by your application policy.

I use curly brackets, as in some popular HTML template frameworks, to denote inserting some code from the host language inside the HTML that we create. And all this is done in such a way that the type is checked statically. So the system will check: “yes, this is the place that owns the URL, and it says that this is indeed the URL, so everything is fine.”

And then I can explicitly place the call to bless, saying: “let's just call the linker function here based on the results of the“ blessing ”of this URL”. After that, we should get the same error message as before.

Unfortunately, I can’t run it for you and wait for execution until it crashes, but I can say that it will definitely fail, because I intentionally made a compiler error. This URL will not be accepted by the URL policy.

If I missed this bless call, it would be a more serious compile-time error, because you have the string and the URL you need, and they are of different types.

Let's make it more interesting. I am going to open the configuration file for this demo. It is rather short, at least if you look at any framework for Java web applications. They have these giant XML files for configuration, so we have a lot better.

We can add a rule that says that everything that is in Wikipedia is allowed, and then put in the body of the URL - the address of Wikipedia.

Now go to the page and click Please Click.

Here is what we got: Wikipedia address not found.

So the main idea is to have an abstract type of URL, like how you could have an abstract type of hash tables that encode variants of what a hash table looks like and prevents the code from getting into the hash table array. We can do the same for URLs. With this bless function, the system ensures that every value of this type at some point passes a corresponding check.

For example, with this policy, we know that we will never have a JavaScript URL, so you can safely take the value of the URL and use it as a link. This will not break the basic abstractions of the language.

Student: Is it possible to use “clean” JavaScript by inserting it into the body string?

Professor: both yes and no. Instead of JavaScript, you insert Ur / Web code that performs some tasks. Now I will type the command:

return <xml><body onload = {alert “LOADED”}>

And you will see what happened - the interpreter placed a window on the screen with the words “Loaded” - “Loaded”.

Attempting to interpret the JavaScript code in a string form as a program would turn out to be a disaster. You can see that we can put the code of the same programming language with which you work, already limited by these curly braces. And then it is automatically compiled into javascript to run on the client side.

I note that new versions of browsers can avoid character interpretation errors, but some older browsers are capable of confusing something. In any case, all character elements will be interpreted as UTF-8 if they are included in the document. If there is any problem with a different encoding, then this encoding should not be used here.

Student: The compiler checks if the string contains a valid URL. But if you calculate a string at runtime, does bless check at runtime, is this string valid or not?

Professor: Let's create a form to test this statement and place it here. We enter our URL in the URL text field, and then insert the submit submit button.

Clicking on it will be followed by a call to the linker function with the entry of one value for each field in the form. In this case, there is only one field called “URL”, so the linker will process the record containing the URL as a string type. And then we will try to apply the bless function to it and see if it works.

You see an example of an error message when typing the wrong type of URL, one of those things that will not make any sense if you are not familiar with Haskell. I forgot to insert the return function here. At least now it looks more like a Java program. And I also forgot to say that now this is a full page. Therefore, we cannot use the a tag until we are inside the body tag.

Now we run the compiler, go to our page, click the Please Click button, enter some abstruse non-existent address.

Then click Submit Query - “Submit request” and get an error message - the address of this type is not allowed.

If we enter the URL of the correct view, as shown on the next screen, and then click Submit Query, no error message appears.

I think the answer to your question turned out to be long and not too exciting.

Student: Are there more stringent conditions for URLs other than prohibiting the use of JavaScript?
Professor: At present, more stringent restrictions are just constants and prefixes. But you can also create your own prohibitory rules, and they will work in the order in which you write them.

Student: it turns out that if you adhere to the prohibition of JavaScript, but put a line break in the middle of the word "JavaScript", the compiler can interpret it ...

Professor: yes, that would be too bad. That is why it’s good to stick to the white list approach instead of using the black list approach. You probably want all the rules to start with a specific protocol, such as HTTP, and only allow what falls into your approved protocol suite. I recommend doing just that.

Student: for many sites you can allow users to exchange links, in which case you need to allow links everywhere.

Professor: you can resolve links if you want your users to share javascript links or, I don't know, flash links, or whatever is allowed. You see, you can create a “white list” of all HTTP, HTTPS, URLs and thus ensure the safe operation of most sites. This approach is only slightly weaker than the permission to use only certain URLs. But at least you can completely exclude the possibility of automatic execution of the string as a program.

Let me give you one example of outlines, which is an example of a simple system of chat rooms represented in the database. The user can click on the link to go to the room and then send a message. This is the first of several options for the scheme.

First, I’ll note that I’m going to recompile this. And then magically, all declared database tables will be added to the database, and we can start using the application. But first we have to add some chat rooms. So, let's open our interface to the demo database and insert the values “one” and “two” into the room table.

Now they have appeared here.

Now we enter the first chat room and can entertain ourselves all day by sending lines of text, for example, this is the first line. It will be more interesting to try to send HTML, and it is immediately processed. This is an example of the main functionality of the program.

Once again, let's quickly go through how it works, so we have these two SQL tables — table room and table message, which are simply declared in this first class within the programming language. And we give the schema of each table. And then, when we try to access these tables, the compiler makes sure that they are accessed in accordance with the promising typing scheme.

So, we have a table of rooms, where each room is a record consisting of an identifier ID, which is an integer, and a title Title, which is a string. This is the type of view in which we just created the records. I just created some rooms in the SQL console. We also have a notice that each message belongs to a particular room, the time when it was created, and the text that is the content of the message.
Now let me quickly go to the main function.

We run a SQL query — you can see an example of the SQL syntax embedded in Ur / Web. I do not want to go to the function call through this extension from the standard library. It will be rather verbose, it suffices to recall my words about the fact that in the standard library there are ways to call functions that are valid ways to build an SQL query.

And these functions have types that make them print requests for you, and not just ensure that the syntax is valid. This code simply iterates through all the lines that came out of this request, and generates parts of the HTML code for each of them.

In particular, we are going to put the result of the query in the Title header field and convert it to HTML with a notation that includes curly brackets. Brackets additionally indicate that this is not a real piece of HTML, but please convert it to me in the standard way. So we can do with strings and integers and all the data of other types.

Student: if it contained malicious HTML code or something else, would it be filtered out?

Professor: yes, it would. In Ur / Web, you can just think of it as building a tree. This is a node that is decrypted as some text. Obviously, the text can do nothing.

Student: So, if this header were under the control of the user, and someone made a chat with an Alert header, would it not be JavaScript?

Professor: it will not be automatically treated as javascript, html or anything else. It will be perceived by the program as plain text.

So back to our image on the screen. We have this title Title, let's frame it with tags. And instead of href, the usual way to make a link in HTML, we use the link attribute, which is a kind of Ur / Web pseudo-attribute, which takes not the URL as an argument, but mostly Ur / Web expressions. The point is that when you click on this link, this expression is launched to create a new page that should be displayed.
In this case, we make a call to the chat function, which is defined here on the next screen, that's what it is.

I will not go into details. But we still have several SQL queries that use different standard library functions for different ways of using the requested results.

We generate this HTML page and say that you are in a chat with such a title, we have a form where the user can enter text. This is the form that I used to demonstrate the program a few minutes ago. The form submit button has this Add attribute containing say, which is the name of the Ur / Web function. Therefore, when we submit a form, we call this function.

Running a few more SQL inserts new rows into the table. We automatically jump from the chat room ID to the text fields that came here from the form, and they are automatically hidden as needed. But then again, in Ur / Web you don’t need to think about “escaping” a function in this way. Because it is just a syntax for building a tree, not for a string. Thus, there is no way for strange things to happen with the parsing, which you do not expect from the chosen way of interpreting the syntax.

So, the fact that we have a widget in this form in the form of a graphical GUI, and this is a text field, the compiler concludes that the record that is obtained by filling in the textbox form should have one element, called the "text" of the string type . This form encoding and the rules for typing text in it are not built into the language, but are taken from the Ur express library, which actually controls these forms, determining the types of valid functions.
If you have no more questions about this part of the program, I will move on to the next step. I’m going to use a way to get forced encapsulation of different parts of an application that Ur / Web supports and which other languages rarely support. I'm going to occupy this room. I'm going to take a few definitions and put them in a module that encapsulates some of them as private ones. In particular, the database tables will be closed, so that no one can directly access them.

You can access them only with the help of a set of methods that we provide. One method runs inside a transaction and creates a list of records with ID and title fields for the available chat rooms.

Next we just open this chat operation. And the only thing I did here was to enter a name for the concept ID — type ID. So I’m not just saying that ID is an integer, I’m saying that this is a new type.

The only way for the outside world to connect to the chat is to get a list of all the rooms, and the only way that the outside world can ever use it is to call the chat function in it. Say, this is a certain abstract type of hash tables inside the class of hash tables, where the details explaining what IDs are and how they are produced inside are private for this module, and the client code that calls this module should not use them.

Now I will move all this syntax down and locate it inside the module, so that by default it is not exposed to the rest of the code. Next, I implement this room method. We already have the opportunity to organize a chat. But we can implement rooms in a simpler way using another standard library function to interpret the query in the form used.
Let's just select everything from the list of rooms, sorted by name. As usual, this query is a data type verified for us. And the system determines: “OK, this expression will generate a list of records that match the type declared in the signature of this module.” Therefore, now outside of this module no other code can mention the room table or message table.

Thus, at least from the point of view of this application, we can apply the invariance required of it. We can even hide secrets inside the module so that there are no security problems, if some other part of the code can get them.

Student: Could some other part of the code also implement this room method?

Professor: It would be a completely different table. In fact, we can do this by inserting this fragment of 4 lines into another module.

Then we can do whatever we want with this table. I will compile it, maybe in 30 seconds, and we will see what happens. But in fact, this is a different table, as if you had the same private name, but for two different Java classes.

So, you assume that inside this module there is an abstract type called room, which contains the ID and the title Title. It is not right. Chat takes room parameters as input. When we call the chat function, it will be called via the URL. The ID and Title Title is passed outside the URL representation of the function call. We only need the ID in order to implement this function. So when we call a function, we actually call the URL.
It would be wasteful from the point of view of the use of space and would look rude to the user if the title were passed as an additional argument when calling the chat via the URL. Does it make sense? Let's look at the URL bar on this slide.

The channel ID for which we are moving is automatically serialized to the URL at the end of the line. And if we passed a record that contains the ID and the title Title, this title would also be serialized, which is at least a bit illogical.

54:10

MIT course "Computer Systems Security". Lecture 11: Ur / Web Programming Language, part 3

Full version of the course is available here .

Thank you for staying with us. Do you like our articles? Want to see more interesting materials? Support us by placing an order or recommending to friends, 30% discount for Habr users on a unique analogue of the entry-level servers that we invented for you: The whole truth about VPS (KVM) E5-2650 v4 (6 Cores) 10GB DDR4 240GB SSD 1Gbps from $ 20 or how to share the server? (Options are available with RAID1 and RAID10, up to 24 cores and up to 40GB DDR4).

VPS (KVM) E5-2650 v4 (6 Cores) 10GB DDR4 240GB SSD 1Gbps until December for free if you pay for a period of six months, you can order here .

Dell R730xd 2 times cheaper? Only we have 2 x Intel Dodeca-Core Xeon E5-2650v4 128GB DDR4 6x480GB SSD 1Gbps 100 TV from $ 249 in the Netherlands and the USA! Read about How to build an infrastructure building. class c using servers Dell R730xd E5-2650 v4 worth 9000 euros for a penny?

Source: https://habr.com/ru/post/425999/

All Articles

MIT course "Computer Systems Security". Lecture 11: "Ur / Web programming language", part 2

Massachusetts Institute of Technology. Lecture course # 6.858. "Security of computer systems". Nikolai Zeldovich, James Mykens. year 2014

More articles: