LangSec

04 Nov 2014

Table of Contents	Subchapter
LangSec Intro/FAQ
XSS
	What is XSS
	What protects against this correctly?
	Hooray! XSS is over! right? …or… I know this! get to the langsec!
SQL
	What is SQL injection
	Prepared Queries
	Let’s have some LangSec to solve this!
Buffer Overflows
	What is that
	Protecting against this problem
Overview and Proselytization

LangSec Intro/FAQ

Examples of security problems

## XSS ##

### What is XSS ###

A really big security problem in websites (one of the OWASP Top 10) is called XSS (cross-site scripting). You can learn what it is by playing the Google XSS game. It’s not just about making annoying popups say “XSS!” or “1” on people’s computers. It means an attacker can potentially trick the site into running a script that does whatever they like on someone else’s computer: it could copy their login cookies to compromise their account, or automatically make them add the attacker as a moderator on a forum. All kinds of nasty things!

### What is done to protect against this ###

One way people try (and often fail) to protect against XSS is to take user input strings and try to filter out the bad parts. For example, suppose this request was causing an alert to pop up on some website:

page.php?name=<script>alert(1);</script>

The developer might add a regex to strip out “script” and “alert”, but then the attacker might get through using:

page.php?name=<sCRiPt>eval('ale'+'rt(1)');</scRiPt>

This becomes a kind of arms race: the developer can guard against more and more attacks by writing better regexes, but the attacker can usually write more and more obfuscated code that bypasses the filters.
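As a minimal sketch of why filtering is brittle, the snippet below runs the two payloads above through a blacklist filter. The function `naive_filter` is a hypothetical stand-in for the developer's regex, not code from any real site, and Python is used purely for illustration.

```python
import re


def naive_filter(value: str) -> str:
    """Hypothetical blacklist filter: delete the literal words 'script' and 'alert'."""
    return re.sub(r"script|alert", "", value)


first_attack = "<script>alert(1);</script>"
second_attack = "<sCRiPt>eval('ale'+'rt(1)');</scRiPt>"

# The first payload is mangled into harmless text.
print(naive_filter(first_attack))  # <>(1);</>

# The second payload contains neither blacklisted word literally,
# so the filter returns it unchanged.
print(naive_filter(second_attack) == second_attack)  # True
```

Making the regex case-insensitive would catch the tag, but the string concatenation inside `eval` would still hide the call, and the next payload would need yet another rule.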

### What protects against this correctly? ###

Take a website that accepts `name` as input and displays it in a context like this:

```

Welcome home

...
```

The solution is to quote the string so that any characters that might be interpreted as anything other than text are replaced with their escaped forms, e.g. `<` and `>` become `&lt;` and `&gt;`. More about escaping on [OWASP here](https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet#RULE_.231_-_HTML_Escape_Before_Inserting_Untrusted_Data_into_HTML_Element_Content).

There's a problem with this though! Consider a site that sets an attribute to a user-supplied favorite color:

```

Welcome home

...
```

The `$name` variable is no longer an XSS vector, but `$favorite_color` is: an attacker can break out of the attribute and create a new one with something like `' onload='alert(1)`. HTML interprets the text differently because it is in a different context, so it needs a different escaping rule, like [this](https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet#RULE_.232_-_Attribute_Escape_Before_Inserting_Untrusted_Data_into_HTML_Common_Attributes). So a secure version of this site might look like this:

```

Welcome home

...
```

### Hooray! XSS is over! right? ...or... I know this! get to the langsec! ###

At this point we actually know 100% why XSS happens and how to protect against it, but it's still a problem. Why? There are a couple of reasons:

* People still filter rather than escape
* People forget to escape, or use the wrong type of escape by accident

Every single time user input gets spliced into HTML we have to not only escape it but also choose the correct type of escape. That isn't going to happen reliably when you collaborate on a large codebase worked on by many developers.

**What would help improve security even more is if the computer itself understood HTML (and this is really easy for a computer to do!) and figured out the correct type of escape to insert for you in the given context.**

An example of this is the [html/template](http://golang.org/pkg/html/template/) package from the Go language. We would rewrite the PHP code using the following template:

```
<body bgcolor=>

Welcome home

...
```

and the template package itself will process this, insert the correct escaping functions, and all will be well.

Please don't think that I'm advocating Go over PHP here: in the PHP example we just used plain text and string interpolation to build HTML, and you could just as easily make that mistake in Go too. And you could just as well use a templating library that improves the security of your code by performing the analysis and escaping for you. *That* is what I recommend.

## SQL injection ##

### What is SQL injection ###

SQL injection is another huge problem in websites. Most sites use a database to store the model they represent, and they often communicate with that database by building up an SQL query as a string and passing that to the database. An example might be an application that does `SELECT * FROM users WHERE login='$name' AND password='$password';` where name and password are user-supplied input. Now an attacker might be able to provide as a password:

```
' OR ''='
```

and if the site is not properly escaping the input, that might just get passed to the database like this:

```
SELECT * FROM users WHERE login='admin' AND password='' OR ''='';
```

which evaluates to TRUE and lets the user log in without a password.

You can learn a lot more about SQL injection from [DEF CON 17 - Joseph McCray - Advanced SQL Injection](https://www.youtube.com/watch?v=-AkUutmXwUI) (also this is a really fun and awesome talk!)

### Prepared queries ###

XXX haven't had time to analyze the Drupal SQL injection. Make notes on this.

### Let's have some LangSec to solve this! ###

So exactly as before, you need the compiler or programming language to understand SQL queries and perform the context-aware quotation for you (numbers need different quotation than strings, for example).
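Short of full compiler support, most database drivers already offer a limited form of this through bound parameters. The sketch below uses Python's standard `sqlite3` module with an in-memory database to contrast the two approaches. The `users` table, the stored password, and the helper names `login_spliced` and `login_bound` are assumptions made up for illustration, and a real site would store password hashes rather than plain text.

```python
import sqlite3

# Throwaway in-memory database with one (assumed) user row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (login TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")


def login_spliced(name, password):
    """Vulnerable: user input is spliced directly into the query text."""
    query = f"SELECT * FROM users WHERE login='{name}' AND password='{password}';"
    return conn.execute(query).fetchall()


def login_bound(name, password):
    """Safer: values are passed separately and never parsed as SQL."""
    query = "SELECT * FROM users WHERE login=? AND password=?;"
    return conn.execute(query, (name, password)).fetchall()


payload = "' OR ''='"

print(login_spliced("admin", payload))  # [('admin', 's3cret')]
print(login_bound("admin", payload))    # []
print(login_bound("admin", "s3cret"))   # [('admin', 's3cret')]
```

With the spliced query the payload rewrites the `WHERE` clause and the row comes back without the password. With bound parameters the same payload is compared as an ordinary string and matches nothing. The programmer still has to remember to use the bound form every time, which is the gap that language-level support aims to close.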
An example of a language with compiler-level support for SQL is [Ur/Web](http://www.impredicative.com/ur/), which lets you write code like this:

```
title <- oneRow (SELECT images.Title FROM images WHERE images.Id={[id]});
tagLis <- queryX (SELECT tags.Tag FROM tags WHERE tags.Id={[id]} ORDER BY tags.Tag)
```

and the compiler checks that your queries are syntactically valid, that the tables have the correct form, and that the escaping is done correctly.

## Buffer Overflows ##

### What is that ###

Another really dangerous and very common security bug is the buffer overflow. On normal computers, when you compile a C program each function call gets a stack frame, and the stack frame holds a return address so that the function knows where to return to once it finishes. A buffer overflow can sometimes be used to overwrite that return address and jump to a different piece of code, possibly even code that the attacker provides. There's a good [textfile about that](http://phrack.org/issues/49/14.html).

### Protecting against this problem ###

There are three solutions to this:

* Do bounds checks by hand on everything that could possibly go wrong. You can imagine how well this works.
* Use a programming language that does bounds checks for you (e.g. OCaml). This is much safer, but sometimes people don't like to use languages like this because they are worried it will be slower.
* Use a programming language that understands your code to some extent, and analyzes at compile time whether or not out-of-bounds array accesses will occur.

That third point hasn't seen much adoption in practice, but it seems to be the best of both worlds: you get no slowdown from redundant checks, but you still get their safety. These ideas have been designed and implemented before, e.g.
see [Eliminating Array Bound Checking Through Dependent Types](https://www.cs.bu.edu/~hwxi/academic/papers/pldi98.pdf), but as far as I know you can't use them in practice (maybe [ATS](http://www.ats-lang.org/) counts though).

A really nice example of someone making a point of the second option is https://github.com/hannesm/xmpp-client:

> OCaml is (compared to C) a game changer: no manual memory management, I stick to a pure (immutable and declarative) coding style (as usual, it can be improved). Some auditing and black box testing was done against our TLS stack.

Security people were talking about how dangerous Pidgin and libotr are, with buffer overflows and so on. Audits and analysis will help a lot, but I think something that will help even more is that 'game changer'.

## Overview and Proselytization ##

In summary, there are various classes of bugs here where:

* They are seriously dangerous security problems that regularly get exploited and used against people
* We understand the cause of them
* We understand how to solve them

In general this isn't enough to eliminate them! We continue to make little mistakes because people have to carefully protect against them during programming.

The aspect of LangSec I'm describing here is to increase the security of our programs by having compilers understand the languages that get used in our code (not just the one host language, but the little DSLs that get embedded in it too). This is something computers are really good at, so it makes sense to leverage it.

It's not a silver bullet that will solve all problems including world peace, but I believe it can significantly improve security as well as make programming easier. There are always going to be security problems in software that the compiler can't recognize or protect against, but there are some that it can help with, so we need to change the way we work and the tools we use a little bit to benefit from this.
## Random links ##

* The OTR library for Pidgin is not thought of as highly secure; for this reason it was rewritten in a memory-safe language: https://www.imperialviolet.org/2012/01/14/gootr.html