New Secure-Filters NPM Module for Simpler Output Sanitization

At GoInstant, we're fanatical about security. In our previous security blog post, we discussed some rules of thumb for validating inputs. In this post, we'd like to cover an equally important aspect of web security: output sanitization.

Generally speaking, output sanitization means preventing user-controlled input from injecting unintended code into our output. Output sanitization is hard, just like preventing SQL injection. So, we wanted to emulate the well-known, widely adopted tactics for combating SQL injection, but apply them to web output sanitization instead. I worked closely with the fantastic team at Salesforce ProdSec to make sure that what we offer really is a superior solution.

The result, which we introduce later in this post, is that we're releasing our open-source secure-filters npm module.

But first, let's dig into the problem we're trying to solve.

XSS and Input Validation

XSS (Cross-Site Scripting) vulnerabilities allow an attacker to "inject" arbitrary code (typically JavaScript) into a web page on your domain. A successful XSS attack can also be used to mount CSRF (Cross-Site Request Forgery) style attacks, which can either leak sensitive data or let an attacker effectively control the victim's browser, issuing REST API calls on the attacker's behalf.

Part of defending against XSS is to properly validate any input against patterns or lists that define what's allowed. Everything else should be rejected. However, input validation alone is insufficient to prevent XSS:

  • there are some inputs that require most, if not all, of the printable Unicode range
  • it's impractical to check and remove all types of XSS and injection exploits, especially those that haven't been discovered yet
  • and finally: what's bad for input isn't necessarily bad for output
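For a field with a tight format, an allowlist check can be very small. Here's a minimal sketch, assuming two hypothetical fields (userId and bgColor) and patterns we made up for illustration. A free-form field such as a display name has no comparably tight pattern, which is exactly the gap the list above describes.

```javascript
// Hypothetical allowlist rules: field names and patterns are illustrative
// assumptions, not taken from any real application.
const RULES = {
  userId: /^[0-9]{1,10}$/,     // digits only
  bgColor: /^[0-9a-fA-F]{6}$/, // six hex digits, no leading '#'
};

// Accept a value only if it fully matches its field's pattern.
// Everything else, including fields with no rule, is rejected.
function validate(field, value) {
  if (!Object.prototype.hasOwnProperty.call(RULES, field)) return false;
  return typeof value === "string" && RULES[field].test(value);
}

console.log(validate("userId", "42"));             // true
console.log(validate("userId", "42; DROP TABLE")); // false
console.log(validate("bgColor", "ff00cc"));        // true
```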

Context is King

Since one can't universally validate all possible inputs, a second line of defense is necessary: output sanitization. Let's use a more technical definition of output sanitization: for each "slot" in a template, the context-appropriate filter is applied to prevent user-controlled content (i.e. inputs) from "breaking out" into the surrounding context.

It's generally well known that escaping HTML meta-characters into HTML entities prevents XSS attacks against content injected via template into HTML elements. This is good in the sense that it's well known, but sadly, it's often misinterpreted as being sufficient for all output contexts.
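For reference, here's a minimal sketch of that well-known escaping. The function name escapeHtml is our own; the five characters it covers are the usual HTML meta-characters. It is appropriate for text placed between HTML tags and says nothing about other contexts.

```javascript
// Map each HTML meta-character to its entity.
const HTML_ENTITIES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Escape text destined for HTML element content (e.g. inside a <div>).
function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ENTITIES[ch]);
}

console.log(escapeHtml("John, Roberts & Smith")); // John, Roberts &amp; Smith
console.log(escapeHtml("<b>hi</b>"));             // &lt;b&gt;hi&lt;/b&gt;
```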

An All-too-typical Example

Say a developer starts with a template like so (EJS syntax):

<script>
  var displayName = "<%= displayName %>";
  $('#name').text(displayName);
</script>

The <%= %> operator HTML-escapes the named template variable. Developers are often surprised that for input "John, Roberts & Smith" they'll get an ugly ...

John, Roberts &amp; Smith

... (note the stray "amp;") displayed on the page. In a desperate attempt to fix that, I've seen the following mistake made, and it even passes code review on occasion:

<script>
  var displayName = <%- JSON.stringify(displayName) %>;
  $('#name').text(displayName);
</script>

However, this approach is vulnerable to the display name </script><script>alert(1)//. The reason is that the HTML parser runs before the JavaScript parser: it sees the </script> inside the JavaScript string and ends the script element right there, so the rest of the value is parsed as arbitrary HTML and script. It's plausible that input validation would have caught this. But it's entirely possible that input validation was neglected, or that business requirements demand the punctuation!
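You can see the problem without a browser. This sketch simply builds the string that the JSON.stringify template above would emit for the hostile display name:

```javascript
// What <%- JSON.stringify(displayName) %> would emit for a hostile name.
const displayName = "</script><script>alert(1)//";
const line = "var displayName = " + JSON.stringify(displayName) + ";";

console.log(line);
// var displayName = "</script><script>alert(1)//";

// JSON.stringify escapes quotes and backslashes for the JavaScript parser,
// but leaves "</script>" untouched. The HTML parser sees it first and
// closes the script element at that point.
console.log(line.includes("</script>")); // true
```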

The correct approach for this context (a JavaScript string slot inside an HTML script tag) is to backslash-encode the value, including encoding < and > as \x3C and \x3E so they can't trip up the HTML parser. The nuances of the context and the appropriate filter are difficult to remember, even within a development process that has strict code reviewers.
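Here's a minimal sketch of that kind of filter. It's our own illustration, not the secure-filters source, and escapeJsString is a hypothetical name. We assume a whitelist of ASCII letters and digits and encode every other UTF-16 code unit as \xHH or \uHHHH:

```javascript
// Keep ASCII letters and digits; encode every other UTF-16 code unit.
// Since '<', '>', quotes and backslashes are all encoded, neither the HTML
// parser nor the JavaScript parser sees markup or a string delimiter.
function escapeJsString(value) {
  const s = String(value);
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
      continue;
    }
    const code = s.charCodeAt(i);
    const hex = code.toString(16).toUpperCase();
    // \xHH covers U+0000..U+00FF; \uHHHH covers the rest, including
    // each half of a surrogate pair.
    out += code < 0x100 ? "\\x" + hex.padStart(2, "0") : "\\u" + hex.padStart(4, "0");
  }
  return out;
}

console.log(escapeJsString("</script><script>alert(1)//"));
// \x3C\x2Fscript\x3E\x3Cscript\x3Ealert\x281\x29\x2F\x2F
```

Placed between the quotes of var displayName = "...", the encoded value decodes back to the original string in the JavaScript parser, while the HTML parser never sees a literal <.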

Making output sanitization easier

Remembering to match the output filter to the context is hard, especially if the templating language makes the syntax ugly or cumbersome. But our theory at GoInstant is that if the templating language makes slot contexts self-documenting, usage will become more consistent and fewer mistakes will be made.

This mirrors how SQL injection is prevented with placeholders (PostgreSQL even lets you self-document the type with a cast):

  UPDATE users SET userName = $1::text WHERE id = $2::int

At GoInstant, we are using EJS server-side. We came up with the following self-documenting approach to filtering:

<script>
  var displayName = "<%-: displayName |js%>";
  $('#name').text(displayName);
</script>

Here, the template slot context is self-documented with a standard EJS feature: filters. The filter in this case is denoted by the |js part. This is also visually similar to how types are defined in PostgreSQL placeholders: via a suffix.
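To make the idea concrete, here's a toy renderer in which the suffix after | simply selects a function. It is not how EJS implements filters; render, filters and the two stand-in filters are hypothetical names for illustration only:

```javascript
// Toy stand-ins for context filters, keyed by the name after '|'.
// These are illustrative assumptions, not secure-filters' implementations.
const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
const filters = {
  html: (v) => String(v).replace(/[&<>"']/g, (ch) => ENTITIES[ch]),
  uri: (v) => encodeURIComponent(String(v)),
};

// Matches slots such as <%-: userName |html%> (whitespace optional).
const SLOT = /<%-:\s*(\w+)\s*\|(\w+)\s*%>/g;

// Replace every slot with filters[name](data[key]). An unknown filter name
// throws instead of silently emitting raw, unfiltered data.
function render(template, data) {
  return template.replace(SLOT, (_match, key, filterName) => {
    if (!Object.prototype.hasOwnProperty.call(filters, filterName)) {
      throw new Error("Unknown filter: " + filterName);
    }
    return filters[filterName](data[key]);
  });
}

const tpl = '<a href="/welcome/<%-: userId |uri%>">Welcome <%-: userName |html%></a>';
console.log(render(tpl, { userId: "42", userName: "Tom & <Jerry>" }));
// <a href="/welcome/42">Welcome Tom &amp; &lt;Jerry&gt;</a>
```

Failing loudly on an unknown filter is a deliberate choice in this sketch: a typo in a slot should break the render rather than quietly skip the escaping.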

Introducing secure-filters

GoInstant is announcing the general availability of the secure-filters npm module. This module gives you seven contextual output filters (html, js, jsAttr, jsObj, uri, css and style, all used in the example below) and some slick integration with EJS.

  <style>
   .my-indicator {
     background-color: #<%-: bgColor |css%>;
   }
  </style>
  <div style="background-color: #<%-: bgColor |style%>"></div>

  <script>
    var config = <%-: config |jsObj%>;
    var userId = parseInt('<%-: userId |js%>',10);
  </script>

  <a href="/welcome/<%-: userId |uri%>">Welcome <%-: userName |html%></a>
  <br>
  <button onclick="activate('<%-: userId |jsAttr%>')">Click here to activate</button>

Since the filters are just regular JavaScript functions, you can use them anywhere, even with other templating syntaxes. We've even baked in support for AMD loading (e.g. RequireJS) and for plain <script> inclusion in the browser. This way, you can use them in client-side templates too! See the documentation on GitHub for details.
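As an example of using a filter outside EJS, here's a hypothetical jsObj-style function applied through a plain template literal. It's a sketch of the general technique for embedding JSON in a script block (JSON.stringify, then \u-escape the characters that matter to the HTML parser), not the module's actual code; consult the secure-filters documentation for its real exports:

```javascript
// Hypothetical stand-in for a JSON-in-<script> filter (compare the jsObj
// slot in the example above). JSON.stringify handles quotes and
// backslashes; we then \u-escape <, > and &, plus U+2028/U+2029, which
// older JavaScript engines rejected inside string literals. In JSON output
// these characters can only occur inside strings, so the escapes are valid.
function jsObj(value) {
  return JSON.stringify(value).replace(/[<>&\u2028\u2029]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// An ordinary function works with a template literal as well as with EJS.
const config = { greeting: "</script><script>alert(1)//" };
const page = `<script>\n  var config = ${jsObj(config)};\n</script>`;
console.log(page);
```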

Our Commitment to Security

We followed a philosophy similar to that of the OWASP ESAPI project: have a whitelist of "safe" characters, escape everything else. Where secure-filters differs is that we've been much more aggressive about keeping the whitelist small. This produces "uglier" markup, but because more characters are encoded explicitly, there's less ambiguity for HTML/CSS/JavaScript parsers to trip over. For more details, check out the secure-filters documentation and source code.
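The difference between the two philosophies is easy to see side by side. In this sketch (the function names and the whitelist of letters, digits, space, comma and period are our assumptions), the first function escapes a short list of known-bad characters, and the second keeps a whitelist and encodes everything else:

```javascript
// 1) Blacklist style: escape only the known HTML meta-characters.
const ENTITIES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
function escapeHtmlMeta(value) {
  return String(value).replace(/[&<>"']/g, (ch) => ENTITIES[ch]);
}

// 2) Whitelist style: keep a few safe characters and encode every other
//    code point as a hexadecimal character reference.
function escapeHtmlWhitelist(value) {
  let out = "";
  for (const ch of String(value)) { // iterates by code point
    out += /[A-Za-z0-9 ,.]/.test(ch)
      ? ch
      : "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";";
  }
  return out;
}

const input = "John, Roberts & Smith";
console.log(escapeHtmlMeta(input));      // John, Roberts &amp; Smith
console.log(escapeHtmlWhitelist(input)); // John, Roberts &#x26; Smith
```

The whitelisted output is longer and harder to read, but every character outside the whitelist appears as an unambiguous numeric reference.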

In Conclusion

To prevent XSS, remember: "validate your input, sanitize your output." Managing XSS and other injection problems (like SQL injection) on a large team means providing good tools for developers to use. For this reason, we hope that secure-filters finds its way into your application stack. Go forth, and write safer web apps!