Let's see how String Analysis is used to improve the accuracy of JSA.
Consider the following real-world example:
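var search_term = 'login.html';   // assumed definition: the snippet uses search_term without declaring it
var str = document.URL;           // untrusted source: the URL of the current page
var url_check = str.indexOf(search_term);
if (url_check > -1) {
  // Keep everything before 'login.html', swap in 'login.jsp', then re-append the rest.
  var result = str.substring(0, url_check);
  result = result + 'login.jsp' + str.substring(url_check + search_term.length, str.length);
  document.URL = result;          // sink: target of the redirect
}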
Any standard taint analysis would flag this code as vulnerable to Open Redirect, since the value of the untrusted variable "str" flows into the assignment "document.URL = result". This presumably lets an attacker control the target URL of a redirect operation, which makes phishing possible.
So what could we do?
While developing JSA, examples such as this one led us to observe that URLs should be treated as only partially untrusted. The host and path parts of a URL can't actually be manipulated by an attacker (because then the vulnerable page would not be invoked!).
Similarly, when a URL is the target of a redirect operation, an attack is not interesting UNLESS it can manipulate the host of the target URL (to enable phishing).
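To make these two observations concrete, here is a minimal sketch using the standard URL class. The helper names (splitByTrust, changesHost) and the sample URLs are made up for illustration and are not part of JSA.

```javascript
// Split a page URL into the part an attacker cannot change without
// leaving the page (origin + path) and the part they control freely
// (query string + fragment).
function splitByTrust(pageUrl) {
  const u = new URL(pageUrl);
  return {
    trustedPrefix: u.origin + u.pathname,
    untrustedSuffix: u.search + u.hash,
  };
}

// A redirect is only interesting for phishing if it leaves the current host.
// Relative targets are resolved against the page URL first.
function changesHost(target, pageUrl) {
  return new URL(target, pageUrl).host !== new URL(pageUrl).host;
}

// Hypothetical page URL with an attacker-supplied query and fragment.
const page = "http://www.mysite.com/folder/page?next=http://evil.example/#x";
const parts = splitByTrust(page);
const sameSite = changesHost("/folder/login.jsp?x=1", page);
const offSite = changesHost("http://evil.example/", page);
```

Here only the query string and fragment end up in untrustedSuffix, and only the second redirect target would count as leaving the host.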
Here's another key observation:
Because JSA works in a hybrid setting, as a static analysis running in the context of a black-box scan, we know the actual URL of each page we analyze. We have concrete URL values that we can feed into our string analysis, so that when the code takes the URL as a string value and manipulates it, we can model the results of many of those operations precisely.
So the analysis works as follows: without losing precision, we can approximate the URL of the analyzed page to be something like this:
http://www.mysite.com/folder/page?.*
This basically means the prefix of the URL string is constant, while the suffix can be anything (because it is controlled by the attacker). We track all the string operations and manipulations and maintain a string pattern for each variable. If, at the sink, the string prefix is known and contains a fixed hostname, we can rule out the possibility of an Open Redirect vulnerability.
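The idea can be sketched in a few lines. This is not JSA's implementation, only a toy model under simplifying assumptions: a string pattern is a known prefix plus an optional unknown suffix, only indexOf, substring and concatenation are modelled, and the page URL and all helper names (abs, absIndexOf, absSubstring, absConcat, fixedHost) are hypothetical.

```javascript
// Abstract string: every concrete value starts with `prefix`; if `open` is
// true, an arbitrary attacker-controlled suffix may follow (prefix + ".*").
const abs = (prefix, open) => ({ prefix, open });

// indexOf is exact when the needle already occurs in the known prefix,
// because any earlier match would also lie entirely inside the prefix.
function absIndexOf(s, needle) {
  const i = s.prefix.indexOf(needle);
  if (i !== -1) return i;
  return s.open ? null : -1; // null means "index unknown"
}

// substring with concrete bounds (0 <= start <= end; end may be Infinity
// to stand for an unknown str.length).
function absSubstring(s, start, end) {
  if (!s.open || end <= s.prefix.length) {
    return abs(s.prefix.substring(start, end), false); // fully known
  }
  return abs(s.prefix.substring(start), true); // reaches the unknown suffix
}

// Concatenation: once the left side is open, nothing after it is known.
function absConcat(a, b) {
  return a.open ? abs(a.prefix, true) : abs(a.prefix + b.prefix, b.open);
}

// Sink check: the host is fixed only if the known prefix contains the whole
// host followed by a delimiter, so the suffix cannot extend the hostname.
function fixedHost(s) {
  const m = /^https?:\/\/([^\/?#]+)[\/?#]/.exec(s.prefix);
  return m ? m[1] : null;
}

// Replay the example above on an assumed page URL pattern.
const needle = "login.html";
const str = abs("http://www.mysite.com/folder/login.html?", true);
const urlCheck = absIndexOf(str, needle);
const head = absSubstring(str, 0, urlCheck);
const tail = absSubstring(str, urlCheck + needle.length, Infinity);
const result = absConcat(absConcat(head, abs("login.jsp", false)), tail);

// A non-null host means the redirect cannot leave that host.
const verdict = fixedHost(result) !== null ? "not an Open Redirect" : "report";
```

In this model the value reaching the sink still carries an unknown suffix, but its hostname sits inside the constant prefix, so the finding is dropped. A fully attacker-controlled string, abs("", true), has no fixed host and would still be reported.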
This String Analysis technique had an amazing impact on JSA's accuracy.
The results are mind-blowing! String Analysis eliminated nearly all the false positive results, allowing JSA to reach an amazing level of accuracy, where roughly 90% of the findings are in fact true positives. This is unheard of for a static analysis tool. It is made possible by the fact that JSA runs in a hybrid setting, feeding concrete URL values from the black-box scan to the static analysis component, and of course by the amazing technology of String Analysis, which is exclusive to IBM and a fruit of our software teams' collaboration with IBM Research.
Any thoughts or feedback? Please leave a comment!