Vulnerability hunting with Semmle QL, part 1


Previously on this blog, we’ve talked about how MSRC automates the root cause analysis of vulnerabilities reported and found. After doing this, our next step is variant analysis: finding and investigating any variants of the vulnerability. It’s important that we find all such variants and patch them simultaneously, otherwise we bear the risk of these being exploited in the wild. In this post, I’d like to explain the automation we use in variant finding.

For the past year or so, we’ve been augmenting our manual code review processes with Semmle, a third-party static analysis environment. It compiles code to a relational database (the snapshot database – a combination of database and source code), which is queried using Semmle QL, a declarative, object-oriented query language designed for program analysis.

The basic workflow is that, after root cause analysis, we write queries to find code patterns that are semantically similar to the original vulnerability. Any results are triaged as usual and provided to our engineering teams for a fix to be developed. Also, the queries are placed in a central repository to be re-run periodically by MSRC and other security teams. This way, we can scale our variant finding over time and across multiple codebases.

In addition to variant analysis, we’ve been using QL proactively, in our security reviews of source code. This will be the topic of a future blog post. For now, let’s look at some real-world examples inspired by MSRC cases.

Incorrect integer overflow checks

This first case is a bug that’s straightforward to define, but would be tedious to find variants of in a large codebase.

The code below shows a common idiom for detecting overflow on the addition of unsigned integers:

if (x + y < x) {
    
// handle integer overflow
}

Unfortunately, this does not work properly when the width of the integer type is small enough to be subject to integral promotion. For example, if x and y were both unsigned short, when compiled, the above code would end up being equivalent to (unsigned int)x + y < x, making this overflow check ineffective.

Here’s a QL query that matches this code pattern:

import cpp
 
from AddExpr a, Variable v, RelationalOperation r
where a.getAnOperand() = v.getAnAccess()
  and r.getAnOperand() = v.getAnAccess()
  and r.getAnOperand() = a
  and forall(Expr op | op = a.getAnOperand() | op.getType().getSize() < 4)
  and not a.getExplicitlyConverted().getType().getSize() < 4
select r, "Useless overflow check due to integral promotion"

In the from clause, we define the variables, and their types, to be used in the rest of the query. AddExpr, Variable, and RelationalOperation are QL classes representing various sets of entities in the snapshot database, e.g. RelationalOperation covers every expression with a relational operation (less than, greater than, etc.)

The where clause is the meat of the query, using logical connectives and quantifiers to define how to relate the variables. Breaking it down, this means that the addition expression and the relational operation need the same variable as one of their operands (x in the example code above):

a.getAnOperand() = v.getAnAccess() and r.getAnOperand() = v.getAnAccess()

The other operand of the relational operation must be the addition:

r.getAnOperand() = a

Both operands of the addition must have a width less than 32 bits (4 bytes):

forall(Expr op | op = a.getAnOperand() | op.getType().getSize() < 4)

But if there is an explicit cast on the addition expression, we’re not interested if it’s less than 32 bits:

not a.getExplicitlyConverted().getType().getSize() < 4

(This is so a check like (unsigned short)(x + y) < x doesn’t get flagged by the query.)

Finally, the select clause defines the result set.

This vulnerability was originally reported in Chakra (the JavaScript engine of Edge), where the consequence of that particular ineffective overflow check was memory corruption. The query matched the original vulnerability but no additional variants in Chakra. However, we discovered several when applying this exact query to other Windows components.

Unsafe use of SafeInt

An alternative to rolling your own integer overflow checks is to use a library with such checks built in. SafeInt is a C++ template class that overrides arithmetic operators to throw an exception where overflow is detected.

Here’s an example of how to use it correctly:

int x, y, z;
 
// ...
 
z = SafeInt<int>(x) + y;

But here is an example of how it can be unintentionally misused – the expression passed to the constructor may already have overflowed:

int x, y, z;
 
// ...
 
z = SafeInt<int>(x + y);

How to write a query to detect this? In the previous example, our query only used built-in QL classes. For this one, let’s start by defining our own. For this, we choose one or more QL classes to subclass from (with extends), and define a characteristic predicate which specifies those entities in the snapshot database that are matched by the class:

class SafeInt extends Type {
  SafeInt() {
    this.getUnspecifiedType().getName().matches("SafeInt<%")
  }
}

The QL class Type represents the set of all types in the snapshot database. For the QL class SafeInt, we subset this to just types with a name that begins with “SafeInt<”, thus indicating that they are instantiations of the SafeInt template class. The getUnspecifiedType() predicate is used to disregard typedefs and type specifiers such as const.

Next, we define the subset of expressions that could potentially overflow. Most arithmetic operations can overflow, but not all; this uses QL’s instanceof operator to define which ones. We use a recursive definition because we need expressions such as (x+1)/y to be included, even though x/y is not.

class PotentialOverflow extends Expr {
  PotentialOverflow() {
    (this instanceof BinaryArithmeticOperation    // match   x+y x-y x*y
       and not this instanceof DivExpr            // but not x/y
       and not this instanceof RemExpr)           //      or x%y
 
    or (this instanceof UnaryArithmeticOperation  // match   x++ x-- ++x --x -x
          and not this instanceof UnaryPlusExpr)  // but not +x
 
    // recursive definitions to capture potential overflow in
    // operands of the operations excluded above

    or this.(BinaryArithmeticOperation).getAnOperand() instanceof PotentialOverflow
    or this.(UnaryPlusExpr).getOperand() instanceof PotentialOverflow
  }
}

Finally, we relate these two classes in a query:

from PotentialOverflow po, SafeInt si
where po.getParent().(Call).getTarget().(Constructor).getDeclaringType() = si
select
    po,
    po + " may overflow before being converted to " + si

.(Call) and .(Constructor) are examples of an inline cast, which, similar to instanceof, is another way of restricting which QL classes match. In this case we are saying that, given an expression that may overflow, we’re only interested if its parent expression is some sort of call. Furthermore, we only want to know if the target of that call is a constructor, and if it’s a constructor for some SafeInt.

Like the previous example, this was a query that provided a number of actionable results across multiple codebases.

JavaScript re-entrancy to use-after-free

This next example was a vulnerability in Edge caused by re-entrancy into JavaScript code.

Edge defines many functions that can be called from JavaScript. This model function illustrates the essence of the vulnerability:

HRESULT SomeClass::vulnerableFunction(Var* args, UINT argCount, Var* retVal)
{
    // get first argument -
    // from Chakra, acquire pointer to array
    BYTE* pBuffer;
    UINT bufferSize;
    hr = Jscript::GetTypedArrayBuffer(args[1], &pBuffer, &bufferSize);
 
    // get second argument –
    // from Chakra, obtain an integer value
    int someValue;
    hr = Jscript::VarToInt(args[2], &someValue);
 
    // perform some operation on the array acquired previously
    doSomething(pBuffer, bufferSize);
 

The problem was that when Edge calls back into Chakra, e.g. during VarToInt, arbitrary JavaScript code may be executed. The above function could be exploited by passing it a JavaScript object that overrides valueOf to free the buffer, so when VarToInt returns, pBuffer is a dangling pointer:

var buf = new ArrayBuffer(length);
var arr = new Uint8Array(buf);
 
var param = {}
param.valueOf = function() {
    /* free `buf`
       (code to actually do this would be defined elsewhere) */

    neuter(buf);  // neuter `buf` by e.g. posting it to a web worker
    gc();         // trigger garbage collection
    return 0;
};
 
vulnerableFunction(arr, param);

The specific pattern we’re looking for with QL is therefore: acquisition of a pointer from GetTypedArrayBuffer, followed by a call to some Chakra function that may execute JavaScript, followed by some use of the pointer.

For the array buffer pointer, we match on the calls to GetTypedArrayBuffer, where the second argument (getArgument of Call is zero-indexed) is an address-of expression (&), and take its operand:

class TypedArrayBufferPointer extends Expr {
    TypedArrayBufferPointer() {
        exists(Call c | c.getTarget().getName() = "GetTypedArrayBuffer" and
               this = c.getArgument(1).(AddressOfExpr).getOperand())
    }
}

The exists logical quantifier is used here to introduce a new variable (c) into the scope.

There are several Chakra API functions that could be used for JavaScript re-entrancy. Rather than defining them by name, we can identify the internal Chakra function that performs this task, and use QL to figure this out from the call graph:

// examine call graph to match any function that may eventually call MethodCallToPrimitive
predicate mayExecJsFunction(Function f) {
    exists(Function g | f.calls+(g) and g.hasName("MethodCallToPrimitive")
    }
 
// this defines a call to any of the above functions
class MayExecJsCall extends FunctionCall {
    MayExecJsCall() {
        mayExecJsFunction(this)
    }
}

The “+” suffix of the calls predicate specifies a transitive closure – it applies the predicate to itself until there is a match. This permits a concisely defined exploration of the function call graph.

Finally, this query brings these QL class definitions together, relating them by control flow:

from TypedArrayBufferPointer def, MayExecJsCall call, VariableAccess use, Variable v
where v.getAnAccess() = def
  and v.getAnAccess() = use
  and def.getASuccessor+() = call
  and call.getASuccessor+() = use
select use,
    "Call to " + call + " between definition " + def + " and use " + use

The predicate getASuccessor() specifies the next statement or expression in the control flow. Therefore, using e.g. call.getASuccessor+() = use will follow the control flow graph from call until there is a match to use. The diagram below illustrates this:

This query uncovered four variants in addition to the originally reported vulnerability, all of them assessed as critical severity.

That’s all for now. The next instalment will cover using QL for data flow analysis and taint tracking, with examples from our security review of an Azure firmware component.

 

Steven Hunter, MSRC Vulnerabilities & Mitigations team


Comments (0)

Skip to main content