Program Analysis with Predicates

Here, we introduce our combination of logic-based program analysis with concrete syntax patterns. The relevant concepts are illustrated using Java as source and target language.

Concrete Syntax Patterns and Meta-Variables

A concrete syntax pattern (CSP) is a snippet of concrete base language code, e.g., Java, that may contain meta-variables. A meta-variable (MV) is a placeholder for any legal expression of the base language, i.e, all generating non terminal symbols of the language’s BNF representation. Thus, meta-variables are simply variables that can range over syntactic elements of the analyzed language. Normal meta-variables represent a single base language element. In addition, list meta-variables can match an arbitrary sequence of elements of the same sort, e.g., arbitrary many call arguments or statements within a block. Syntactically, meta-variables are denoted by identifiers starting with a question mark, e.g. ?call. List meta-variables start with two question marks, e.g. ??args.

 1: class C{
 2:   void x ( ){
 3:     A.a(42, 43);
 4:     this.y();
 5:     y().y();
 6:   }
 7:   C y(){
 8:    return this ;
 9:   }
10: }

Listing 1: Simple Java class

Consider the example shown in Listing 1 and assume we want to select all method invocations. This can be achieved via the following concrete syntax pattern capturing the structure of method calls in Java:

?expression.?call(??args)

If evaluated on the program shown in Figure 4 it matches the calls in Lines 3 to 5. For each match of the pattern, the meta-variable ?call is bound to the corresponding method name (a and y) whereas ?expression is bound to the expression denoting the object on which the method is invoked: A, this, y() and this. The latter is the implicit receiver of the first call in Line 5. For each match, the list meta-variable ??args is bound to the call arguments. In our example only the call in Line 3 contains arguments. In all other cases ??args is bound to an empty argument list. Each match of the CSP yields a tuple of values (a substitution) for the MV tuple (?expression, ?call, ??args). In our example the substitutions are (A,a,[42,43]), (this,y,[]), (this,y,[]) and (this.y(),y,[]), where [] denotes the empty list.

Structural pattern matching

Concrete syntax patterns are matched at the AST level. Except for the use of meta-variables, a pattern must correspond to a valid AST of the base language. The pattern’s AST is matched to the base program’s AST. Meta-variables of the CSP can match entire subtrees within the base program.

Figure 1: Pattern matching at AST level

Figure 1 shows the matching of our example pattern at the AST level. First, the pattern is transformed into its corresponding AST. This pattern matches four subtrees within the AST of C. The blue boxes indicate the matches. Note that a pattern can match recursively within a subtree. This is the case for the expression y().y() in Line 4 of Listing 1. The two nested blue boxes (the rightmost boxes) in Figure 1 represent the recursive match. Since CSPs match at the AST level, matching is not restricted to lexical structures. For instance, the pattern

if(?expr){ ??statements }

matches the statement

if(a < b) a=b;

although its block of statements is not enclosed in curly braces. Patterns must be complete and valid expressions of the base language. However, a pattern does not need to specify all elements of the matched element. Consider, for instance, the pattern

class ?classname{??class members}

It contains neither a modifier, nor a declaration of interface implementation, nor a superclass declaration. Still, it matches both of the following:

public class C implements D {...}
abstract public class C extends E {...}

Pattern predicate

In order to use CSPs in logic programs, we introduce a special pattern predicate, is. It is often necessary to be able to refer to the entire program element matched by a CSP. For this purpose, the left-hand-side of is denotes the program element matched by the CSP on the right-hand-side.

meta-variable is [[concrete syntax pattern]]

Technically, the predicate unifies its first argument with the unique identifier of a program element matched by the syntax pattern in the second argument. If the pattern matches different elements, each one’s identity is unified with the metavariable upon backtracking. The pattern predicate combines the expressiveness of logic programming with intuitive pattern description.

Meta-variable attributes

It is often not sufficient to consider only a syntactic element itself but also its static context. For example, the declaring type contains important information about a method or a field declaration. Also the statically resolved binding between a method call and its called method (or a variable access and the declared variable) is necessary for many analyzes. This information is available via context attributes, which can be attached to meta-variables by two double colons. The most important attributes are described in the following: ?mv::decl The statically resolved corresponding declaration of ?mv. Calls reference their method, variable accesses a field, and local variable or parameter declarations and type expressions reference their class.

Meta-variable attributes
?mv::type The statically resolved Java type of an expression bound to ?mv.
?mv::encl The enclosing method or class of a statement or expression.
?mv::parent The parent element of ?mv.

Note that meta-variable attributes can be cascaded. For example, ?mv::encl::type refers to the type of the enclosing method or class of the element bound to ?mv.

Last modified: 2017/08/30 13:29
*