Software Engineering for Smart Data Analytics & Smart Data Analytics for Software Engineering
Here, we introduce our combination of logic-based program analysis with concrete syntax patterns. The relevant concepts are illustrated using Java as source and target language.
A concrete syntax pattern (CSP) is a snippet of concrete base language code, e.g.,
Java, that may contain meta-variables. A meta-variable (MV) is a placeholder for
any legal expression of the base language, i.e, all generating non terminal symbols of
the language’s BNF representation. Thus, meta-variables are simply variables that
can range over syntactic elements of the analyzed language. Normal meta-variables
represent a single base language element. In addition, list meta-variables can match
an arbitrary sequence of elements of the same sort, e.g., arbitrary many call arguments
or statements within a block. Syntactically, meta-variables are denoted by
identifiers starting with a question mark, e.g. ?call
. List meta-variables start with
two question marks, e.g. ??args
.
<code Java 1|Listing 1: Simple Java class> class C{
void x ( ){ A.a(42, 43); this.y(); y().y(); } C y(){ return this ; }
} </Code>
Consider the example shown in Listing 1 and assume we want to select all method invocations. This can be achieved via the following concrete syntax pattern capturing the structure of method calls in Java:
?expression.?call(??args)
If evaluated on the program shown in Figure 4 it matches the calls in Lines 3 to 5.
For each match of the pattern, the
meta-variable ?call
is bound to the corresponding method
name (a
and y
) whereas ?expression
is bound to the expression
denoting the object on which the method is invoked:
A
, this
, y()
and this. The latter is the implicit receiver of the
first call in Line 5. For each match, the list meta-variable
??args
is bound to the call arguments. In our example only
the call in Line 3 contains arguments. In all other cases
??args
is bound to an empty argument list. Each match
of the CSP yields a tuple of values (a substitution) for
the MV tuple (?expression
, ?call
, ??args
). In our example
the substitutions are (A,a,[42,43]
), (this,y,[]
), (this,y,[]
) and
(this.y(),y,[]
), where []
denotes the empty list.
Concrete syntax patterns are matched at the AST level. Except for the use of meta-variables, a pattern must correspond to a valid AST of the base language. The pattern’s AST is matched to the base program’s AST. Meta-variables of the CSP can match entire subtrees within the base program.
Figure 1 shows the matching of our example pattern at the AST level. First, the pattern is transformed into its corresponding AST. This pattern matches four subtrees within the AST of C. The blue boxes indicate the matches. Note that a pattern can match recursively within a subtree. This is the case for the expression y().y() in Line 4 of Listing 1. The two nested blue boxes (the rightmost boxes) in Figure 1 represent the recursive match. Since CSPs match at the AST level, matching is not restricted to lexical structures. For instance, the pattern
if(?expr){ ??statements }
matches the statement
if(a < b) a=b;
although its block of statements is not enclosed in curly braces. Patterns must be complete and valid expressions of the base language. However, a pattern does not need to specify all elements of the matched element. Consider, for instance, the pattern
class ?classname{??class members}
It contains neither a modifier, nor a declaration of interface implementation, nor a superclass declaration. Still, it matches both of the following:
public class C implements D {...} abstract public class C extends E {...}
In order to use CSPs in logic programs, we introduce a special pattern predicate, is. It is often necessary to be able to refer to the entire program element matched by a CSP. For this purpose, the left-hand-side of is denotes the program element matched by the CSP on the right-hand-side.
meta-variable is [[concrete syntax pattern]]
Technically, the predicate unifies its first argument with the unique identifier of a program element matched by the syntax pattern in the second argument. If the pattern matches different elements, each one’s identity is unified with the metavariable upon backtracking. The pattern predicate combines the expressiveness of logic programming with intuitive pattern description.
It is often not sufficient to consider only a syntactic element itself but also its static context. For example, the declaring type contains important information about a method or a field declaration. Also the statically resolved binding between a method call and its called method (or a variable access and the declared variable) is necessary for many analyzes. This information is available via context attributes, which can be attached to meta-variables by two double colons. The most important attributes are described in the following: ?mv::decl The statically resolved corresponding declaration of ?mv. Calls reference their method, variable accesses a field, and local variable or parameter declarations and type expressions reference their class.
Meta-variable attributes | |
---|---|
?mv::type | The statically resolved Java type of an expression bound to ?mv . |
?mv::encl | The enclosing method or class of a statement or expression. |
?mv::parent | The parent element of ?mv . |
Note that meta-variable attributes can be cascaded. For example, ?mv::encl::type refers to the type of the enclosing method or class of the element bound to ?mv.