The CTC is based on a representation of software elements as logic facts. As an example, the figure below illustrates one of many possible fact representations for a Java program.
In the sample representation (adapted from JTransformer) each abstract syntax tree (AST) node of the program is
represented as a logic fact. Nodes are linked by their identities. The first argument of each fact is the identity of the node that it represents. Identitites in other
argument positions represent references to other nodes. Thus each fact can be seen as an adjacency list representation of a node in a graph. For instance, in the figure,
every fact representing a node refers to its parent via its second argument.
Logic programming has many advantages. It has a declarative semantics that makes it amenable to analyses, transformations and optimizations that are impossible in imperative languages, such as Java, because of side-effects and aliasing.
Declarativeness makes programs easy to write, understand and maintain. In addition, the multi-directionality of relational programming makes predicate definitions highly reusable.
The main arguments against the “classic” incarnations of logic programming in languages of the Prolog family are:
Inability to to express updates declaratively. Prolog has no declarative semantics for dealing with updates of the database, defeating the main motivation for logic: declarativeness.
(In)Efficiency. The potentially extreme inefficiency of carelessly written Prolog programs makes them either inacceptable in practice or requires deep knowledge of the operational semantics and many programming idioms that defeats the potential advantages of declarative programming in terms of ease and simplicity.
Our aim was the development of a model transformation language that provides the full advantages of unpolluted declarative programming, being easy to understand for programmers who otherwise use imperative languages and have no PhD in logic.
Therefore, we have addressed both of the above problems when designing the CT language:
Declarative updates. The semantics of
CTs overcomes the problems of Prolog interpreters, enabling a well-defined declarative semantics and an operational semantics that matches it precisely. So again, programmers can write
CTs without having to care about the details of the operational implementation of the language.
Compilation. A compiler tuned specifically for efficiency on large factbases, preprocesses the
CT programs, performing automatically most of the tweaking that would otherwise burden developers, challenging their logic programming skills. Thus even novice programmers can write programs that will run efficiently. There is no need anymore to trade declarativeness for speed.
NOTE
The publicly available versions of the CTC do not include the compiler. How much of the compiler will be released under which terms and conditions remains to be determined.
Why not terms?
One could also represent programs as nested terms, e.g.
package('default', [
class('C',
[ modifier('public')
],
[ method('m',
[ param('int‚ 'i')
],
'void',
[ call('null', 'm', [ arg( ident('i') )
]
)
]
)
]
)
]
)
There are two good reasons to use facts instead of terms:
The need to encode arbitrary graphs, not just trees. Terms cannot represent backreferences from children to parents or references to siblings in a tree (at least not without duplicating the entire term). The A-Term library of ASF/SDF provides an efficient solutioni to this problem but using it would require implementing our own Prolog system. Currently, A-Terms are not used in any existing Prolog implementation.
The term representation in Prolog is utterly inefficient on very large structures. It is space ineffcient since passing of large structures on the stack quickly leads to stack overflows. It is time inefficient since access to subterms cannot take advantage of the roughly constant time access available for facts via first argument indexing in any widely-used Prolog implementation. If deployed for programs of 10 thousand or even millons of lines of code, the fact representation is orders of magnitude faster regarding access (search) as well as modification (transformation). If you are interested in concrete run-time comparisons you are welcome to
contact us.
Relation to databases
The fact representation has a simple mapping to relational databases:
| Logic Programm | Relational Database |
| Predicate | Relation |
| Argument | Attribute |
| Identity | Key |
| Fact | Tuple |
The analogy can be extended to rules (see section on conditions). However, there is no counterpart to
recursive rules and to strucured terms (that is, terms such as 'array(type(int), dimension([5,5]))'
as opposed to simple numbers or strings) in the relational data model:
| Logic Programm | Relational Database |
| Non-recursive rule | View |
| Recursive rule | — |
| Structured term | — |
Processing of recursive rules on flat data is the domain of deductive databases. Non-flat data is the domain of other non-standard databases (NF2 data model, etc.). Both are not widely used.
The straightforward mapping to relational databases ensures the scalability of the approach iff the size of models should ever grow beyond main
memory capacity. Still, for the time being, we
found no need for this. Even *complete* models of programs the size of the Eclipse Platform fit well in the main
memory of a standard PC. The complete representation of the Java code of the Eclipse Platform, roughly 1 milion lines of code in around 11500 classes, produces a CTC process of roughly 265 MB.
Relation to Model Driven Software Engineering (MDSE)
The logic-based representation of software artefacts,
and their analysis and transformation via
CTs are an approach to purely declarative MDSE. In contrast to mainstream model transformation languges, such as QVT and
ATL,
CTs are based
entirely on declarative concepts. This enables analyses and automated optimizations that cannot
be performed for imperative languages because of their side-effects, aliasing, etc.
CTs are also unique in their support
for compositional model analyses and transformations. Most of the
CT sequence operators currently have no counterpart in any other approach.