© Jørgen Steensgaard-Madsen, Copenhagen, 2006
Introduction                                                  
 
   

Compiler construction tools are well known and several exist. Typically these tools strive for generality i.e. the aim is to allow users to define languages and generate compilers for them with as few restrictions as possible. Less restrictions tends to require more skills. Our goal is to ease construction by imposing some restrictions that, hopefully, will not be considered too severe.
The focus on construction of interpreters for a family of programming languages, rather than languages in general, is a characteristic of dulce. Some design decisions have been made and impose restrictions: the languages will be strictly typed, will use type inference, will use static scope rules, and will use one common syntax.
The common syntax involves that well-known constructs, e.g. if- and while-constructs, are considered applications of corresponding operations, or subroutines if you like. This point of view applies even for declaration of variables and something like object instantiations (when appropriate).
Consequently a particular language is essentially determined by a set of (predefined) operators and operations. The term Howard languages will be used for these languages.
A computation primitives of these languages is either a function (in the strict mathematical sense as a subset of the product of two sets) or an effector, i.e. a higher-order operation that map computations into a set (which precludes effectors).1 Some primitives are characteristic of computation by computers: memory, late evaluation of argument expressions, parallellism, network connections, and more.
An effector is deterministic if it maps a particular computation into the same value always, and non-deterministic otherwise. The usual uniqueness requirement of functions as subsets of pairs is a property similar to determinacy of an effector. Non-deterministic effectors may represent something like random number generators that depend on cosmic radiation, say, and parallellism. Since equivalence of computations is undecidable, the notion of functions does not suffice. Deterministic effectors can describe, for instance, finite state machines.
The Howard Languages uses a common notation to express combination of effectors. This notation reflects a subjective perception of what is appropriate for these primitives, but it also deviates from usual mathematical notation. Despite the notation, the notion of effectors has turned out to be very useful in practise.
Each effector is described in terms of types, be it a function, a structured statement, or a package that can best be compared to a many sorted algebra. Types are expressions in type operators with fixed (non-negative) arities. Reverse polish notation is used for types, e.g. string string List Array Pair. A type can be conceived as a set of values, or as a class of computations (to compute a value of the type considered as a set of values).
A signature describes syntactic attributes of an effector, but is not considered to be a type. A Howard language is essentially a collection of effectors (incrementally) added to an initial context of basic types (e.g. numbers and strings) and operator symbol attributes (including precedence and association, but not semantics).
This document presents interpreter construction as collecting loosely coupled language contributions. Construction with dulce provides implementation of some language properties, like type checking, and supports incremental construction of interpreters. An overall goal has been to make it easy to build interpreters for domain specific languages, in particular for experts in fields other than programming language design and implementation.
Simplicity comes at the price of adhering to some common design decisions. The syntax is restricted compared to the freedom allowed by BNF and similar notations for context free language descriptions. Hopefully, this approach will be seen as a constructive guide to good language design.
Some sections address users of interpreters, others address contributors. Examples showing use of actual language constructs should be useful to make prospective contributors familiar with principles of future constructs. Migration from user to contributor should be smooth.
Let's have a brief look at the common syntax: an interpreter handles expressions which essentially are sequences of operands and operators alternating. Operators may have different meanings for different types of operands. An operand may be a constant, a parenthetical expression or a term. Terms start with an operation identifier that is applied to a number of clauses which may be parenthetical expressions or semicolon separated sequences of expressions in braces.
The common syntax, especially the notion of a term, is general enough to encompass many statement forms known from other languages, as well as modules, somewhat related to higher-order classes (or templates). It covers, for instance, a number of patterns of expression:

 program { 
   multiprogramming x;               # instantiation of an object  
   ... x.newthread{ ... }; ...       # a call of a method
 };
 
 if (...) then { ... } else { ... }; # well known control constructs
 
 Members                             # definition of methods
  start: { ... }                     # (run-time binding, though)
  step:  { ... }
  stop:  { ... }



Essentially an application can be expressed in the usual way
f(e0,e1,...,ek)
In addition to this, one or more arguments may be expressed by a sequence of computations, {u0;u1;...;un}. Its type and value is given by un and it may replace ei in writing as follows
f(e0,e1,...,ei-1){u0;u1;...;un}(ei+1,...,ek)
This covers also the case n=0 in which case the expression again may be enclosed in parentheses, and ultimately admits the extreme cases:
f(e0)(e1)...(ek)
f{e0}{e1}...{ek}
Every application requires a fixed number of arguments, called the arity of the applied operation. However, inside a sequence of expressions a special rule applies
The tail coup: If an operation of arity m is applied to m-1 arguments and followed by a semicolon, then the m-th argument is the rest of the sequence, i.e.
{u0; u1; ... g(v0,v1,...,vm-2); ui; ... un}
º {u0; u1; ... g(v0,v1,...,vm-2){ui; ... un}}
A name may be introduced in front of every left brace or parenthesis, possibly followed by a colon. The term label will be used for such names, but labels should not be confused with destinations of gotos which are not supported. A label may be used to enhance readability or as a qualifier for certain names. Where the tail coup is used, a label may also be introduced before the last argument, i.e.  before the semicolon.
The expression syntax combined with a principle of implicit bindings, which will be described later, is chosen because it agrees with the pattern of crossing a border between two separate contexts, as illustrated by the time-line diagram in Figure 2.1. The principle of implicit bindings is similar to the rule of (member-) name introductions in object oriented languages.
Figure 1: Time-line diagram for context switches

figures/timeline.png
The distance between the (thick) borderline and the (thin) vertical line segments, representing program positions, is less for higher levels of nesting. More facilities to access the other side of the border may become available as the distance diminishes, and facilities accumulate with the level of nesting. This is true on both sides of a border. In principle there is no limit to the level of nesting.
The two contexts may exist in one program, and if so they have a common global context. However, the semantics context may also be the one that contains external definitions (written in a quite different language, e.g. C).

Incremental construction of interpreters is considered the norm. In broad terms it is possible to define a language contribution consisting of operations implemented as a library of routines. It can subsequently be linked together with other libraries into an interpreter. Thus we find it more appropriate to use a term like language contributor than - implementor.
A user of dulce is presumably a non-language specialist who wants to contribute with something that matches his or her professional expertise. In this sense the intended class of languages to be supported can be characterised as domain specific languages. Like any tool dulce does require some knowledge to be acquired before it can be used with ease. For a class of domain specific languages the support for incremental construction is important: experts should have an interest in reuse so that they may easily contribute to more than one language.
dulce itself can be considered a language specialist's contribution: Interpreters will support language aspects that experts in other fields may consider less important, e.g. the presence of a type system and the checks that this allows.
Of course opinions on language design issues vary, and other language experts may disagree strongly on choices made for dulce. For instance: the actual choices are in most respects very different from the ones represented by Tcl, which is neither statically typed nor conforming to static scope rules, but certainly a useful language.
The focus is on interpreters, but a given program can be translated into C, or more specifically into the extended language accepted by the GNU C compiler. Of course dulce is related to compiler construction tools, but it should not be considered a competitor of these since dulce adds some semantics to any language by default. Type checking has been mentioned, but another deserves mention also. If an interpreter reads an expression E, which yields a value of some type, T say, then it will actually interpret the expression stdout <<  E. This makes sense only if a contributed meaning of stdout and the operator symbol << can be found, and if it leads to the intended printing out of the value of E. This puts a small burden on contributors, but relieves dulce from knowing the type T and how to print values of type T. A similar approach is used so that dulce can interpret special expressions intended for lists and tuples.
The language of semantics, which is used to build libraries, is GNU C. Contributors are helped in various ways by dulce and appropriate makefiles.Presumably, a formal semantics for a language implemented by dulce may in the future be constructed with similar help.
The actual semantic routines are closely related to a continuation based denotational semantics. It is related to a notion of an explicit continuation which is important for the design of some operations. A structural outline of a an operation's semantics can be generated by dulce from a description of its syntax. Explicit continuation provides a link to object-oriented languages, in particular the binding of member names established by the instantiation mechanism.
Ordinary function definitions bind (parameter-) names explicitly whereas object instantiations bind (member-) names implicitly. The former kind allows programmers to choose, the latter implies a fixed choice, expressed by a class definition. The implicit binding of (fixed) names in the clauses of an operation is an essential design mechanism for users of dulce.
Programmers are, in contrast to mathematicians, used to context sensitive meanings of variable names: a C-expression like x=x+1 can turn a (traditional) mathematician's hair gray overnight. Programmers know to distinguish between different meanings on the left and right sides of an =-operator. In our approach such an expression is a shorthand for x.l=x.r+1, i.e. we use members to refer to the different meanings. We do support this shorthand, but it should be worth noting that we consider x a label, and labels do not in general have meanings. Fortunately, mostly designers have to worry about this, not casual users.
The tools flex and bison are used to accept and analyse the common syntax. Semantics of a special operation interfaces to these as it uses them to read an expression. It can then also have it interpreted in the context where the operation is applied. This embedded interpretation makes it possible to support embedded languages as extensions to the one being interpreted. The operation is related to eval-operation in some Lisp-like languages. Unfortunately the operation is not supported in the translated version of a program.
An example language has been used for illustrations throughout. It is called demo and readers are asked to remember that the language just illustrates what can be built with dulce and is not by itself of primary interest.
The following section presents a fairly involved example. It should be read as an appetiser before the tutorial for users, but only a limited understanding of its details can be expected in a first reading.

Contents

Demo language
·Implementation tool
Copyrights


·Introduction
Principles
Interpreter construction


Footnotes:

1An effector is similar to the mathematical notion of a functional, which however requires its domain to be a vector space. Whether computations can be organised as a vector space is doubtful, however. A closely related mathematical concept is module, which has strong connotations in informatics, however.

File translated from TEX by TTH, version 3.33.
On 18 Oct 2006, 16:47.
SourceForge.net Logo