© Jrgen Steensgaard-Madsen, Copenhagen, 2006
Common reference                                                  
 
   

This section describes common properties of Howard languages, ranging from tokens and syntax to program call arguments and semantics of special operations.

Characters and tokens

Characters are classified as follows
digits
0-9
letters
a-z_A-Z
operator characters
-+*?^/<>=@&!|:~%$
punctuation characters
#:(){}[]"'`\.;,
Use of characters not mentioned here will cause problems. Sequences of characters form tokens, most of which are described in the scanner by regular expressions as in the (flex-) scanner:
reserved names
OF DEF REC
integers
[0-9]+
reals
[0-9]+\.[0-9]*(e[-+]?[0-9]{1,3})? |
[0-9]*\.[0-9]+(e[-+]?[0-9]{1,3})? |
[0-9]+e[-+]?[0-9]{1,3}
chars
'[^\']' | '\[0-9]{3}' | '\\.'
The notations 'a', '\a' and '\097' are identical (provided the underlying encoding of characters maps a to 97).
strings
"[^"]*"
The escape mechanisms described for chars may be used also for single characters in a string.
operators
(operator characters)+ | `[^\`\'\n\t ]*'
identifiers
[a-z_A-Z][a-z_A-Z0-9]*

Expression syntax

Howard languages are essentially characterised by the collection of operator symbols and operation names that are predefined. Each of these determine the number of operands or clauses are required in applications, but apart from these details the expression syntax of the languages can be described as follows:
An application of an operation can be described briefly as
application = operation {[label]sublist}
sublist = {(expression[,expression]) | composite}
composite = {{[expression;] [;]} [expression]}
operation = [identifier.]identifier
label = identifier[:]
expression = [operand] {operator operand}                        (see below)
operand = application | constant | (expression) | shorthand
shorthand =
(expression{,expression}) |
[expression{,expression}] |
composite
where { ... } prescribes possibly empty iteration, and [ ... ] an optional part. Every expression in a parenthetical sublist is a single clause, as is every bracketed sequence of expressions.
Identification of an operation requires in general two identifiers, the first of which identifies a level of nesting and the second a member name introduced at that level. If only a single identifier is given for an operation, it may identify a level and the member name __ is subsumed; if not a member name is sought at the current and outer levels, and the first one found will be taken as the label part.
The shorthand forms are interpreted as follows
(e1,e2,...,en)    (e1 `,' e2 `,' ... `,' en)
[e1,e2,...,en]    (e1 :: e2 :: ... :: en :: nil)
{ ... }    program{ ... }
and someone is expected to contribute semantics for the operators and the operations nil and program.
In an expression the special operator  ? may be left out, as well as some operands depending on operator properties. A full description of expressions can be found in the parser's source text, but it is not illuminating. The reason is the dependency on operator properties and that humans naturally will use parentheses in non-obvious situations (e.g. 3 ! - ! 0 might be either (3!)-(!0) or 3!(-(!0)) and humans will avoid it anyway).
The syntax description above is ambiguous, but ambiguities are resolved in a natural way, hopefully.

Signature syntax

A signature provides syntactic information about operators and operations.
signature = typed_signature | call_signature
typed_signature = call_signature : type_expr | operator_sig
call_signature =
identifier [type_operators]
[(typed_signature {,typed_signature})]
{[signature]}
operator_sig = operator [type_operators] [{pattern{,pattern}}]
(type_expr,type_expr) :type_expr
type_expr = {type_expr} type_identifier
type_operators = OF [arity] type_identifier{,[arity] type_identifier}
pattern = type_expr operator type_expr [:type_expr]
A signature with no explicitly given type has type void. The explicitly typed signature sequence in a call_signature is intended to indicate call-by-value, but this is not strongly enforced. An internal definition requires explicit use of a predefined operation that implements call-by-value. The default is call-by-name.
New names, type_identifiers, are introduced with type_operators. An arity given explicitly with a type_identifier denotes the number of arguments required to precede it in a type_expr. A missing arity means that no argument is required.
The scope of type_identifiers is the signature in which the type_operators appears. Visibility of type_identifiers follows the rules of lexical scope. The signatures of operations and operators in an expression depend on the signature hierarchy of which they may be parts, i.e. types are bound as in the hierarchy. (In contrast, the composition of an expression determines the meaning of a free type_identifier - when used in a REC or DEF)
The arity of an operation is the number of clauses required in an application. For each clause there must be a typed_signature or signature in the operation's call_signature (in a position that matches the clause's).
A call_signature that corresponds to a clause describes operations and operators with names that are introduced for use in the clause. The clause is their scope.
An operator_sig that corresponds to a clause describes the types of the names left and right that are introduced for use in the clause and bound to the (computation of) operands. Operands should be evaluated once only, cf. call-by-value.
A pattern provides type information that allows dulce to build an expression to compute left operator right, i.e. to generate an additional operand in an operator application. This is needed, for instance, to implement << for lists with elements of unspecified type (possibly also lists).

Syntax file

The syntax of a Howard language is given as a file which mostly contain signatures (in brackets). This subsection describes the remaining parts.
attributes = { [arity] type_identifier
  | operator affinity coercion_rule prefix
  | [signature] } .
A type_identifier will denote a global type operator. An arity given explicitly with a type_identifier denotes the number of arguments required to precede it in a type_expr. A missing arity means that no argument is required.
An affinity is a number which indicates the binding strength, precedence, of an operator and its associativity: even if it associates to the right, odd if to the left.
A coercion_rule is one of left, right, both, or none and tells whether to look for a coercion operator for operands. Such rules are intended to distinguish the two meanings of, for instance, x in x := x+1. Appropriate attributes for the two operators might be
:= 2 right asign
+ 5 both add
The prefix is part of names generated for operators in outlines of external definitions of operators, for instance add_int_int and add_int_real may be names of two definitions of semantics for an overloaded + operator.
Global type operators and attributes of operators will be shared by all contributions to a branch of Howard languages. The attributes will typically be given in the top of a development hierarchy and be included in a complete syntax file as part of an automated build process for a language.
The full-stop character indicates the end of attributes. An interpreter reads first the syntax file and the switches to another scanner to read an expression.

Syntax of DEF and REC

The special construct DEF splits a computation in two, such that one can be referred to by name as an operation of that name. The split creates an interface that can be described with a signature. DEF itself is not an operation and cannot be described as one. Its semantics depends on an operation Def and and dulce deals with this in a special way. The name actually starts with a blank, so contributors can define an external meaning of Def. The structure of a DEF is
DEF signature sublist
where the sublist must have two elements, the first of which defines the meaning of an operation described by the signature. Both clauses may use the defined operation, i.e. recursive definitions are allowed. From a systematic point of view, the syntax rule for operation has been extended:
operation = [identifier.]identifier | DEF signature
A REC is used to introduce a Tuple type and and a number of operations to build and destruct values of that kind. The pattern for such a construct is simple
REC identifier(identifier:type {,identifier:type})
It may be used only as one expression in a composite which then needs a revised description as follows
composite = {{[element;] [;]} [element]}
element = expression
|  REC identifier(identifier:type {,identifier:type})
A REC introduces a type name, RECORD, for use in the rest of the composite, and functions described by p(X:RECORD):t for each field typing p:t that appears in the REC element. A function __(p1:t1,p2:t2,...,pn:tn):RECORD is introduced also. The identifier that follows REC will be a label for the rest of the sequence, so it can be used as a name for the constructor function.
The syntax of REC does not fit nicely into the concepts of the chosen grammar, but the semantics can be described almost as a macro, cf. Figure 2.5.
Definitions of the operators (<< and ??) illustrate several aspects. Both are defined in terms of the corresponding operators for the underlying representation of RECORD, which is a tuple of course. Writing of a tuple requires a knowledge of writing its parts, and the << operator for tuples has been defined with patterns so that this information is generated automatically. Note that the written version of a RECORD uses the name of the constructor function, h. It reflects an advice to write values as they are written in expressions.
Comparison operators, e.g. >= and ??, may be defined in terms of ?? for parts of a structure, so also these cases depend on use of pattern in a signature.
Figure 5: Expansion that matches the handling of REC


figures/rec.png

Operations with tool defined semantics

A few operations have their semantics defined by dulce, despite the fact that corresponding signatures exist. This reflects the fact that dulce (presently) is not a collection of separate programs. Put otherwise: this may change if the structure of dulceshould change. A brief characterisation is:

[argv:string Array]

An array with user arguments from a program call. These are numbered 0,1,2,... corresponding to their position after a (mandatory) double-dash argument (--).

[INTERPRET OF T:T] or: [ _ OF T : T]

Embedded interpretation: reads one expression through the scanner/parser and has it interpreted by dulce.

[ANY OF T:T]

A `match all types' expression that may be substituted for erroneous parts of programs (which are then not interpreted), and used explicitly as a kind of `undefined' value (which actually suppresses propagation of intermediate results, but allows computation to continue `after the next semicolon').
Except for the last, use of one of these operations requires that its signature is present in a language's syntax file.
Some operations may be defined in terms of externally visible routines from dulce. Such possiblities will not be described further. They have been used to define an operation to set and clear variables associated with program call arguments, and to push and pop files from which the scanner reads.

Conventions and behaviours

A Howard language interpreter consists of libraries of language contributions linked with most of dulce. This means that interpreters have common properties of interest, e.g. program call arguments. This subsection describes common attributes of Howard language interpreters as programs.

Program behaviour

A binary Howard language interpreter is identified by a path name like any other program. It has a syntax file with a path name that is the program's extended with .ulc, i.e. the two files must be in the same directory. The syntax file is read and used to initialise internal tables. Afterwards the interpreter changes to another parser, i.e. no addition to the contents of the syntax file will be accepted.
Only one expression is interpreted by a binary interpreter. An iterative behaviour has to be achieved by use of INTERPRET and appropriate (control) operations, e.g. a loop. Termination of an iteration should preferably be achieved with operations, but is of course also possible with usual job-control operations of the operating system. The file name of the interpreter is used as prompt for interactive input, preceeded by as many points as there are nested embedded interpreters.
The actually interpreted expression might be stdout<<Expr when Expr is submitted. This will be the case when the submitted expression has a type suitable to be written, i.e. neither type void nor Output . It implies that type Output, operation [stdout:Output], and the << operator be defined (by some contributor). Except for interpretation, the submitted expression is used unmodified.
Howard language programs may be the contents of an executable script. Comments are texts containing a # character and all those that follow it on the same line, as in other scripting languages. The scanner uses a stack of files as source for expressions, and operations can be defined such that one executable script may interpret another.

Program call arguments

Three classes are distinguished: build options, which are used in the actions described in some Makefile and not described here, behaviour options, which are used to vary the reactions of an interpreter, and user options which are passed to Howard language programs. User options are those that in a program call follows a double-dash option. They are accessible through the argv operation, when its signature is present in the syntax file.
A Howard language interpreter reacts to the following behaviour options:
--debug Display debugging information of various kinds.
--showrep Translate an expression to a form somewhat similar to lambda-notation. It has been used for debugging the interpreter, but may be useful for contributors in a learning context as well.
--showcode Display a written representation of the `code' that is interpreted. The code represents a notion called an abstract context tree which will be described separately (in due time).
--translate Translate an expression to the source language of GNU C.
NB. This is experimental, and perhaps not debugged sufficiently.
A convenient way to use translation is to have subdirectories scripts, C, and bin in a directory with a Makefile that allows
make C/X builds C/X.c from scripts/X
make bin/X builds the executable bin/X from C/X.c
The distribution contains an example hierarchy for Howard languages. A contribution subdirectory has a Makefile which supports these commands.
--nogo Prohibit interpretation (to be used with some other options)
A table in dulce holds information about which options are set. Thus it is possible to define an operation to set and clear these options dynamically.

Error messages

Figures 2.6 and 2.7 classify the possible messages from the parser.
Figure 6: Messages about program errors

Operation applications
** At level ... I see an unexpected argument:
** I cannot handle the incomplete term at level ...: ... Not matched.  
** I see an ambiguity after ... 
   I'll drop it, recover from ... and continue. Missing semicolon?
** I don't know of an object called ... at level ..."

Unknown operator semantics
?? I cannot find any meaning for ... in the context ...
?? I cannot find a meaning for: ... probably due to some free type, sorry

Elementary type conflicts
 >> Type conflict: non-void where void is expected
 >> Simple type conflict: I see type ..., but expected `void'
 >> Simple type conflict: I see type `void', but expected ...
 >> Type conflict: I expected ...; but I see ...

Argument conflicts in type expressions
 >> [checklists] Type conflict: I expected ... but I see ...
 >> problems in argument lists of ... vs. ...
 >> ... is not matched in checklists
 >> ... is not matched

Code generation problems
!! Sorry, but I have problems with the levels.
-- I have problems with type argument ... in ...
-- Sorry: the type ... seems thrown out of its scope in ...

Errors relating to DEF and REC
>> I don't know a type called ... -- I'll introduce it
>> I don't recognise qualifier ..., I'll discard it
>> I'm unable to handle this kind of type parameter: ...
>> An argument is missing for the ... operator\
>> ... arguments missing for the ... operator
>> Accepting ..., but I'll discard: ...
>> I cannot handle field ... in ...
>> bad descriptor in definition: ...


Figure 7: Messages about language definition and internal errors

Internal errors
!! I could find no type ... starting from level ...
!! Problems in restore state, (Term of kind ...).  Please report.
!! I have problems handling the expression.
!! Problems computing level differences, please report.
!! Problem in restore state (Term kind is ...). Report please.

Language definition errors
Sorry, problems in the language definition
Signatures are expected -- Ctl-C exits. ***


Proper syntax errors are identified by program bison. Typically the reaction is a message, possibly including information about text that has to be skipped. A simple example:
demo> { if [ + 4
demo> ) * 5

[2] Syntax: (I will skip to ';') --   ) 
* 
5 
demo> };
} 
;
 -- try again

A line number is given in brackets.
Examples   Arity errors in applications are identified based on checks against a signature: a clause may be missing, or too many may be seen. Quite often these errors are caused by punctuation errors, like omission of a semicolon.
demo> program{"abc"} if(1,2,3);
 ** [line 1] I see an ambiguity after program OF(string) { ... }
    I'll drop it, recover from 'if' and continue? Missing semicolon?
2
demo> y;

    [line 1] I don't know of an object called y at level 1

--> ANY OF(t(?)) : t for all t
     -- which I cannot execute

Type errors may be detected rather late. There may be conflicts between an expected and a seen type, arity problems in type expressions, and conflicts when arguments of type operators are compared. Most error messages are generated in situations where little information about the position in an expression is present, so after perhaps several error messages some position information will be given - including a (tentative) line number.
demo> if (1,2,"abc");

 >> Type conflict: I expected int; but I see string

 == [line 1] Errors found after:
if OF(int) { ... }{ ... }


demo> if ("ab",2,3);

 >> Type conflict: I expected int; but I see string

 == [line 1] Errors found after:
if OF(w(?)) 
    
demo> if (1,"abc",2);

 >> Type conflict: I expected string; but I see int

 == [line 1] Errors found after:
if OF(string) { ... }{ ... }
demo> { var x;
demo>   sets;
demo>     x := set([1,4,5,4,2,6]);
demo>     x
demo> };
 ?? [line 5] I cannot find any meaning for <<
    stdout
     << program OF(int Set[4]) { ... }
    
    in the context
    loop OF(void) 

 == [line 5] Errors found after:
stdout
     << program OF(int Set[4]) { ... }
    
demo> { var x;
demo>   sets;
demo>     x := set([1,4,5,4,2,6]);
demo>     stdout << x << nl        # replacement for stdout << {...}
demo> };
 -- Sorry: I could not find type int Set[4] used in program OF(Output) 
     [line 5]
 -- I have problems with type argument 1 in
       var OF(Output,int Set[4])  [line 1]
demo> { sets;
demo>   var x;                     # now inside the scope of Set
demo>     x := set([1,4,5,4,2,6]);
demo>     stdout << x << nl
demo> };
set([1,2,4,5,6])

Semantics

Regularity of Howard languages has been an important design issue. Their semantics should, hopefully, then be both regular and simple. However, it does not mean that a formal description of the semantics as a consequence is simple and intuitive, because the problems of the formal description language have an influence.
The formalism of denotational semantics causes a minor problem with the notion of call-by-name which is fundamental to us: a call-by-name parameter represents a 'function with a one-element domain' which needs to be handled as a special case.
Another problem is sequencing, in particular when `side effects' are permitted. Since `side effects' typically relate to a state, it may easily lead to models of computations that depend on a global, shared state - which in our opinion is precisely what we want to avoid. In addition, we find that sequencing possibly allows point of recovery from computation of `undefined values', e.g. a divide-by-zero result.

Formalities

Essentially we need a formalism that provide the primitives we want to describe, but unfortunately cannot find one. Still, we may try to express our intentions in terms of the notation of denotational semantics. Disregarding its shortcomings we arrive at a description which is simple, but also needs to be expanded to cover for the notion of labels:
[[ e0(e1,e2,...,en) ]]r = F(A1,A2,...,An)
where 
Ai x
=
[[ei]](r (dix)), i=1,2,...,n
F
=
[[e0]] r 
where  F(f1,f2,...,fn) = mathematical characterisation
The ei, i=1,2,...,n, ranges over expressions, r over environments, i.e. mappings of program names to semantics, di over program name lists (implicitly introduced names), and x over lists of semantics that has to be bound to a corresponding list of program names. The operator combines two mappings of program names into one. A characterisation of environments and should describe an intended name look-up function - and hence lexical scope rules.
We tacitly assume included an anonymous element in every di and a corresponding value of a standard one-element domain in the list of semantics x. Consequently call-by-name is our standard argument passing strategy, and we leave it to operations (with external definitions) to perform actual evaluations.
Sequential evaluation and side-effects are notions not found in conventional mathematics, but fundamental to automated computation. Some computations may loop forever, and hence prevent progress to the next in a sequence. Others may not be started because a preliminary test tells that no useful result can be produced. Should progress also be prevented if that happens in a sequence of computations? Our proposed answer to that is: no!
It hardly makes sense to formalise this, since the interpretation of the formalism must have a mechanism similar to what we want to describe.
[[ e1;e2;...;en ]] r = ([[ e2;...;en ]] r when ([[ e ]]1 r) terminates)
This behaviour is easy to achieve with continuations.

Notational conventions

An application can be expressed as f(e0,e1,...,ek). In addition, one or more arguments may be expressed by a sequence of computations, {u0;u1;...;un}. Its type and value is given by un and it may replace ei in writing as follows
f(e0,e1,...,ei-1){u0;u1;...;un}(ei+1,...,ek)
This covers also the case n=0 in which case the expression again may be enclosed in parentheses, and ultimately admits the extreme cases:
f(e0)(e1)...(ek)
f{e0}{e1}...{ek}
Every application requires a fixed number of arguments, called the arity of the applied operation. However, inside a sequence of expressions a special rule applies
The tail coup: If an operation of arity m is applied to m-1 arguments and followed by a semicolon, then the m-th argument is the rest of the sequence, i.e.
{u0; u1; ... g(v0,v1,...,vm-2); ui; ... un}
{u0; u1; ... g(v0,v1,...,vm-2){ui; ... un}}
A name may be introduced in front of every left brace or parenthesis, possibly followed by a colon. The term label will be used for such names, but labels should not be confused with destinations of gotos which are not supported. A label may be used to enhance readability or as a qualifier for certain names. Where the tail coup is used, a label may also be introduced before the last argument, i.e.  before the semicolon.

Naming conventions

If a label may be used as a qualifier for an operation name __, it may be used by itself to denote that operation.
If an expression, E, has type T and a meaning for an operator symbol `_'(T,NONE):U is introduced along with T (i.e. at the same level of nesting), then (E`_') is generated whenever coercion is permitted. Operator attributes include rules to permit coercion explicitly, and coercion is permitted for all clauses in an application.
Some type identifiers are introduced at the global level by default:
Name Description
int integers
real reals
string texts in citation marks (")
Text texts not in citation marks
void a set with a single, anonymous value
Output output stream values
NONE the type of an absent operand
A few operator symbol are described by default as follows
Symbol Description
? 100 both miss representation of an absent operator symbol
`_' 200 none __ulc_conv dynamic coercion operator
<< to output values (of submitted expressions)
Some expressions are defined in terms of operator symbols that are not described by default, i.e. that must be described in a syntax file and can be overloaded like any other. These are
Symbol Description
:: [e0,e1,...,en] = e0 :: e1 :: ... :: en :: nil
`,' (e0,e1,...,en) = e0 `,' e1 `,' ... `,' en
in contexts where (e0,e1,...,en) is cannot be taken as a sublist of clauses in an application.

Hacks

A hack is defined as a convention that is associated with a certain construct that may not likely be used, unless the convention is known. As an example: who would without hesitation define a routine with a signature like
string(S:string): T
for some type T? The tool takes the liberty of interpreting this as an operation to call implicitly with a string literal as argument, whenever the literal occurs in an expression. It can be useful to construct an `interpretive compiler' for a language: the interpretation of an expression will be some `code' that represents the expression, rather than its usual result as a computation. The hack is then needed to generate `code' corresponding to literal string constants.
Hack 1   When a context contains an operation with signature
basetype(X:basetype):reptype
the operation is used to convert literals of basetype to reptype, for basetype either int, real or string.
Since sequencing is an operation implemented by the tool dulce itself, it can be necessary for an `interpretive compiler' to prescribe some actions at the point where semicolons may occur. This can be achieved by another hack.
Hack 2   When a context contains an operation with signature
`;' (sometype,NONE):newtype
the operation will be called implicitly for every expression of type sometype that is followed by a semicolon in a composite expression.
The INTERPRET operation essentially provides an interpretation context which as a minimum contains the global context. Sometimes a restriction is appropriate, so for this we provide yet another hack, which allows a designer to define a number of named operations, that presumably are characteristic for some application domain, and subsequently restrict a user from access to those not defined explicitly. It can be an intermediate step towards an optimised domain specific language interpreter.
Hack 3   Inside a clause of an operations with name NEW_CONTEXT the collection of available named operations is restricted to those introduced by the operation.

Restriction

A signature may contain type_operators containing names with the signature as its scope. An operation with a signature that contains type_operators may be applied to arguments of different types, as long as the relations expressed by the signature is satisfied, e.g. that both `branches' of an application of if have the identical types, whichever it may be.
Application of an operation with clause described by a signature with type_operators implies that the names are introduced as abstract type operators in the clause, i.e. that introduced operations may have signatures that depend on type expressions that use the new abstract type operators, and that the names may be used in signatures of internal definitions (DEFs).
A name may thus either indicate a variation in types or provide a new type operator. However,
presently an error message is given on expressions that strives to express variation over types of non-zero arity

Contents

Demo language
Implementation tool
Copyrights


Introduction
Principles
Interpreter construction


Contribution directory
Make commands
Semantics
Illustration
Auxiliary files
Toplevel files
Reference
Download
Appendices



File translated from TEX by TTH, version 3.33.
On 18 Oct 2006, 16:47.
SourceForge.net Logo