Dino Documentation

The Programming Language DINO: Predeclared identifiers

9. Predeclared identifiers

Dino has quite a lot of predeclared identifiers. They are grouped into a few singleton objects, also called spaces -- see the section Declarations and Scope Rules. Most predeclared identifiers refer to functions. A predeclared function expects a given number of actual parameters (possibly a variable number). If the number of actual parameters is unexpected, the exception parnumber is generated. A predeclared function also expects the actual parameters (possibly after implicit conversions) to be of the required types; if they are not, the exception partype is generated. To show how many parameters a function requires, the descriptions below give the names of the parameters and enclose optional parameters in the brackets [ and ].

Example: the following description

          strtime ([format [, time]])

means that the function accepts zero, one, or two parameters. If only one parameter is given, it is the parameter format.

If nothing is said about the returned result, the function return value is undefined.

The predeclared identifiers are described below, grouped by space.

9.1 Space `lang`

The space contains fundamental Dino declarations. All declarations of the space are always exposed.

Predeclared variables

Space lang has some predeclared variables which contain useful information or can be used to control the behaviour of the Dino interpreter.

Arguments and environment

To access arguments to the program and the environment, the following variables can be used:

argv. The variable value is an immutable vector whose elements are strings (immutable vectors of characters) representing the arguments to the program (see the appendix Implementation).
env. The variable value is an immutable table whose elements are strings (immutable vectors of characters) representing values of the environment variables whose names are the keys of the table.

Versions

As Dino is a living language, both the language and its interpreter are still under development. To access the version number of the Dino interpreter and the version of the language, use the final variables version and lang_version respectively. Their values are the versions as floating point numbers. For example, if the current Dino interpreter version is 0.97 and the Dino language version is 0.5, the variable values will be 0.97 and 0.5.

Threads

To access some information about threads in a Dino program, the following variables can be used.

main_thread. The variable value is the main thread. When the program starts, there is only one thread, which is called the main thread.
curr_thread. The variable value is the thread in which the variable is referenced.

All these variables are final, so you cannot change their values.

Exception classes

All predeclared classes in the space lang describe exceptions which may be generated in a Dino program. All Dino exceptions are represented by objects of the predeclared class except or of a sub-class of except. The class except has no parameters. There is only one predeclared sub-class of except, namely error. Classes corresponding to user-defined exceptions should be declared as sub-classes of except. All other exceptions (e.g. those generated by the Dino interpreter itself or by predeclared functions) are objects of the class error or of predeclared sub-classes of error. The class error and all its sub-classes have one parameter msg, which contains a human-readable message about the exception. The following classes are declared in the space lang as sub-classes of error:

invop. The following sub-classes of this class describe exceptions when operands of an operation have an incorrect type or value.
- optype. This class describes that an operand of an operation is not of the required type (possibly after implicit conversions).
- opvalue. This class is reserved for errors where an operand of an operation has an invalid value.
invindex. Sub-classes of this class describe exceptions when referring to a vector element.
- indextype. This class describes that the index is not of integer type (possibly after implicit integer conversion).
- indexvalue. This class describes that the index is negative or greater than or equal to the vector length.
- indexop. This class describes that the first operand in referring to a vector element is not a vector.
invslice. Sub-classes of this class describe exceptions when referring to a vector slice.
- slicetype. This class describes that the start index, bound, or step is not of integer type (possibly after implicit integer conversion).
- sliceform. This class describes that the slice has a wrong form, e.g. the start index is negative, the step is zero or the slice is applied not to a vector.
invector. Sub-classes of this class mostly describe exceptions in slice operations.
- veclen. This class describes that the operands of a slice operation have different lengths.
- vecform. This class describes that the operands of a slice operation have different dimensions.
- matrixform. This class describes the error raised when a matrix transposition (function transpose) is applied to a vector of vectors of different lengths.
invkey. Sub-classes of this class describe exceptions when referring to a table element.
- keyvalue. This class describes that the table has no element with the given key when the value of that element is needed. The exception does not occur when a table element reference stands on the left-hand side of an assignment-statement.
- keyop. This class describes that the first operand in referring to a table element is not a table.
invcall. Sub-classes of this class describe exceptions in calling functions (mainly predeclared ones).
- abstrcall. This class describes an attempt to call a function that is declared but not defined.
- callop. This class describes an attempt to call something which is not a function, class, or fiber. The exception is also generated on an attempt to create a class file instance by calling the class.
- partype. This class describes that a parameter value of a predeclared function is not of the required type.
- parvalue. This class describes that a parameter value of a predeclared function is not one of the permitted values (see functions set_encoding, set_file_encoding).
- parnumber. This class describes that the number of actual parameters is not valid when we call a predeclared function.
- syncthreadcall. This class describes that a fiber call occurs inside a critical region -- see the wait-statement.
- invresult. This class describes that the result value of a function call is not of the required type, e.g. the comparison function used in a call of the function sort returns a non-integer value.
- internal. This class describes all other (nonspecified) exceptions in calling predeclared functions.
invaccess. Sub-classes of this class describe exceptions in accessing or changing values.
- accessop. This class describes that a declaration of a given class cannot be found, or is private, when it is accessed through the corresponding object.
- accessvalue. This class describes an attempt to access something that is declared but not defined through the corresponding object -- see abstract classes.
- immutable. This class describes that we try to change an immutable value.
- patternmatch. This class describes that the pattern in a variable declaration does not match the assigned value.
deadlock. This class describes that a deadlock has been detected in a multi-threaded program.
syncwait. This class describes an attempt to execute a wait-statement inside a critical region.

Functions of the space `lang`

The following functions are declared in the space lang:

tolower (str). The function expects the parameter str (after an implicit string conversion) to be a string. It returns a new string in which the upper case letters of str are changed to the corresponding lower case letters.
toupper (str). The function expects the parameter str (after an implicit string conversion) to be a string. It returns a new string in which the lower case letters of str are changed to the corresponding upper case letters.
translit (str, what, subst). The function transliterates characters in a string. It expects the parameters str (after an implicit string conversion), what, and subst to be strings. It returns a new string in which each character of str that is present in what is changed to the corresponding character of subst. The strings what and subst should have the same length. The string what may contain more than one occurrence of a character; in this case the last correspondence is taken.
eltype (vect). The function expects the parameter vect to be a vector. It returns nil if the vector is heterogeneous; otherwise it returns the type of the vector elements (the type of nil if the vector is empty).
keys (tab). The function expects the parameter tab to be a table. It returns a new mutable vector containing all the keys of the table. The order of the keys in the vector is undefined.
closure (par). The function accepts any parameter value. If the parameter value is an object or a block instance of a function, closure returns the corresponding class or function, which also includes its context; that is why it is called a closure. In all other cases, the function returns nil.
context (par). The function returns the context (see the section Declarations and Scope Rules) represented by a block instance or an object for the given parameter value which should be a function, a class, a fiber, a block instance, or an object.
inside (par1, par2, flag = 0). The function checks whether something is declared inside something else.
If the third parameter is given and its value after an implicit integer conversion is nonzero, the check takes contexts into account. The second parameter value should be a function, a class, an object, or a block instance; for an object or a block instance, the corresponding class, function, or block is used. The first parameter value should also be a function, a class, an object, or a block instance; in the last two cases, it defines the corresponding function, class, or block.

If the function, class, or block defined by the first parameter is declared inside the function, class, or block given by the second parameter, the function inside returns 1. The function inside also returns 1 if the function, class, or block defined by the first parameter is the same as the function, class, or block given by the second parameter. Otherwise the function inside returns 0. The following example illustrates the difference between checking with taking contexts into account and without it.
          class c () {
            class subc () {
            }
          }
          inside (c ().subc (), c ().subc);    // returns 1
          inside (c ().subc (), c ().subc, 1); // returns 0
The first call of inside returns 1, while the second one returns 0.
isa (fco, fc). The function checks whether a function, a class, or an object given by the first parameter fco uses the declarations (through a use-clause) of a function or a class given by the second parameter fc; in other words, whether the first is a subtype of the second (or a sub-class of the class). If so, the function returns 1; otherwise it returns 0. If the parameter types are wrong, the function generates the exception partype. The following example illustrates the usage of isa.
          class c () {}
          class subc () { use c; }
          isa (subc, c);
          isa (subc (), c);
Both calls of isa in the example return 1.
subv (vect, index, length = -1). The function extracts a sub-vector. The first parameter value should be a vector after an implicit string conversion. The second and third parameter values should be integers after an implicit integer conversion.
The function extracts only the part of the requested sub-vector that actually exists in the vector, so any values of the index and the length can be used. A negative index is interpreted like a slice bound: -1 corresponds to the vector length, -2 to the vector length minus 1, -3 to the vector length minus 2, and so on. If the length is negative, the sub-vector extends to the end of the vector. The function returns a new vector which is the sub-vector. The result vector is immutable only when the original vector is immutable.
del (vect, index, length = 1) or del (tab, key). The first form of the function removes an element or a sub-vector from the mutable vector vect. The second and third parameter values should be integers after an implicit integer conversion.
The function removes only the part of the requested sub-vector that exists in the vector, so any values of the index and the length can be used. A negative index has the same meaning as in subv. If the length is negative, the removed sub-vector extends to the end of the vector.

The second form of the function removes the element with the given key (if it exists) from a mutable table.

The function generates the exception immutable on an attempt to remove elements from an immutable vector or table. The function returns the modified vector or table.
ins (vect, el, index = -1). The function inserts the element given by the second parameter into the vector given by the first parameter at the position given by the third parameter. The third parameter should be an integer after an implicit integer conversion. A negative index has the same meaning as in subv. The function generates the exception immutable on an attempt to insert into an immutable vector. The function returns the modified vector.
insv (vect, vect, index = -1). The function is analogous to the function ins, but it inserts all the elements of the vector given by the second parameter into the vector given by the first parameter, so the second parameter value should be a vector. The function returns the modified vector.
rev (vect). The function returns the reverse of the given vector.
cmpv (vect, vect). The function performs an implicit string conversion of the parameter values. After that, the parameter values should be vectors. The corresponding equal elements preceding the first unequal pair should have the same type (character, integer, or floating point type), and so should the first pair of corresponding unequal elements (the remaining elements can have different types). As usual, if this is not true, the exception partype is generated.
The function returns 1 if the first unequal element of the first vector is greater than the corresponding element of the second vector, -1 if it is less, and 0 if all corresponding vector elements are equal. If the first vector is a prefix of the second vector, the function returns -1; if the second vector is a prefix of the first vector, the function returns 1. In effect, the function uses a generalized lexicographical order.
filter (f, v, d = 1). The function expects a function f, a vector v, and an optional integer d (after an integer conversion); otherwise the exception partype is generated.
The function processes the elements of v if d is equal to 1, the elements of the vectors which are elements of v if d is equal to 2, and so on. In other words, d is the level at which the vector elements are processed. If v does not have the structure necessary for this processing, the exception vecform is generated. If d is zero or negative, the function just returns v. Otherwise the function creates a new mutable vector with the same structure as v, containing only those elements on level d for which the function f returns a nonzero value after an integer conversion.

If a result of calling f is not an integer after the integer conversion, the exception invresult is generated. The following example illustrates the usage of filter.
          var i, v = [0, 1, -2, 3, -4];
          println (filter (fun (a) {a > 0;}, v));
          v = [[0, 1, -2, 3, -4], [5, -6, 7, -8, 9]];
          println (filter (fun (a) {a > 0;}, v, 2));
map (f, v, d = 1). The meaning of the function parameters and the constraints on their values are analogous to those of the function filter, except that f can return any value. Each element processed by f is replaced by the result of the corresponding call of f. The following example illustrates the usage of map.
          var i, v = [[0, 1, -2, 3, -4], [5, -6, 7, -8, 9]];
          println (map (fun (a) {a < 0 ? nil : a;}, v, 2));
fold (f, v, init, d = 1). The meaning of the function parameters f, v, and d and the constraints on their values are analogous to those of the function filter. The function processes all the elements of the vectors on level d and returns the value f (... f (f (init, el0), el1) ..., eln), where el0, ..., eln are the vector elements on level d taken from left to right. If d is zero or negative or the vectors are empty, the function returns init. The following example illustrates the usage of fold.
          var v = [1, 2, 3, 4];
          println (fold (fun (a, b) {a + b;}, v, 0)); // the sum 1 + 2 + 3 + 4, i.e. 10
sort (vect[, compare_function]). The function returns a new sorted vector. The original vector given as the first parameter value should be a homogeneous vector whose elements are of character, integer, long integer, or floating point type. If the second parameter is not given, the standard arithmetic order (see the comparison operators) is used. To use a different ordering, pass as the second parameter a function which compares two elements of the vector and returns a negative integer if the first parameter value (element) is less than the second one, a positive integer if it is greater than the second one, and zero if they are equal.
transpose (m). The function expects a matrix m, i.e. m should be a vector (each element is a matrix row) of vectors of equal length. If m is not a vector, the exception partype is generated. If the elements of m are not vectors of the same length, the exception matrixform is generated. The function returns a new mutable vector of mutable vectors which is the transposition of the matrix m.
gc (). The function forces a garbage collection and heap compaction. Normally the Dino interpreter itself invokes garbage collection when it decides that this is necessary.
exit (code). The function finishes the work of the interpreter with the given code which should be an integer value after an implicit integer conversion.

9.2 Space `io`

The space contains functions for input and output and for work with files and directories. All declarations of the space are always exposed.

Exception classes of the space `io`

The following classes are declared in the space io as sub-classes of invcall:

invinput. This class describes that the file input is not in the required format. The exception is usually generated by functions such as scan.
invfmt. This class describes that the format given to a formatted output function is wrong (see the function putf).
eof. This class describes that the end of file is encountered. The exception is usually generated by functions reading files (get, scan, etc.).
invencoding. This class describes various exceptions related to encodings, e.g. a file contains bytes that do not correspond to the expected encoding, or, in some cases, the encoding should contain ASCII characters.

Class `file`

Dino has a predeclared final class file. Work with files in a Dino program is done through objects of this class. All declarations inside the class are private. Objects of the class can be created only by the predeclared functions open or popen; creating an object of the class by calling the class generates the exception callop. The file encoding is the current DINO encoding at the time the file is created (see the functions set_encoding and set_file_encoding). To work with files on the byte level without any encoding/decoding, use the encoding called "RAW".

Files

To write to the standard output and error streams or to read from the standard input stream, the following variables can be used:

stdin. The variable value is an object of the class file which corresponds to the standard input stream.
stdout. The variable value is an object of the class file which corresponds to the standard output stream.
stderr. The variable value is an object of the class file which corresponds to the standard error stream.

All these variables are final, so you cannot change their values. The encoding of these files is the current DINO encoding at program start (see the function set_encoding).

Functions for work with files

The following functions (besides the input/output functions) work with OS files. Besides the standard exceptions partype and parnumber, the functions may generate an exception declared in the class syserror (e.g. eaccess, enametoolong, eisdir, and so on). The function rename can be used to rename a directory, not only a file.

rename (old_path, new_path). The function renames the file (directory) given by its path name. The old and new names are given by the parameter values which should be strings after an implicit string conversion.
remove (file_path). The function removes the OS file given by its path name. The file path name should be a string after an implicit string conversion.
open (file_path, mode). The function opens the file in the given mode, creates a new class file instance, associates the opened file with the instance, and returns the instance. The parameter values should be strings after an implicit string conversion. The first parameter value is the file path. The second parameter value is the mode for work with the file (for all possible modes, see the documentation of the ANSI C function fopen). All work with the opened file is done through the file instance.
close (fileinstance). The function closes a file opened by the function open. The file is given by the class file instance. The function also removes all associations of the instance with the file.
flush (fileinstance). The function flushes any output that has been buffered for the opened file given by the class file instance.
popen (command, mode). The function starts a shell command given by the first parameter value (which should be a string after an implicit string conversion), creates a pipe, creates a new class file instance, associates the pipe with the instance, and returns the instance. Writing to such a pipe (through the class file instance) writes to the standard input of the command. Conversely, reading from the pipe reads the command's standard output. After an implicit string conversion the second parameter value should be the string "r" (for reading from the pipe) or "w" (for writing to the pipe). The pipe should be closed by the function pclose.
pclose (fileinstance). The function waits for the command connected to a pipe to terminate. The pipe is given by the class file instance returned by the function popen. The function also removes the association of the instance with the pipe.
tell (fileinstance). The function returns the current value of the file position indicator for the file (opened by the function open) given by the class file instance.
seek (fileinstance, offset, whence). The function sets the file position indicator for the file (opened by the function open) given by the class file instance. The position is given by offset, which should be an integer after an implicit arithmetic conversion, and whence, which should be a string after an implicit string conversion. The first character of the string should be 's', 'c', or 'e', meaning that the offset is relative to the start of the file, the current position indicator, or the end of the file, respectively.
get_file_encoding (fileinstance). The function returns a new mutable string which is a name of the current file encoding.
set_file_encoding (fileinstance, name). The function accepts a file and a string and changes the current file encoding. If the name represents an unknown encoding name, the function generates the exception parvalue.

File output functions

The following functions are used to output something to opened files. The return values of all these functions are undefined. Besides the standard exceptions partype and parnumber, the functions may generate an exception declared in the class syserror (e.g. eio, enospc, and so on).

put (...). All parameters should be strings after an implicit string conversion. The function outputs all strings into the standard output stream.
putln (...). The function is analogous to the function put except for the fact that it additionally outputs a new line character after output of all the strings.
fput (fileinstance, ...). The function is analogous to the function put except that it outputs the strings to the opened file associated with the class file instance given as the first parameter value.
fputln (fileinstance, ...). The function is analogous to function fput except for the fact that it additionally outputs a new line character after output of all the strings.
putf (format, ...). The first parameter should be a string after an implicit string conversion. The function outputs the remaining parameters according to the format. The number of remaining parameters should be exactly equal to the number of conversions (including parameterized widths and precisions) in the format; otherwise the exception parnumber is generated. The type of each parameter should correspond to its conversion specifier (or be an integer for parameterized widths and precisions); otherwise the exception partype is generated. The format is mostly a subset of the format of the standard C function printf, but it can also deal with multi-precision integers (of the Dino type long). The format has the following syntax:
          format               : <any character except %>
                               | '%' flags [width] [precision] conversion_specifier
          flags                : | flags flag
          flag                 : '#' | '0' | '-' | ' ' | '+'
          width                : '*' | <a decimal number starting with non-zero>
          precision            : '.' ['*' | <decimal number>]
          conversion_specifier : 'd' | 'o' | 'x' | 'X' | 'e' | 'E' | 'f' | 'g' | 'G' | 'c' | 's' | '%'
If the format syntax is wrong, the exception invfmt is generated.

The flag '#' means that the value should be converted into an alternative form. It can be present only for the conversion specifiers 'o', 'x', 'X', 'e', 'E', 'f', 'g', and 'G'. For the conversion specifier 'o', the output is prefixed by '0'. For 'x' and 'X', the output is prefixed by '0x' and '0X' respectively. For the conversions 'e', 'E', 'f', 'g', and 'G', the output always contains a decimal point. For the conversions 'g' and 'G', it also means that trailing zeros are not removed from the output as they would be without the flag. The following code using the flag '#' in a format
          putf ("->%#o %#x %#x %#.0e %#.0f %#g<-\n", 8, 10, 16l, 2., 3., 4.);
will output
          ->010 0xa 0x10 2.e+00 3. 4.00000<-
The flag '0' means that the output value will be zero padded on the left. If both flags '0' and '-' appear, the flag '0' is ignored. It is also ignored for the conversions 'd', 'o', 'x', and 'X' if a precision is given. The flag is prohibited for the conversions 'c' and 's'. The following code using the flag '0' in a format
          putf ("->%04d %04x %04x %09.2e %05.2f %05.2g<-\n", 8, 10, 16l, 2., 3., 4.);
will output
          ->0008 000a 0010 02.00e+00 03.00 00004<-
The flag '-' means that the output will be left-justified within the field. (By default, the output is right-justified.) The flag '-' overrides the flag '0' if both are given. The following code using the flag '-' in a format
          putf ("->%-04d %-04x %-04x %-09.2e %-05.2f %-05.2g<-\n", 8, 10, 16l, 2., 3., 4.);
will output
          ->8    a    10   2.00e+00  3.00  4    <-
The flag ' ' means that the output of a signed number will start with a blank for positive numbers. The flag can be used only for the conversions 'd', 'e', 'E', 'f', 'g', and 'G'. If both flags ' ' and '+' appear, the flag ' ' is ignored. The following code using the flag ' ' in a format
          putf ("->% d % d % .2e % .2f % .2g<-\n", 8, 16l, 2., 3., 4.);
will output
          -> 8  16  2.00e+00  3.00  4<-
The flag '+' means that the output of a signed number will start with a plus sign for positive numbers. The flag can be used only for the conversions 'd', 'e', 'E', 'f', 'g', and 'G'. The flag '+' overrides the flag ' ' if both are given. The following code using the flag '+' in a format
          putf ("->%+d %+d %+.2e %+.2f %+.2g<-\n", 8, 16l, 2., 3., 4.);
will output
          ->+8 +16 +2.00e+00 +3.00 +4<-
The width defines the minimum width of the output value. If the output is shorter, it is padded with spaces (or zeros -- see the flag '0') on the right (if the flag '-' is used) or on the left. The output is never truncated. The width must not exceed the maximum integer value; otherwise the exception invfmt is generated. The width can be given as a parameter of integer type if '*' is used. If the width given by the parameter is negative, the flag '-' is assumed and the absolute value of the parameter is used as the width. The following code using the width in a format
          putf ("->%5d %05d %-5d %5d %*d %*d<-\n", 8, 9, 10, 16l, 5, 8, -5, 10);
will output
          ->    8 00009 10       16     8 10   <-
The precision is prohibited for the conversion 'c'. If the number after the period is absent, the precision is zero. The precision can be given as a parameter of integer type if '*' is used after the period. If the precision given by the parameter is negative, it is taken to be zero. For the conversions 'd', 'o', 'x', and 'X', the precision is the minimum number of output digits. For the conversions 'e', 'E', and 'f', it is the number of digits to appear after the decimal point. For 'g' and 'G', it is the maximum number of significant digits. For 's', it is the maximum number of characters to be output from a string. The following code using precisions in a format
          putf ("->%.d %.0d %.5d %.d %.0f %.0e %.2g<-\n", 8, 8, 9, 16l, 2.3, 2.3, 3.53);
          putf ("->%.2s %.0d %.*d %.*d %.*d<-\n", "long", 0, 5, 8, -5, 8, 5, 16l);
will output
          ->8 8 00009 16 2 2e+00 3.5<-
          ->lo  00008 8 00016<-
The conversion 'd' should be used to output an integer or a long integer. The default precision is 1. When 0 is output with an explicit precision equal to 0, the output is empty.

The conversions 'o', 'x', and 'X' should be used to output an integer or long integer value as an unsigned in the octal and hexadecimal form. The lower case letters abcdef are used for 'x' and the upper case letters ABCDEF are used for 'X'. The precision gives the minimum number of digits that must appear. If the output value requires fewer digits, it is padded on the left with zeros. The default precision is 1. When 0 is output with an explicit precision equal to 0, the output is empty.

The conversion 'f' should be used to output floating point values. The output value has a form [-]ddd.ddd where the number of digits after the decimal point is given by the precision specification. The default precision value is 6. If the precision is explicitly zero, no decimal-point character appears.

The conversions 'e' and 'E' should be used to output floating point values with an exponent in the form [-]d.ddd[e|E][+|-]dd. There is always one digit before the decimal-point. The number of digits after the decimal point is defined by the precision. The default precision value is 6. If the precision is zero, no decimal-point appears. The conversion 'E' uses the letter E (rather than e) to introduce the exponent. The exponent always contains at least two digits. If the exponent value is zero, the exponent is output as 00.

The conversions 'g' and 'G' should be used to output floating point values in the style 'f' or 'e' (or 'E' for the conversion 'G'). The precision defines the number of significant digits. The default value of the precision is 6. If the precision is zero, it is treated as 1. The style 'e' is used if the exponent from the conversion is less than -4 or not less than the precision. Trailing zeros are removed from the fractional part of the output. If the fractional part is entirely zero, the decimal point is removed too.

The conversion 'c' should be used to output a character value.

The conversion 's' should be used to output strings.

The conversion '%' should be used to output %.

The following code, which uses different conversions in a format,

          putf ("->%% %c %s %d %o %x %X %d %o %x %X<-\n", 'c', "string", 7, 8, 20, 20, 8l, 9l, 21l, 21l);
          putf ("->%f<-\n", 1.5);
          putf ("->%e %E %g %G %g %G<-\n", 2.8, 2.8, 3.7, 3.7, 455555555.555, 5.9e-5);

will output

          ->% c string 7 10 14 14 8 11 15 15<-
          ->1.500000<-
          ->2.800000e+00 2.800000E+00 3.7 3.7 4.55556e+08 5.9E-05<-

fputf (fileinstance, format, ...). The function is analogous to the function putf except that it outputs the parameters into an opened file associated with the class file instance given as the first parameter.
print (...). The function outputs all parameter values into the standard output stream. The function never makes implicit conversions of the parameter values. The parameter values are output as they would be represented in Dino itself (e.g. character 'c' is output as 'c', vector ['a', 'b', 'c'] is output as "abc", vector [10, 20] as [10, 20], and so on). Some values (functions, classes, block instances, class instances, threads) cannot be represented fully in Dino; such values are output schematically. For example, the output fun f {}.g(unique_number) would mean the function f in the call of function (or class) g with the given unique number, where the function g is in the instance of the implicit block covering the whole program. For the function g, the output would look simply like fun g because there is only one instance of the implicit block covering the whole program. Output for an instance of the class c in the function f looks like instance {}.f(unique_number).c(unique_number). Output for a block instance of the function f looks like stack {}.f(unique_number). Output for a thread whose fiber t is declared in the function f would look like thread unique_number {}.f(unique_number).t(unique_number).
println (...). The function is analogous to the function print except for the fact that it additionally outputs a new line character after output of all parameters.
fprint (fileinstance, ...). The function is analogous to the function print except that it outputs the parameters into an opened file associated with the class file instance given as the first parameter.
fprintln (fileinstance, ...). The function is analogous to function fprint except for the fact that it additionally outputs a new line character after the output of all the parameters.

File input functions

The following functions are used to input data from opened files. Besides the standard exceptions partype and parnumber, the functions may generate eof or an exception declared in the class syserror (e.g. eio, enospc and so on).

get (). The function reads one character from the standard input stream and returns it. The function generates the exception eof if it tries to read at the end of file.
getln (). The function reads one line from the standard input stream and returns it as a new string. The end of line is the newline character or end of file. The returned string does not contain the newline character. The function generates the exception eof only when the file position indicator before the function call stands exactly at the end of file.
getf ([ln_flag]). The function reads the whole standard input stream and returns it as a new string. The function generates the exception eof only when the file position indicator before the function call stands exactly at the end of file. The optional parameter should be an integer after an implicit integer conversion. If the parameter value is nonzero, the function instead returns a vector of strings, each string being one line of the input stream without the newline character. Otherwise the function behaves as described above.
fget (fileinstance). The function is analogous to the function get except for the fact that it reads from an opened file associated with the class file instance which is the parameter's value.
fgetln (fileinstance). The function is analogous to the function getln except for the fact that it reads from an opened file associated with a class file instance which is the parameter value.
fgetf (fileinstance [, ln_flag]). The function is analogous to the function getf except for the fact that it reads from an opened file associated with a class file instance which is the parameter's value.
scan (). The function reads a character, integer, floating point number, string, vector, or table and returns it as the result. The input values should be represented in the file as they are in the Dino language, except that they should contain no identifiers and no operators (although the signs + and - are allowed in an integer or floating point representation). A table or vector should contain only values of the types mentioned above. The values in the file can be separated by whitespace characters. If there is an error in the representation of the read value (e.g. unbalanced brackets in a vector value), the function generates the exception invinput. The function generates the exception eof if only whitespace characters remain unread in the file.
scanln (). The function is analogous to the function scan except that, after reading the value, it skips all characters up to the end of line or the end of file. The skipping is done even if the exception invinput is generated.
fscan (fileinstance). The function is analogous to the function scan except for the fact that it reads from an opened file associated with a class file instance which is the parameter's value.
fscanln (fileinstance). The function is analogous to the function scanln except for the fact that it reads from an opened file associated with a class file instance which is the parameter value.
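The end-of-file rules for getln and getf are easy to misread, so here is a hypothetical Python model of them. It is only an illustration: Python's EOFError stands in for Dino's eof exception, and the assumption that a final newline does not produce an extra empty line with ln_flag is not stated above.

```python
import io

# Hypothetical Python models of Dino's getln and getf (EOFError plays the role of eof).
def getln(stream):
    """Return the next line without its newline; raise EOFError only at end of file."""
    line = stream.readline()
    if line == "":
        raise EOFError("eof")
    return line[:-1] if line.endswith("\n") else line

def getf(stream, ln_flag=0):
    """Return the rest of the stream, or a list of its lines if ln_flag is nonzero."""
    text = stream.read()
    if text == "":          # eof only when already positioned at the end of file
        raise EOFError("eof")
    if not ln_flag:
        return text
    lines = text.split("\n")
    if lines[-1] == "":     # assumption: a final newline does not start an extra line
        lines.pop()
    return lines

s = io.StringIO("first\nsecond")
print(getln(s))                          # first
print(getln(s))                          # second (the end of file also ends a line)
print(getf(io.StringIO("a\nb\n"), 1))    # ['a', 'b']
```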

Encoding functions

Dino internally uses Unicode for characters. To communicate with the rest of the world, it can use different encodings. The default encoding is UTF-8. Dino has two functions to get and change the current encoding:

get_encoding (). The function returns a new mutable string which is a name of the current encoding.
set_encoding (name). The function accepts a string and changes the current encoding. If the name represents an unknown encoding name, the function generates the exception parvalue.

Examples:


          putln (get_encoding ());
          set_encoding ("KOI8-R");
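To show what changing the encoding means at the byte level, the following Python sketch encodes the same Unicode text in UTF-8 and in KOI8-R. It does not model set_encoding itself; it only illustrates that the characters stay the same while their external byte representation changes.

```python
text = "да"                       # two Cyrillic letters (Unicode internally)
utf8 = text.encode("utf-8")
koi8 = text.encode("koi8_r")

print(utf8)                       # b'\xd0\xb4\xd0\xb0' (two bytes per letter)
print(koi8)                       # b'\xc4\xc1'         (one byte per letter)
print(koi8.decode("koi8_r") == text)  # True: decoding restores the same characters
```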

Functions for work with directories

The following functions work with directories. The functions may generate an exception declared in the class syserror (e.g. eaccess, enametoolong, enotdir and so on) besides the standard partype and parnumber.

readdir (dirpath). The function makes an implicit string conversion of the parameter value which should be a string (representing a directory path). The function returns a new mutable vector with elements which are strings representing the names of all files and sub-directories (including "." and ".." for the current and parent directory respectively) in the given directory.
mkdir (dirpath). The function creates a directory with the given name represented by a string (the parameter value after an implicit string conversion). The directory has read/write/execute rights for all. You can change them with the functions chumod, chgmod, and chomod.
rmdir (dirpath). The function removes the directory given by a string which is a parameter value after an implicit string conversion.
getcwd (). The function returns a new string representing the full path of the current directory.
chdir (dirpath). The function makes the directory given by dirpath (which should be a string after an implicit string conversion) the current directory.
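The directory functions correspond closely to POSIX calls. The following Python sketch exercises the same operations through `os` in a temporary directory; demo_directory_functions is a hypothetical name. One difference worth noting: `os.listdir` omits "." and "..", which readdir includes, so the sketch adds them back.

```python
import os
import tempfile

def demo_directory_functions():
    """Create, list, enter and remove a directory using Python's os counterparts."""
    with tempfile.TemporaryDirectory() as base:
        sub = os.path.join(base, "sub")
        os.mkdir(sub)                        # mkdir (dirpath); rights are 0o777 minus the umask
        # os.listdir omits "." and "..", which Dino's readdir includes, so add them.
        entries = sorted([".", ".."] + os.listdir(base))
        old = os.getcwd()                    # getcwd ()
        os.chdir(sub)                        # chdir (dirpath)
        try:
            inside = os.path.basename(os.getcwd())
        finally:
            os.chdir(old)                    # restore the previous current directory
        os.rmdir(sub)                        # rmdir (dirpath)
        return entries, inside, os.listdir(base)

print(demo_directory_functions())  # (['.', '..', 'sub'], 'sub', [])
```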

Functions for access to file/directory information

The following predeclared functions can be used for accessing file or directory information. The functions may generate an exception declared in the class syserror (e.g. eaccess, enametoolong, enfile and so on) besides the standard partype and parnumber. The functions expect one parameter which should be a file instance (see the predeclared class file) or the path name of a file represented by a string (the functions make an implicit string conversion of the parameter value). The only exception is isatty, which expects a file instance.

ftype (fileinstance_or_filename). The function returns one of the following characters:
- 'f'. A regular file.
- 'd'. A directory.
- 'L'. A symbolic link.
- 'c'. A character device.
- 'b'. A block device.
- 'p'. A fifo.
- 'S'. A socket.
Under some OSes the function never returns some of these characters (e.g. 'c' or 'b'). The function may return nil if it cannot categorize the file as above.
fuidn (fileinstance_or_filename). The function returns a new string representing a name of the owner of the file (directory). Under some OSes the function may return the new string "Unknown" if there is no notion "owner" in the OS file system.
fgrpn (fileinstance_or_filename). Analogous to the previous function except that it returns a new string representing the name of the group of the file (directory). Under some OSes the function may return the new string "Unknown" if there is no notion "group" in the OS file system.
fsize (fileinstance_or_filename). The function returns an integer value which is the length of the file in bytes.
fatime (fileinstance_or_filename). The function returns an integer value which is time of the last access to the file (directory). The time is measured in seconds since the fixed time (usually since January 1, 1970). See also time functions.
fmtime (fileinstance_or_filename). Analogous to the previous functions but returns the time of the last modification.
fctime (fileinstance_or_filename). Analogous to the previous functions but it returns the time of the last change. Here `change' usually means changing the file attributes (owner, modes and so on), while `modification' means usually changing the file itself.
fumode (fileinstance_or_filename). The function returns a new string representing the rights of the owner of the file (directory). The string may contain the following characters (in the following order if the string contains more than one character):
- 's'. Sticky bit of the file (directory).
- 'r'. Right to read.
- 'w'. Right to write.
- 'x'. Right to execute.
fgmode (fileinstance_or_filename). Analogous to the previous function except for the fact that it returns information about the file (directory) group user rights and that the function never returns a string containing the character 's'.
fomode (fileinstance_or_filename). Analogous to the previous function except for the fact that it returns information about the rights of all other users.
isatty (fileinstance). The function returns 1 if the file instance given as the parameter is an open file connected to a terminal and 0 otherwise.
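As a rough model of ftype and fsize, the following Python sketch maps a file's mode bits to the one-character codes listed above. It is a hypothetical illustration, not Dino's implementation: it assumes a symbolic link should be reported as 'L' rather than as its target, which is why it uses `os.lstat`.

```python
import os
import stat
import tempfile

# Hypothetical model of ftype: one (predicate, code) pair per file kind listed above.
_TYPE_CODES = (
    (stat.S_ISREG, "f"), (stat.S_ISDIR, "d"), (stat.S_ISLNK, "L"),
    (stat.S_ISCHR, "c"), (stat.S_ISBLK, "b"), (stat.S_ISFIFO, "p"),
    (stat.S_ISSOCK, "S"),
)

def ftype(path):
    """Return the one-character type code for path, or None if none applies."""
    mode = os.lstat(path).st_mode   # lstat: do not follow a symbolic link
    for is_type, code in _TYPE_CODES:
        if is_type(mode):
            return code
    return None

with tempfile.TemporaryDirectory() as d:
    f = os.path.join(d, "data.txt")
    with open(f, "w") as out:
        out.write("hello")
    print(ftype(d), ftype(f), os.path.getsize(f))  # d f 5 (getsize plays the role of fsize)
```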

The following functions can be used to change the usage rights of the file (directory) for different users. The functions expect two strings (after an implicit string conversion). The first one is the path name of the file (directory). The second one is the rights. For instance, if the string contains the character 'r', this is a right to read (see the characters used to denote different rights in the description of the function fumode). The return values of these functions are always undefined.

chumod (path, mode). The function sets up rights for the file (directory) owner according to the given mode.
chgmod (path, mode). Analogous to the previous function except for the fact that it sets up rights for the file (directory) group users and that the function ignores the character 's'.
chomod (path, mode). Analogous to the previous function except for the fact that it sets up rights for all other users.
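The relation between the rights strings and POSIX permission bits can be sketched in Python as follows. get_rights and set_rights are hypothetical helpers; the sketch assumes POSIX permissions, ignores the 's' character, and assumes that the ch*mod functions replace (rather than add to) the rights of the given user class.

```python
import os
import stat
import tempfile

# Permission bits of each user class in r, w, x order.
_BITS = {
    "u": (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),  # owner  (fumode / chumod)
    "g": (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),  # group  (fgmode / chgmod)
    "o": (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),  # others (fomode / chomod)
}

def get_rights(path, who):
    """Return the rights of one user class as a string built from 'r', 'w', 'x'."""
    mode = os.stat(path).st_mode
    return "".join(ch for ch, bit in zip("rwx", _BITS[who]) if mode & bit)

def set_rights(path, who, rights):
    """Replace the rights of one user class with those named in the rights string."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    for ch, bit in zip("rwx", _BITS[who]):
        if ch in rights:
            mode |= bit
        else:
            mode &= ~bit
    os.chmod(path, mode)

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "notes")
    open(p, "w").close()
    os.chmod(p, 0o644)
    set_rights(p, "u", "rwx")   # roughly chumod (p, "rwx")
    set_rights(p, "o", "")      # roughly chomod (p, "")
    print(get_rights(p, "u"), get_rights(p, "g"), repr(get_rights(p, "o")))  # rwx r ''
```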

Miscellaneous functions

There are the following miscellaneous functions in space io:

sput (...), sputln (...), sputf (format, ...). The functions are analogous to the functions put, putln, and putf but they return the result string instead of outputting the formed string into the standard output stream.
sprint (...), sprintln (...). The functions are analogous to the functions print and println but they return the result string instead of outputting the formed string into the standard output stream.

9.3 Space `sys`

This space contains declarations to work with the underlying execution environment (OS) and related exceptions.

Exceptions in space `sys`

The space contains many exception classes:

signal. This class is a sub-class of the class error. Sub-classes of the class signal describe exceptions caused by receiving a signal from other OS processes. They are:
- sigint. This class describes the exception generated by the user's interrupt from the keyboard.
- sigill. This class describes the exception generated by an attempt to execute an illegal instruction.
- sigabrt. This class describes the exception generated by the signal abort.
- sigfpe. This class describes a floating point exception.
- sigterm. This class describes the exception generated by the termination signal.
- sigsegv. This class describes the exception generated by an invalid memory reference.
invenv. This class is a sub-class of the class error. The class invenv describes a corruption of the Dino program environment (see the predeclared variable env).
syserror. This class is a sub-class of the class invcall. Sub-classes of the class syserror describe exceptions in predeclared functions which call OS system functions. Some of these exceptions are currently never generated but may be generated on some OSes in the future.
- eaccess. This describes the system error "Permission denied".
- eagain. This describes the system error "Resource temporarily unavailable".
- ebadf. This describes the system error "Bad file descriptor".
- ebusy. This describes the system error "Resource busy".
- echild. This describes the system error "No child processes".
- edeadlk. This describes the system error "Resource deadlock avoided".
- edom. This describes the system error "Domain error".
- eexist. This describes the system error "File exists".
- efault. This describes the system error "Bad address".
- efbig. This describes the system error "File too large".
- eintr. This describes the system error "Interrupted function call".
- einval. This describes the system error "Invalid argument".
- eio. This describes the system error "Input/output error".
- eisdir. This describes the system error "Is a directory".
- emfile. This describes the system error "Too many open files".
- emlink. This describes the system error "Too many links".
- enametoolong. This describes the system error "Filename too long".
- enfile. This describes the system error "Too many open files in system".
- enodev. This describes the system error "No such device".
- enoent. This describes the system error "No such file or directory".
- enoexec. This describes the system error "Exec format error".
- enolck. This describes the system error "No locks available".
- enomem. This describes the system error "Not enough space".
- enospc. This describes the system error "No space left on device".
- enosys. This describes the system error "Function not implemented".
- enotdir. This describes the system error "Not a directory".
- enotempty. This describes the system error "Directory not empty".
- enotty. This describes the system error "Inappropriate I/O control operation".
- enxio. This describes the system error "No such device or address".
- eperm. This describes the system error "Operation not permitted".
- epipe. This describes the system error "Broken pipe".
- erange. This describes the system error "Result too large".
- erofs. This describes the system error "Read-only file system".
- espipe. This describes the system error "Invalid seek".
- esrch. This describes the system error "No such process".
- exdev. This describes the system error "Improper link".
systemcall. This is a sub-class of the class invcall. Sub-classes of the class systemcall describe exceptions in calling the predeclared function system.
- noshell. This class describes the exception that the function system cannot find the OS command interpreter (the shell).
- systemfail. This class describes all remaining exceptions in calling the OS function system.
invextern. This is a sub-class of the class invcall. Sub-classes of the class invextern describe exceptions in calling external functions or in accessing an external variable.
- noextern. This class describes the exception that the given external cannot be found.
- libclose. This class describes the exception that there is an error in closing a shared library.
- noexternsupp. This class describes an exception in the usage of external objects when they are not implemented under this OS.
- compile. This class describes an exception in compiling C code or in loading the resulting shared object file.
invenvar. This is a sub-class of the class invcall. The class invenvar describes corruption in the type of variables split_regex and time_format (e.g. their values are not strings).

Variable `time_format`

The variable value is a string which is the output format of time used by the function strtime when it is called without parameters. The initial value of the variable is the string "%a %b %d %H:%M:%S %Z %Y".

Time functions

The following functions from the space sys can be used to get information about real time.

time (). The function returns the time in seconds since the fixed time (usually since January 1, 1970).
strtime ([format [, time]]). The function returns a string representing the time (an integer representing time in seconds since the fixed time) according to the format (a string). If the format is not given, the value of the variable time_format is used. In this case, if the value of time_format is corrupted (it is not a string), the function generates the exception invenvar. If the time is not given, the current time is used. The format is the same as in the C library function strftime. The following format specifiers, excerpted from the OS documentation of that function, can be used in the format:
- %a - the abbreviated weekday name according to the current locale.
- %A - the full weekday name according to the current locale.
- %b - the abbreviated month name according to the current locale.
- %B - the full month name according to the current locale.
- %c - the preferred date and time representation for the current locale.
- %d - the day of the month as a decimal number (range 01 to 31).
- %H - the hour as a decimal number using a 24-hour clock (range 00 to 23).
- %I - the hour as a decimal number using a 12-hour clock (range 01 to 12).
- %j - the day of the year as a decimal number (range 001 to 366).
- %m - the month as a decimal number (range 01 to 12).
- %M - the minute as a decimal number (range 00 to 59).
- %p - either `am' or `pm' according to the given time value, or the corresponding strings for the current locale.
- %S - the second as a decimal number (range 00 to 60).
- %U - the week number of the current year as a decimal number (range 00 to 53), starting with the first Sunday as the first day of the first week.
- %W - the week number of the current year as a decimal number (range 00 to 53), starting with the first Monday as the first day of the first week.
- %w - the day of the week as a decimal number (range 0 to 6), Sunday being 0.
- %x - the preferred date representation for the current locale without the time.
- %X - the preferred time representation for the current locale without the date.
- %y - the year as a decimal number without a century (range 00 to 99).
- %Y - the year as a decimal number including the century.
- %Z - the time zone name or abbreviation.
- %% - the character '%'.
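Since the format is that of the C strftime function, Python's `time.strftime`, which wraps it, can be used to try specifiers on a fixed time. The sketch below uses the start of the epoch (a Thursday) so that the numeric results are predictable; the last line applies the initial value of time_format to the current time, and its output depends on the locale and platform.

```python
import time

# time.gmtime(0) is the start of the epoch: Thursday, 1 January 1970, 00:00:00 UTC.
EPOCH = time.gmtime(0)

print(time.strftime("%Y-%m-%d %H:%M:%S", EPOCH))  # 1970-01-01 00:00:00
print(time.strftime("%j %w %y", EPOCH))           # 001 4 70
print(time.strftime("%I", EPOCH))                 # 12 (midnight on a 12-hour clock)
print(time.strftime("%U %W", EPOCH))              # 00 00 (before the first Sunday/Monday)
# The initial value of time_format applied to the current time.
print(time.strftime("%a %b %d %H:%M:%S %Z %Y"))
```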

Functions for access to information about OS processes

The space sys contains predeclared functions which are used to get information about the current OS process (the Dino interpreter which executes the program). Each OS process has a unique identifier. An OS process is usually started by a specific user and group and is executed on behalf of a specific user and group (the so-called effective identifiers). The following functions return this information. On some OSes the functions may return the string "Unknown" as a name if there is no notion of user and group identifiers.

getpid (). The function returns an integer value which is the process ID of the current OS process.
getun (). The function returns a new string which is the user name for the current OS process.
geteun (). The function returns a new string which is the effective user name for the current OS process.
getgn (). The function returns a new string which is the group name for the current OS process.
getegn (). The function returns a new string which is the effective group name for the current OS process.
getgroups (). The function returns a new vector of strings (possibly the empty vector) representing supplementary group names for the current OS process.
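On POSIX systems the same information comes from the process IDs and the user and group databases. The following Python sketch is a rough, hypothetical counterpart of the functions above; it assumes a POSIX platform (os.getuid and friends) and falls back to "Unknown" when a name cannot be found, mirroring the behavior described above.

```python
import os

try:
    import grp
    import pwd
except ImportError:      # no user/group database modules on this platform
    grp = pwd = None

def _user_name(uid):
    """User name for uid, or "Unknown" when none is available."""
    try:
        return pwd.getpwuid(uid).pw_name
    except (AttributeError, KeyError):
        return "Unknown"

def _group_name(gid):
    """Group name for gid, or "Unknown" when none is available."""
    try:
        return grp.getgrgid(gid).gr_name
    except (AttributeError, KeyError):
        return "Unknown"

def process_info():
    """Rough POSIX counterparts of getpid, getun, geteun, getgn, getegn and getgroups."""
    return {
        "pid": os.getpid(),
        "user": _user_name(os.getuid()),
        "euser": _user_name(os.geteuid()),
        "group": _group_name(os.getgid()),
        "egroup": _group_name(os.getegid()),
        "groups": [_group_name(g) for g in os.getgroups()],
    }

print(process_info())
```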

Function `system (command)`

The function executes the command given by a string (the parameter value) in the OS command interpreter. Besides the standard exceptions parnumber and partype the function may generate the exceptions noshell and systemfail.

9.4 Space `re`

This space contains declarations which can be useful for working with the regular expressions and for pattern matching -- see also the match-statements.

Exception class `invregex`

This class describes exceptions specific to executing the pmatch-statement and to calling the predeclared functions that implement regular expression pattern matching. Although there is only one class for these exceptions, the message in the class parameter varies and gives more details.

Variable `split_regex`

The variable value is a string which represents a regular expression which is used by the predeclared function split when the second parameter is not given. The initial value of the variable is the string "[ \t]+".
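The effect of the default value "[ \t]+" can be illustrated with Python's `re.split`. The split function below is a hypothetical model of Dino's split that falls back to the default regex when none is given; how Dino treats leading or trailing separators is not specified here, so the sketch avoids those cases.

```python
import re

# Initial value of split_regex: one or more spaces or tabs.
SPLIT_REGEX = "[ \t]+"

def split(string, regex=None):
    """Hypothetical model of split: fall back to SPLIT_REGEX if no regex is given."""
    return re.split(SPLIT_REGEX if regex is None else regex, string)

print(split("a  b\tc"))      # ['a', 'b', 'c']    (runs of blanks act as one separator)
print(split("x,y,,z", ","))  # ['x', 'y', '', 'z'] (an explicit regex is used as given)
```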

Pattern matching

The space re contains predeclared functions which are used for pattern matching. A pattern is described by a regular expression (regex), which is in effect a small program describing how a string is matched. Patterns use the default Unicode syntax of the ONIGURUMA package. The pattern syntax is hard to describe formally, so here is an incomplete but strict description. For the full reference, please see the ONIGURUMA package documentation. The regular expressions have the following syntax:


          Regex = Branch {"|" Branch}

The regex matches anything that matches one of the branches.


          Branch = {Piece}

The branch matches the first piece, followed by the second piece, etc. If there are no pieces, the branch matches the null string.


          Piece = Anchor | Unit

          Unit = Atom
               | Unit Quantifier

          Quantifier = Greedy
                     | Reluctant
                     | Possessive

          Greedy = "?"                 // 0 or 1 times
                 | "*"                 // 0 or more times
                 | "+"                 // 1 or more times
                 | Bound

          Bound = "{" Min "," Max "}" // from Min to Max times
                | "{" Min "," "}"     // at least Min times
                | "{" "," Max "}"     // equivalent to {0, Max}
                | "{" Min "}"         // exactly Min times

          Reluctant = "??"
                    | "*?"
                    | "+?"
                    | Bound "?"

          Possessive = "?+"
                     | "*+"
                     | "++"

          Min = <unsigned integer>

          Max = <unsigned integer>

A unit followed by * matches a sequence of 0 or more matches of the unit. A unit followed by + matches a sequence of 1 or more matches of the unit. A unit followed by ? matches a sequence of 0 or 1 matches of the unit.

There is a more general construction (a bound) for describing repetitions of a unit. A unit followed by a bound containing only one integer Min matches a sequence of exactly Min matches of the unit. A unit followed by a bound containing one integer Min and a comma matches a sequence of Min or more matches of the unit. A unit followed by a bound containing a comma and one integer Max matches at most Max repetitions of the unit. A unit followed by a bound containing two integers Min and Max matches a sequence of Min through Max (inclusive) matches of the unit.

The quantifiers described above are greedy. A greedy quantifier first matches as much as possible and, if the whole regex then fails to match, can back-track to try a shorter sequence. There are also reluctant quantifiers. They have an additional suffix ? and first match as little as possible. The last kind of quantifier is possessive. Such quantifiers have an additional suffix + and behave like the corresponding greedy ones, but they never back-track.

Examples:


          `.?foo` // matches first "xfoo" in "xfooxxxxfoo"
          `.*foo` // matches all "xfooxxxxfoo"
          `.+foo` // matches all "xfooxxxxfoo"
          `.{1,8}foo` // matches all "xfooxxxxfoo"
          `.*?foo` // matches first "xfoo" in "xfooxxxxfoo"
          `.+?foo` // Ditto
          `.{1,8}?foo` // Ditto
          `.*+foo` // fail to match in "xfooxxxxfoo"
          `.++foo` // fail to match in "xfooxxxxfoo"
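The examples above can be reproduced with Python's `re` module, which has the same greedy, reluctant and (since Python 3.11) possessive quantifiers. Python's engine is not ONIGURUMA, so this is only a cross-check of the quantifier semantics; first_match is a hypothetical helper.

```python
import re
import sys

SUBJECT = "xfooxxxxfoo"

def first_match(pattern, subject=SUBJECT):
    """Return the text of the leftmost match of pattern in subject, or None."""
    m = re.search(pattern, subject)
    return m.group() if m else None

for pattern in (r".?foo", r".*foo", r".+foo", r".{1,8}foo",   # greedy
                r".*?foo", r".+?foo", r".{1,8}?foo"):        # reluctant
    print(pattern, first_match(pattern))

# Possessive quantifiers were added to Python's re module in Python 3.11.
if sys.version_info >= (3, 11):
    for pattern in (r".*+foo", r".++foo"):
        print(pattern, first_match(pattern))                 # None: no back-tracking
```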


         Atom =  Anchors
               | Character
               | CharacterType
               | CharacterProperty
               | CharacterClass
               | Group
               | BackReference
               | SubexpCall
         
          Character = "\t"     // horizontal tab (0x09)
                    | "\v"     // vertical tab (0x0B)
                    | "\n"     // newline (line feed, 0x0A)
                    | "\r"     // return (0x0D)
                    | "\f"     // form feed (0x0C)
                    | "\a"     // bell (0x07)
                    | "\e"     // escape (0x1B)
                    | "\" OctalCode // char with given octal code
                    | "\x" HexCode  // char with given hexadecimal code
                    | <any but special character \ ? * + ^ $ [ ( ) >
                    | "\" <special character>
      
          OctalCode = <3 octal digits>
        
          HexCode = <2 hexadecimal digits>
          
          CharacterType = "."  // any character but newline
                        | "\w" // Unicode Letter, Mark, Number, or
                               //   Connector_Punctuation
                        | "\W" // opposite to the above 
                        | "\s" // Unicode Line_Separator, 
                               //   Paragraph_Separator, or
                               //   Space_Separator
                        | "\S" // opposite to the above 
                        | "\d" // Unicode decimal number 
                        | "\D" // opposite to the above 
                        | "\h" // hexadecimal digit char [0-9a-fA-F] 
                        | "\H" // opposite to the above 

          CharacterProperty = "\p{" PropertyName "}"
                            | "\p{^" PropertyName "}"
                            | "\P{" PropertyName "}"

         PropertyName = "Alnum" | "Alpha" | "Blank" | "Cntrl"
                      | "Digit" | "Graph" | "Lower" | "Print"
                      | "Punct" | "Space" | "Upper" | "XDigit"
                      | "Word" | "ASCII"
                      | "Any" | "Assigned" | "C" | "Cc" | "Cf"
                      | "Cn" | "Co" | "Cs" | "L" | "Ll" | "Lm"
                      | "Lo" | "Lt" | "Lu" | "M" | "Mc" | "Me"
                      | "Mn" | "N" | "Nd" | "Nl" | "No" | "P"
                      | "Pc" | "Pd" | "Pe" | "Pf" | "Pi" | "Po"
                      | "Ps" | "S" | "Sc" | "Sk" | "Sm" | "So"
                      | "Z" | "Zl" | "Zp" | "Zs" | "Arabic"
                      | "Armenian" | "Bengali" | "Bopomofo"
                      | "Braille" | "Buginese" |  "Buhid"
                      | "Canadian_Aboriginal" | "Cherokee"
                      | "Common" | "Coptic" | "Cypriot"
                      | "Cyrillic" | "Deseret" | "Devanagari"
                      | "Ethiopic" | "Georgian" |  "Glagolitic"
                      | "Gothic" | "Greek" | "Gujarati"
                      | "Gurmukhi" | "Han" | "Hangul" | "Hanunoo"
                      | "Hebrew" | "Hiragana" | "Inherited"
                      | "Kannada" | "Katakana" | "Kharoshthi"
                      | "Khmer" | "Lao" | "Latin" | "Limbu"
                      | "Linear_B" | "Malayalam" | "Mongolian"
                      | "Myanmar" | "New_Tai_Lue" | "Ogham"
                      | "Old_Italic" | "Old_Persian" | "Oriya"
                      | "Osmanya" | "Runic" | "Shavian" | "Sinhala"
                      | "Syloti_Nagri" | "Syriac" | "Tagalog"
                      | "Tagbanwa" | "Tai_Le" | "Tamil" | "Telugu"
                      | "Thaana" | "Thai" | "Tibetan" | "Tifinagh"
                      | "Ugaritic" | "Yi"

          Anchors = "^"           // beginning of the line
                  | "$"           // end of the line
                  | "\b"          // word boundary
                  | "\B"          // not word boundary
                  | "\A"          // beginning of string
                  | "\Z"          // end of string, or before newline
                                  //   at the end
                  | "\z"          // end of string

The atom can be a character. Some characters have a special meaning in a regex (see the comments in the character syntax). The remaining characters match the same character in the matching string. To match a special character, put \ before it. Some characters can be represented by a sequence starting with \ (see the syntax comments).

Examples:


          `\t`        // matches "\t"
          `\x65`      // matches "e"
          `\p{Alpha}` // matches "a"
          `\w`        // matches "a"
          `b$`        // matches "b" in "b\na"

The atom can be an anchor. Matching an anchor succeeds only if its position corresponds to a specific place in the matching string (see the comments in the anchor syntax).

Examples:


          `b$`        // matches "b" in "b\na"
          `abc\Z`     // matches "abc" in "abc"
          `abc\Z`     // matches "abc" in "abc\n"
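Most of the character and anchor examples behave the same way in Python's `re` module, with two differences worth knowing when cross-checking: Python needs re.MULTILINE for $ to match at the end of every line, and Python's \Z matches only at the very end of the string (like ONIGURUMA's \z). The helper found is hypothetical.

```python
import re

def found(pattern, subject, flags=0):
    """Return the text matched by the leftmost match, or None."""
    m = re.search(pattern, subject, flags)
    return m.group() if m else None

print(found(r"\t", "a\tb") == "\t")          # True: \t in the pattern matches a tab
print(found(r"\x65", "hello"))               # e
print(found(r"\w", ";a"))                    # a
print(found(r"b$", "b\na", re.MULTILINE))    # b
print(found(r"b$", "b\na"))                  # None: without MULTILINE, $ is end of string
print(found(r"abc\Z", "abc\n"))              # None: Python's \Z acts like \z
print(found(r"abc(?=\n?\Z)", "abc\n"))       # abc: emulates ONIGURUMA's \Z
```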

The atom which is a character type matches a specific class of character (see comments in the character type syntax).

The atom which is a character property matches a specific class of characters. For the meaning of the properties Alnum through ASCII, please see the corresponding BracketClass. For the meaning of C through Zs, please see the Unicode general categories. For the meaning of Armenian through Yi, please see the Unicode scripts (alphabets). If the property is written as \p{^...} or \P{...}, the match succeeds when the matching character is not of the class.

Examples:


          `\p{Alpha}` // matches "a"
          `\p{ASCII}` // matches ";"


          CharacterClass = "[" Intersections "]"
                         | "[^" Intersections "]"

          Intersections = Set
                        | Intersections "&&" Set

          Set = SetElement
              | Set SetElement
          
          SetElement = ElementChar ["-" ElementChar]
                     | "[:" BracketClass ":]"
                     | "[:^" BracketClass ":]"
                     | CharacterClass

          ElementChar = Character
                      | "\b"       // backspace 0x08

          BracketClass = "alnum"   // Unicode letter, mark,
                                   //   or decimal number
                       | "alpha"   // Unicode letter or mark
                       | "ascii"   // character in range 0 - 0x7f
                       | "blank"   // Unicode space separator
                                   //   or \t (0x09)
                       | "cntrl"   // Unicode control, format,
                                   //   unassigned, private use,
                                   //   or surrogate
                       | "digit"   // Unicode decimal number 
                       | "graph"   // not a space class and not a
                                   //   Unicode control, unassigned,
                                   //   or surrogate
                       | "lower"   // Unicode lower case letter
                       | "print"   // graph or space class
                       | "punct"   // any Unicode punctuation
                       | "space"   // any Unicode separator,
                                   //   \t (0x09), \n (0x0A), \v (0x0B),
                                   //   \f (0x0C), \r (0x0D),
                                   //   or 0x85 (next line)
                       | "upper"   // Unicode upper case letter
                       | "xdigit"  // ASCII 0-9, a-f, or A-F
                       | "word"    // Unicode letter, mark, decimal
                                   //   number or punctuation connector

The atom can be a character class: one or more character sets separated by && and enclosed in []. Sets separated by && are intersected, i.e. a character matches only if it belongs to every one of them. If ^ immediately follows the opening [, the class matches any character which does not match the corresponding class without ^. A set is a sequence of set elements and matches a character matched by any of its elements.

An element given by a single character denotes that character. An element given by two characters separated by - is shorthand for the full range of characters between those two (inclusive) in Unicode code order, e.g. [0-9] matches any decimal digit. Besides the usual character representations, you can also use \b here, which denotes a backspace.

An element given by a bracket class enclosed in [: and :] matches a character from this class (see the comments in BracketClass). If ^ is present right after [:, the match succeeds if the character is not in this class.

An element can also be a character class. In other words, character classes can be nested.

If you need to use [, -, or ] as an ordinary character in a character class, prefix it with \.

Examples:


          `[[:alpha:]]`  // matches "a"
          `[[[:lower:]]&&[^a-x]]` // matches "y" or "z"

The atom can be a group, a regular expression enclosed in (). There are several types of groups:


          Group = CapturedGroup
                | NonCapturedGroup
                | "(?#" <any characters but )> ")" // a comment
                | "(?" Options ")"
                | Context
          
          Options =
                  | Options Option

          Option = "-" | "i" | "m" | "x"

          CapturedGroup = "(" [Regex] ")"
                        | "(?<" Name ">" [Regex] ")"
 
          Name = <one or more word characters>

          NonCapturedGroup = "(?" Options ":" [Regex] ")"
                           | "(?>" [Regex] ")" // atomic group
                           
          Context = "(?=" [Regex] ")" // look ahead
                  | "(?!" [Regex] ")" // negative look ahead
                  | "(?<=" [Regex] ")" // look behind
                  | "(?<!" [Regex] ")" // negative look behind

          BackReference = "\" Number    // back ref. by group
                                        //   number
                        | "\k<" Number ">" // back ref. by group
                                           //   number
                        | "\k<-" Number ">" // back ref. by relative
                                            //   group number
                        | "\k<" Name ">" // back ref. by group name
                        // back ref. by group name and nest level:
                        | "\k<" Name ("+" | "-") Number ">"

          Number = <any integer >= 0>

Some groups are captured groups. This means that you can refer to the substrings they match (see the back references) or get the start and end positions of the matched substrings by calling the Dino regex match functions. A captured group may have a name, which can be used in back references or in subexp calls.

You can place comments that do not contain ) in a regex between (?# and ).

Options without a regex always match. They only change how matching works. The option i switches on ignoring letter case during the match. The option m makes . match a newline too. The option x makes white space in the pattern insignificant and permits comments starting with # and ending at the end of the line. The character - has the opposite effect on the options following it, e.g. (?-i) makes letter case significant in matching again.

You can also define options in non-captured groups. Such options affect only that group. Another form of non-captured group is the atomic group: once the regex in an atomic group has matched something, that match is not changed during backtracking.

Examples:


          `(?i:ab)`     // matches "Ab"
          `(?x: a a a)` // matches "aaa"
          `(?>.*)c`     // can not match "abc"

The atom can be a context. A context match does not advance the current position in a matching string. A look ahead context succeeds if the corresponding regex matches a sub-string starting from the current position. A look behind context succeeds if the corresponding regex matches a sub-string finishing right before the current position. There are negative forms of the context atom. They succeed when the corresponding regex does not match.

Examples:


          `(?=bcd)bc`   // matches "bc" in "aabcd"
          `(?<=aa)bc`   // matches "bc" in "aabc"

The atom can be a back reference. It matches the same string that the corresponding captured group matched. Captured groups are numbered by their left parentheses, starting from one and going from left to right. A negative number (\k<-n>) denotes a relative group number: the groups are counted starting from the back reference and going from right to left. If a captured group has a name, its matched string can also be referenced by that name. If several groups have the same name, the name in the back reference refers to the last such group. You can add a nest level to the name. A nest level of zero is the same as a named back reference without a nest level. A back reference with a non-zero nest level never matches.

Examples:


          `(a)\k<1>`     // matches "aa"
          `(?<p>a)\k<p>` // Ditto

The atom can be a subexp call:


          SubexpCall = "\g<" Name ">"

A subexp call is effectively another occurrence of the group it refers to. If the call is inside the group it refers to, the description is recursive. Left recursion is not permitted, because it would result in endless recursion.

Examples:


          `(?<p>cd)\g<p>`   // matches "cdcd"
          `(?<p>a|b\g<p>c)` // matches "a", "bac", "bbacc" etc
          `(?<p>a|\g<p>b)`  // invalid: left recursion

The space re provides the following pattern matching functions:

match (regex, string). The function searches for a match of the regular expression regex in string. Both parameters should be strings after their implicit string conversions.
The matching is made according to the standard POSIX 1003.2: The regular expression matches the substring starting earliest in the string. If the regular expression could match more than one substring starting at that point, it matches the longest. Subexpressions also match the longest possible substrings, subject to the constraint that the whole match be as long as possible, with subexpressions starting earlier in the regular expression taking priority over ones starting later. In other words, higher-level subexpressions take priority over their component subexpressions. Match lengths are measured in characters, not the collating elements. A null string is considered longer than no match at all.

If there is no matching, the function returns the value nil. Otherwise, the function returns a new mutable vector of integers. The length of the vector is 2 * (N + 1) where N is the number of the captured groups. The first two elements are the index of the first character of the substring corresponding to the whole regular expression and the index of the last character matched plus one. The subsequent two elements are the index of the first character of the substring corresponding to the first captured group in the regular expression and the index of the last character plus one, and so on. If there is no matching with a captured group, the corresponding vector elements will have negative values.

Example: The program

          println (re.match (`\n()(a)((a)(a))`, "b\naaab"));

outputs

          [1, 5, 2, 2, 2, 3, 3, 5, 3, 4, 4, 5]

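The vector returned by match holds indices, not substrings. The following sketch shows one way to recover the text of a captured group. The helper group_text is hypothetical and is not a predeclared function. It builds the substring one character at a time with the @ operator and returns nil when the group took no part in the match, which match signals with negative indices.

```dino
// group_text is a hypothetical helper, not part of the space re.
// It returns the text matched by captured group g (0 = whole regex),
// or nil if the group did not take part in the match.
fun group_text (m, str, g) {
  var i, s = "", start = m [2 * g], stop = m [2 * g + 1];
  if (start < 0)
    return nil;
  for (i = start; i < stop; i++)
    s = s @ str [i];   // append one character
  return s;
}

var str = "width=640";
var m = re.match (`([a-z]+)=([0-9]+)`, str);
if (m != nil)
  putln (group_text (m, str, 1), " is ", group_text (m, str, 2));
```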
gmatch (regex, string[, flag]). The function searches for different occurrences of the regular expression regex in string. The first two parameters should be strings after their implicit string conversion. The third parameter is optional. If it is present, it should be an integer after an implicit integer conversion. If its value is nonzero, the substrings matched by regex can overlap. Otherwise, the substrings never overlap. If the parameter is absent, the function behaves as if its value were zero. The function returns a new mutable vector of integers. The length of the vector is 2 * N, where N is the number of occurrences found. Each pair of vector elements corresponds to one occurrence. The first element of the pair is the index of the first character of the substring matched by the whole regular expression, and the second element is the index of its last character plus one. If there are no occurrences, the function returns nil.
Example: The program

          println (re.gmatch (`aa`, "aaaaa"));
          println (re.gmatch (`aa`, "aaaaa", 1));

outputs

          [0, 2, 2, 4]
          [0, 2, 1, 3, 2, 4, 3, 5]

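Each occurrence takes up two vector elements, and a missing match yields nil. Counting occurrences therefore needs only a nil check and a division. The helper count_matches in this sketch is hypothetical. The expected counts follow from the gmatch example above.

```dino
// count_matches is a hypothetical helper built on re.gmatch.
// overlap is passed through as gmatch's optional third parameter.
fun count_matches (regex, str, overlap) {
  var v = re.gmatch (regex, str, overlap);
  return v == nil ? 0 : #v / 2;   // two indices per occurrence
}

putln (count_matches (`aa`, "aaaaa", 0));  // non-overlapping occurrences
putln (count_matches (`aa`, "aaaaa", 1));  // overlapping occurrences
```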
sub (regex, string, subst). The function searches for substrings matching the regular expression regex in string. All parameters should be strings after an implicit string conversion.
If there is no match, the function returns the value nil. Otherwise, the function returns a new mutable vector of characters in which the first matched substring has been replaced by the string subst. Within the replacement string subst, the sequence \n, where n is a digit from 1 to 9, may be used to indicate the text that matched the n'th captured group of the regex. The sequence \0 represents the entire matched text, as does the character &.
gsub (regex, string, subst). The function is analogous to the function sub, except that it searches for all non-overlapping substrings matched by the regular expression and returns a new mutable vector of characters in which all matched substrings have been replaced by the string subst.
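The replacement rules for sub and gsub can be illustrated with a few calls. The input strings are made up for illustration. Backquoted strings are used for regexes and replacements so that \1, \2, and & reach the functions unchanged. By the rules above, sub changes only the first match, while gsub changes all of them.

```dino
// \1 and \2 refer to the captured groups: swap two words.
var a = re.sub (`([a-z]+) ([a-z]+)`, "hello world", `\2 \1`);
// & stands for the whole matched text; sub changes only the first match.
var b = re.sub (`[0-9]+`, "a 12 b 34", `[&]`);
// gsub replaces every match: collapse runs of blanks into one.
var c = re.gsub (` +`, "a   b    c", " ");
// gsub with &: bracket every digit.
var d = re.gsub (`[0-9]`, "r2d2", `<&>`);
putln (a); putln (b); putln (c); putln (d);
```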
split (string [, regex]). The function splits string into non-overlapping substrings separated in the input string by strings matching the regular expression. All parameters should be strings after an implicit string conversion. If the second parameter is omitted, the value of the predeclared variable split_regex is used instead. In this case the function may generate the exception invenvar (corrupted value of a predeclared variable).
The function returns a new mutable vector whose elements are the separated substrings. If the regular expression is an empty string, the function returns a new mutable vector whose elements are strings, each containing one character of string.

Examples: The program

          println (re.split ("aaa bbb ccc ddd"));

outputs

          ["aaa", "bbb", "ccc", "ddd"]

The program

          println (re.split ("abcdef", ``));

outputs

          ["a", "b", "c", "d", "e", "f"]
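An explicit separator regex can also absorb the blanks around a delimiter, so the pieces come out already trimmed. This sketch uses a made-up list that mixes commas and semicolons.

```dino
// Split on a comma or semicolon together with any blanks around it.
var parts = re.split ("red, green;blue ;  cyan", ` *[,;] *`);
var i;
for (i = 0; i < #parts; i++)
  putln (i, ": ", parts [i]);
```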

If the regular expression is incorrect, the functions generate the exception invregex with a message explaining the error.

9.5 Space `math`

The space contains mostly mathematical functions.

Mathematical functions

The following functions make an implicit arithmetic conversion of the parameters. After the conversions the parameters are expected to be of integer, long integer, or floating point type. The result is always a floating point number.

sqrt (x). The function returns the square root of x. The function generates the exception edom if x is negative.
exp (x). The function returns e (the base of the natural logarithm) raised to the power of x.
log (x). The function returns the natural logarithm of x. The function generates the exception edom if x is negative or may generate erange if the value is zero.
log10 (x). The function returns the decimal logarithm of x. The function generates the exception edom if x is negative or may generate erange if the value is zero.
pow (x, y). The function returns x raised to the power of y. The function generates the exception edom if x is negative and y is not of integral value.
sin (x). The function returns the sine of x.
cos (x). The function returns the cosine of x.
atan2 (x, y). The function returns the arc tangent of the two variables x and y. It is similar to calculating the arc tangent of y / x, except that the signs of both arguments are used to determine the quadrant of the result.
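As a small illustration of the implicit conversions described above, the following sketch passes integer arguments to functions that always return floating point numbers. The helper dist is hypothetical.

```dino
// dist is a hypothetical helper: Euclidean distance between two points.
// The integer arguments are converted implicitly; the result is a float.
fun dist (x1, y1, x2, y2) {
  return math.sqrt (math.pow (x2 - x1, 2) + math.pow (y2 - y1, 2));
}

putln (dist (0, 0, 3, 4));
```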

Other space `math` functions

There are the following miscellaneous functions:

max (v1, v2, ...). The function returns the maximal value among its parameters. The parameters should be of integer, long integer, or floating point type after an implicit arithmetic conversion. So the function can return an integer, a long integer, or a floating point number, depending on the type of the first maximal value after the conversion.
min (v1, v2, ...). The function is analogous to the previous function, but searches for and returns the minimal value.
srand ([seed]). The function sets the parameter value (after an implicit integer conversion) as a seed for a new sequence of pseudo-random integers to be returned by rand. These sequences are repeatable by calling srand with the same seed value. If the parameter is not given, the seed will be the result of calling function time.
rand (). The function returns a pseudo-random floating point value between 0 and 1. If the function srand was not called before, 1 will be used as the seed value.
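The following sketch shows the repeatability property of srand. The seed 2024 is arbitrary. Reseeding with the same value restarts the same sequence, so the second pair of numbers equals the first. The last line scales rand into a float between 10 and 15.

```dino
math.srand (2024);
var a = math.rand (), b = math.rand ();
math.srand (2024);                 // restart the same sequence
var c = math.rand (), d = math.rand ();
putln (a == c && b == d);          // nonzero if the sequence repeated

// Scale the value returned by rand into the range 10 .. 15.
var r = 10 + 5 * math.rand ();
```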

9.6 Space `yaep`

This space contains declarations to work with Yet Another Earley Parser (YAEP). YAEP is a very powerful tool to implement language compilers, processors, or translators. The implementation of the Earley parser used in Dino has the following features:

It is sufficiently fast and does not require much memory. This is the fastest implementation of the Earley parser which I know. The main design goal was to achieve the speed and memory footprint necessary for using it in prototyping compilers and language processors. It can parse 300,000 lines of C code per second on modern computers and allocates about 5MB of memory for a 10,000 line C program.
It performs a simple syntax-directed translation, so the output of the Earley parser is already an abstract tree.
It can parse input described by an ambiguous grammar. In this case the parse result can be an abstract tree or all possible abstract trees. Moreover, it produces the compact representation of all possible parse trees by using DAG instead of real trees. These features can be used to parse natural language sentences.
It can perform syntax error recovery. Moreover, its error recovery algorithm finds a recovery with the minimal number of ignored tokens. This permits implementing parsers with very good error recovery and reporting.
It has a fast startup. There is practically no delay between processing of the grammar and the start of parsing.
It has a flexible interface. The input grammar is given by a YACC-like description.
It has good debugging features. It can print a huge amount of information about the grammar, parsing, error recovery, and translation. You can even get the resulting translation in a form suitable for a graphic visualization program.

Exception classes of space `yaep`

The space yaep contains the class invparser which is a sub-class of invcall. The following sub-classes of the class invparser describe exceptions specific for the work with YAEP.

invgrammar. This class describes an exception raised when the Earley parser gets a bad grammar, e.g. one without rules, with loops in rules, with nonterminals unreachable from the axiom, or with nonterminals not deriving any terminal string.
invtoken. This class describes an exception that the parser got an input token with unknown (undeclared) code.
pmemory. This class describes an exception that there is not enough memory for the internal parser data.

Class parser

The space yaep has the predeclared final class parser which implements the Earley parser. The following public functions and variables are declared in the class parser:

ambiguous_p. This public variable stores information about the last parsing. A nonzero value means that during the last parsing of a given input the parser found that the grammar is ambiguous. The parser can find this even if you asked for only one parse tree (see the function set_one_parse).
set_grammar (descr, strict_p). This function tunes the parser to the given grammar. The grammar is given by the string descr. A nonzero value of the parameter strict_p (after an implicit integer conversion) means stricter grammar checking. In this case, all nonterminals are checked on their ability to derive a terminal string, instead of checking only the axiom. The function can generate the exceptions partype (if the parameters have wrong types) or invgrammar (if the description is a bad grammar). The function can also generate the exception pmemory if there is no memory for the internal parser data. The description is similar to the YACC one. It has the following syntax:

          file : file terms [';']
               | file rule
               | terms [';']
               | rule

          terms : terms IDENTIFIER ['=' NUMBER]
                | TERM

          rule : IDENTIFIER ':' rhs [';']

          rhs : rhs '|' sequence [translation]
              | sequence [translation]

          sequence :
                   | sequence IDENTIFIER
                   | sequence C_CHARACTER_CONSTANT

          translation : '#'
                      | '#' NUMBER
                      | '#' '-'
                      | '#' IDENTIFIER [NUMBER] '(' numbers ')'

          numbers :
                  | numbers NUMBER
                  | numbers '-'

So the description consists of terminal declaration and rule sections.

The terminal declaration section describes the names of terminals and their codes. The terminal code is optional. If it is omitted, the terminal gets the next free code starting with 256. You can declare a terminal several times (the only condition is that its code must be the same each time).

A character constant present in the rules is implicitly declared as a terminal. Its code is always the ASCII code of the character constant.

The rule syntax is the same as the YACC rule syntax. The only difference is an optional translation construction starting with # right after each alternative. The translation part can be a single number, which means that the translation of the alternative is the translation of the symbol with the given number (symbol numbers in the alternative start with 0). Or the translation can be empty or '-', which designates the value of the variable nil_anode. Or the translation can be an abstract node with the given name, an optional cost, and fields whose values are the translations of the alternative symbols with the numbers given in parentheses after the abstract node name. You can use '-' in an abstract node to show that the empty node should be used in this place. If the cost is absent, it is taken to be 1. The cost of a terminal, the error node, and the empty node is always zero.

There is a reserved terminal error which marks the start point of an error recovery. The translation of the terminal is the value of the variable error_anode.
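The following grammar description exercises the terminal section and each translation form described above. The grammar is made up for illustration. The terminal names NUM, ID, and PLUS, the codes 300 and 301, and the node names list and add are assumptions, not part of YAEP.

```dino
expose yaep.*;
// Terminal section: NUM and ID get explicit codes, PLUS gets the next
// free code; ';', '(' and ')' are implicit character terminals.
var g = "TERM NUM = 300 ID = 301 PLUS;\n\
         S : S E ';'  # list (0 1)\n\
           | E ';'    # 0\n\
         E : E PLUS T # add 2 (0 2)\n\
           | T        # 0\n\
         T : NUM      # 0\n\
           | ID       # 0\n\
           | '(' ')'  # -";
// # 0           : the translation of symbol 0 of the alternative
// # add 2 (0 2) : node "add" with cost 2, fields from symbols 0 and 2
// # -           : the empty node (nil_anode)
var p = parser ();
p.set_grammar (g, 1);   // strict checking of all nonterminals
```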
set_debug (level). This function sets up the level of debugging information output to stderr. The higher the level, the more information is output. The default value is 0 (no output). The debugging information includes statistics, the result translation tree, the grammar, parser sets, parser sets with all situations, situations with contexts. The function returns the previously set up debug level. Setting up a negative debug level results in output of the translation for the utility dot of the graphic visualization package graphviz. The parameter should be an integer after an implicit integer conversion. The function will generate the exception partype if it is not true.
set_one_parse (flag). This function sets up a flag whose nonzero value means building only one translation tree (without any alternative nodes). For an unambiguous grammar the flag does not affect the result. The function returns the previously set up flag value. The default value of the flag is 1. The parameter should be an integer after an implicit integer conversion. The function will generate the exception partype if it is not true.
set_lookahead (flag). This function sets up a flag controlling the use of a lookahead in the parser work. Using the lookahead gives the best results in terms of space and speed. The default value is 1 (the lookahead is used). The function returns the previously set up flag. Switching the lookahead off is sometimes useful to get a more understandable debug output of the parser work (see the function set_debug). The parameter should be an integer after an implicit integer conversion. The function will generate the exception partype if it is not true.
set_cost (flag). A nonzero flag value means building only the translation tree (or trees, if the flag set by set_one_parse is 0) with minimal cost. For an unambiguous grammar the flag does not affect the result. The default value is 0. The function returns the previously set up flag value. The parameter should be an integer after an implicit integer conversion. The function will generate the exception partype if it is not true.
set_recovery (flag). This function sets up a flag whose nonzero value means making error recovery if a syntax error occurred. Otherwise, a syntax error results in finishing parsing (although the syntax error function passed to parse will still be called once). The function returns the previously set up flag value. The default value of the flag is 1. The parameter should be an integer after an implicit integer conversion. The function will generate the exception partype if it is not true.
set_recovery_match (n_toks). This function sets up an internal parser parameter meaning how many subsequent tokens should be successfully shifted to finish the error recovery. The default value is 3. The function returns the previously set up value. The parameter should be an integer after an implicit integer conversion. The function will generate the exception partype if it is not true.
parse (tokens, error_func). This function is the major function of the class. It translates the input given by the parameter tokens according to the previously set up grammar. The value of tokens should be an array of objects of the predeclared class token or of its subtypes. If the parser recognizes a syntax error, it calls the function given through the parameter error_func with six parameters:
- an index of the token (in the array tokens) on which the syntax error occurred.
- the error token itself. It may be nil for end of file.
- an index of the first token (in the array tokens) ignored due to error recovery.
- the first ignored token itself. It may be nil for end of file.
- an index of the first token (in the array tokens) which is not ignored after the error recovery.
- the first not ignored token itself. It may be nil for end of file.
If the parser works with error recovery switched off (see the function set_recovery), the third and fifth parameters will be negative and the fourth and sixth parameters will be nil.

The function returns an object of the predeclared class anode which is the root of the abstract tree representing the translation of the parser input. The function returns nil only if a syntax error occurred and the error recovery was switched off. The function can generate the exception partype if the parameter types are wrong or the exception invtoken if any of the input tokens has a wrong code. The function can also generate the exception pmemory if there is no memory for the internal parser data.

The call of the class parser itself can generate the exception pmemory if there is no memory for the internal parser data.
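The following sketch shows ambiguous_p and set_one_parse at work, using a deliberately ambiguous grammar in which "a+a+a" can be grouped two ways. The class tok and the function err are hypothetical names. The token setup mirrors the full example at the end of this section.

```dino
expose yaep.*;
var p = parser ();
// E '+' E is ambiguous: a+a+a can be grouped as (a+a)+a or a+(a+a).
p.set_grammar ("E : E '+' E # plus (0 2)\n\
                  | 'a'     # 0", 1);
p.set_one_parse (0);   // keep all translations, not just one

class tok (code) { use token former code; }
var str = "a+a+a";
var i, inp = [#str : nil];
for (i = 0; i < #str; i++)
  inp [i] = tok (str [i] + 0);

fun err (err_start, err_tok, ign_start, ign_tok, rec_start, rec_tok) {
  putln ("syntax error on token #", err_start);
}

var root = p.parse (inp, err);
// ambiguous_p should now be nonzero; the competing groupings are kept
// as $alt nodes somewhere in the resulting tree.
putln ("ambiguous: ", p.ambiguous_p, ", root node: ", root.name);
```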

Class `token`

The space yaep has a predeclared class token. Objects of this class should be the input of the Earley parser (see the function parse in the class parser). The result abstract tree representing the translation will have input tokens as leaves. The class token has one public variable code whose value should be the code of the corresponding terminal described in the grammar. You could extend the class description e.g. by adding variables whose values could be attributes of the token (e.g. a source line number, the name of an identifier, or the value for a number).

Class `anode`

The space yaep has a predeclared class anode whose objects are nodes of the abstract tree representing the translation (see the function parse of the class parser). Objects of this class are generated by the Earley parser. The class has two public variables: name, whose value is a string representing the name of the abstract node as it is given in the grammar, and transl, whose value is an array with the abstract node fields as its elements. There are a few node types which have a special meaning:

A terminal node which has the reserved name $term. The value of the public variable transl for this node is an object of the class token representing the corresponding input token which was an element of the array passed as a parameter of the function parse.
An error node which has the reserved name $error. This node exists as a single instance (see the description of the variable error_anode) and represents the translation of the reserved grammar symbol error. The value of the public variable transl will be nil in this case.
An empty node which has the reserved name $nil. This node also exists as a single instance (see the description of the variable nil_anode) and represents the translation of a grammar symbol for which no translation was described. For example, a field of an abstract node in a grammar rule may refer to a nonterminal for which no translation is produced. The value of the public variable transl will be nil in this case.
An alternative node which has the reserved name $alt. It represents all possible alternatives in the translation of the grammar nonterminal. The value of the public variable transl will be an array with elements whose values are objects of the class anode which represent all possible translations. Such nodes can be generated by the parser only if the grammar is ambiguous and we did not ask it to produce only one translation.
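A recursive walk over the four node kinds above shows how the tree can be inspected. The helper dump in this sketch is hypothetical. It relies only on the public variables name and transl. Because the parser may share subtrees (a DAG), a shared subtree is printed once for each place it is used.

```dino
// dump is a hypothetical helper: print an abstract tree, one node per
// line, indented by depth.
fun dump (r, level) {
  var i;
  for (i = 0; i < level; i++)
    put ("  ");
  if (r.name == "$term")                            // leaf: an input token
    putln ("$term code=", r.transl.code);
  else if (r.name == "$error" || r.name == "$nil")  // transl is nil
    putln (r.name);
  else {                                            // "$alt" or a grammar node
    putln (r.name);
    for (i = 0; i < #r.transl; i++)
      dump (r.transl [i], level + 1);
  }
}
```

Calling dump (root, 0) on the value returned by parse prints the whole translation.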

Variables `nil_anode` and `error_anode`

There is only one instance of anode which represents empty (nil) nodes. The same is true for the error nodes. The final variables nil_anode and error_anode correspondingly refer to these nodes.

Example of Earley parser usage.

Let us write a program which transforms an expression into the postfix (reverse) Polish form. Please read the program comments to understand what the code does. The program should output the string "abcda*+*+", which is the postfix Polish form of the input string "a+b*(c+d*a)".


          expose yaep.*;
          // The following is the expression grammar:
          var grammar = "E : E '+' T   # plus (0 2)\n\
                           | T         # 0\n\
                           | error     # 0\n\
                         T : T '*' F   # mult (0 2)\n\
                           | F         # 0\n\
                         F : 'a'       # 0\n\
                           | 'b'       # 0\n\
                           | 'c'       # 0\n\
                           | 'd'       # 0\n\
                           | '(' E ')' # 1";
          // Create the parser and set up the grammar.
          var p = parser ();
          p.set_grammar (grammar, 1);

          // Add attribute repr to the token:
          class our_token (code) { use token former code; var repr; }
          // The following code forms input tokens from the string:
          var str = "a+b*(c+d*a)";
          var i, inp = [#str : nil];
          for (i = 0; i < #str; i++) {
            inp [i] = our_token (str[i] + 0);
            inp [i].repr = str[i];
          }
          // The following function outputs messages about the syntax errors
          // and the syntax error recovery:
          fun error (err_start, err_tok,
                      start_ignored_num, start_ignored_tok_attr,
                      start_recovered_num, start_recovered_tok) {
            put ("syntax error on token #", err_start,
                 " (" @ (err_tok == nil ? "EOF" : err_tok.code) @ ")");
            putln (" -- ignore ", start_recovered_num - start_ignored_num,
                   " tokens starting with token #", start_ignored_num);
          }

          var root = p.parse (inp, error); // parse

          // Output the translation in the reverse Polish form
          fun pr (r) {
            var i, n = r.name;

            if (n == "$term")
              put (r.transl.repr);
            else if (n == "mult" || n == "plus") {
              for (i = 0; i < #r.transl; i++)
                pr (r.transl [i]);
              put (n == "mult" ? "*" : "+");
            }
            else if (n != "$error") {
              putln ("internal error");
              exit (1);
            }
          }

          pr (root);
          putln ();
