Dino has quite a lot of predeclared identifiers. They are combined in
a few singleton objects, also called spaces (see the section
Declarations and Scope Rules). Most predeclared identifiers refer to
functions. A predeclared function expects a given number of actual
parameters (possibly a variable number). If the number of actual
parameters is not an expected one, the exception parnumber is
generated. The predeclared functions also expect that the actual
parameters (possibly after implicit conversions) are of the required
type. If this is not true, the exception partype is generated. To show
how many parameters a function requires, the function descriptions
list the names of the parameters and use the brackets [ and ] for
optional parameters.
Example: the description
strtime ([format [, time]])
means that the function can accept zero, one, or two parameters. If
only one parameter is given, it is the parameter format.
If nothing is said about the returned result, the function return value is undefined.
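The bracket notation can be illustrated with the function sort (vect[, compare_function]) described later in this section. The following is only a sketch: it assumes the try/catch statement syntax from the section on statements, which is not described here.

```dino
var v = [3, 1, 2];
println (sort (v));                       // one parameter: standard order
println (sort (v, fun (a, b) {b - a;}));  // two parameters: descending order
try {
  sort ();                                // no parameters are not accepted
} catch (parnumber) {
  putln ("wrong number of parameters");
}
```

The first two calls are both valid because the second parameter is optional. The third call should generate parnumber.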
The predeclared identifiers are described below according to their spaces.
lang
The space contains fundamental Dino declarations. All declarations of the space are always exposed.
The space lang has some predeclared variables which contain useful
information or can be used to control the behaviour of the Dino
interpreter.
To access arguments to the program and the environment, the following variables can be used:
argv. The variable value is an immutable vector whose elements are
strings (immutable vectors of characters) representing the arguments
to the program (see the appendix Implementation).
env. The variable value is an immutable table whose elements are
strings (immutable vectors of characters) representing the values of
the environment variables whose names are the keys of the table.
As Dino is a live programming language, it and its interpreter are
still under development. To access the Dino interpreter's version
number and the language version, the final variables version and
lang_version can be used, respectively. The variable values are the
versions as floating point numbers. For example, if the current Dino
interpreter version is 0.97 and the Dino language version is 0.5, the
variable values will be 0.97 and 0.5.
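A minimal sketch of reading these variables follows. It assumes the length operator # and the table membership operator in from the section on expressions. The environment variable name HOME is only an illustration.

```dino
putln ("interpreter version: ", version);
putln ("language version: ", lang_version);
var i;
for (i = 0; i < #argv; i++)
  putln ("argument ", i, ": ", argv[i]);
// HOME is a hypothetical variable name; check the key before using it.
if ("HOME" in env)
  putln ("HOME = ", env["HOME"]);
```

Checking the key first avoids the exception keyvalue, which is generated when the value of a missing table element is needed.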
To access some information about threads in a Dino program, the following variables can be used:
main_thread. The variable value is the main thread. When the program
starts, there is only one thread, which is called the main thread.
curr_thread. The variable value is the thread in which the variable
is referenced.
All these variables are final, so you cannot change their values.
All predeclared classes in the space lang describe exceptions which
may be generated in a Dino program. All Dino exceptions are
represented by objects of the predeclared class except or of a
sub-class of the class except. The class except has no parameters.
There is only one predeclared sub-class of the class except, the
class error. It is suggested that all classes corresponding to
user-defined exceptions be declared as sub-classes of except. All
other exceptions (e.g. generated by the Dino interpreter itself or by
predeclared functions) are objects of the class error or of
predeclared classes which are sub-classes of error. The class error
and all its sub-classes have one parameter msg which contains a
readable message about the exception.
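The two kinds of exceptions can be sketched as follows. The example assumes the throw statement, the try/catch statement, and the implicitly declared catch-block variable e from the section on statements; none of them is described in this section. The class name my_except is hypothetical.

```dino
// A user-defined exception declared as a sub-class of except.
class my_except () { use except; }

try {
  throw my_except ();
} catch (my_except) {
  putln ("caught a user-defined exception");
}

// An exception generated by the interpreter is an object of a
// sub-class of error, so it has the parameter msg.
try {
  var v = [1, 2, 3];
  println (v[10]);      // the index is out of range
} catch (error) {
  putln ("caught: ", e.msg);
}
```

Catching error handles any of its sub-classes, so a narrower class such as indexvalue can be used when only one kind of failure is expected.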
The following classes are declared in the space lang as sub-classes
of error:
invop. The following sub-classes of this class describe exceptions
when operands of an operation have an incorrect type or value.
optype. This class describes that an operand of an operation is not
of the required type (possibly after implicit conversions).
opvalue. This class is reserved for the error that an operand of an
operation has an invalid value.
invindex. Sub-classes of this class describe exceptions in referring
to a vector element.
indextype. This class describes that the index is not of integer type
(possibly after implicit integer conversion).
indexvalue. This class describes that the index is negative or is
equal to or greater than the vector length.
indexop. This class describes that the first operand in referring to
a vector element is not a vector.
invslice. Sub-classes of this class describe exceptions in referring
to a vector slice.
slicetype. This class describes that the start index, bound, or step
is not of integer type (possibly after implicit integer conversion).
sliceform. This class describes that the slice has a wrong form,
e.g. the start index is negative, the step is zero, or the slice is
applied to something that is not a vector.
invector. Sub-classes of this class mostly describe exceptions in
slice operations.
veclen. This class describes that operands in a slice operator have
different lengths.
vecform. This class describes that operands in a slice operator have
different dimensions.
matrixform. This class describes the error when a matrix
transposition (function transpose) is applied to a vector of vectors
of different lengths.
invkey. Sub-classes of this class describe exceptions in referring to
a table element.
keyvalue. This class describes that there is no element with the
given key in the table when the value of the element is needed. The
exception does not occur when a table element reference stands on the
left hand side of an assignment-statement.
keyop. This class describes that the first operand in referring to a
table element is not a table.
invcall. Sub-classes of this class describe exceptions in calling
functions (mainly predeclared ones).
abstrcall. This class describes that we try to call a declared but
not defined function.
callop. This class describes that we try to call something which is
not a function, class, or fiber. The exception is also generated when
we try to create an instance of the class file by calling the class.
partype. This class describes that a parameter value of a predeclared
function is not of the required type.
parvalue. This class describes that a parameter value of a
predeclared function is not one of the permitted values (see the
functions set_encoding, set_file_encoding).
parnumber. This class describes that the number of actual parameters
is not valid when we call a predeclared function.
syncthreadcall. This class describes that a fiber call occurs inside
a critical region (see the wait-statement).
invresult. This class describes that the result value of a function
call is not of the required type, e.g. the comparison function used
in a call of the function sort returns a non-integer value.
internal. This class describes all other (unspecified) exceptions in
calling predeclared functions.
invaccess. Sub-classes of this class describe exceptions in accessing
or changing values.
accessop. This class describes that a given class declaration cannot
be found or is private when it is accessed through the corresponding
object.
accessvalue. This class describes that we try to access something
that is declared but not defined through the corresponding object
(see abstract classes).
immutable. This class describes that we try to change an immutable
value.
patternmatch. This class describes that the pattern in a variable
declaration does not match the assigned value.
deadlock. This class describes that a deadlock is recognized in a
multi-threaded program.
syncwait. This class describes that we try to execute a
wait-statement inside a critical region.
lang
The following functions are declared in the space lang:
tolower (str). The function expects that the parameter str (after an
implicit string conversion) is a string. The function returns a new
string in which the upper case letters of str are changed to the
corresponding lower case letters.
toupper (str). The function expects that the parameter str (after an
implicit string conversion) is a string. The function returns a new
string in which the lower case letters of str are changed to the
corresponding upper case letters.
translit (str, what, subst). The function transliterates characters
in a string. The function expects that the parameters str (after an
implicit string conversion), what, and subst are strings. The
function returns a new string in which the characters of str that are
present in what are changed to the corresponding characters in subst.
The last two strings should have the same length. The second string
may contain more than one occurrence of a character. In this case the
last correspondence is taken.
eltype (vect). The function expects that the parameter value is a
vector. The function returns nil if the vector is heterogeneous;
otherwise it returns the type of the vector elements (the type of nil
if the vector is empty).
keys (tab). The function expects that the parameter value is a table.
The function returns a new mutable vector containing all the keys in
the table. The order of the keys in the vector is undefined.
closure (par). The function accepts any parameter value. If the
parameter value is an object or a block instance of a function, the
function closure returns the corresponding class or function together
with its context. That is why it is called a closure. In all other
cases, the function returns nil.
context (par). The function returns the context (see the section
Declarations and Scope Rules) represented by a block instance or an
object for the given parameter value, which should be a function, a
class, a fiber, a block instance, or an object.
inside (par1, par2, flag = 0). The function checks whether something
is declared inside something else.
If the third parameter value is given and is nonzero after an implicit integer conversion, the check takes contexts into account. The second parameter value should be a function, a class, an object, or a block instance. In the last two cases, the corresponding class, function, or block is used. The first parameter value should be a function, a class, an object, or a block instance. In the last two cases, it defines the corresponding function, class, or block.
If the function, class, or block defined by the first parameter is
declared inside the function, class, or block given by the second
parameter, the function inside returns 1. The function inside also
returns 1 if the function, class, or block defined by the first
parameter is the same as the function, class, or block given by the
second parameter. Otherwise the function inside returns 0. The
following example illustrates the difference between checking with
and without taking contexts into account.
class c () {
class subc () {
}
}
inside (c ().subc (), c ().subc); // returns 1
inside (c ().subc (), c ().subc, 1); // returns 0
The first call of inside returns 1, while the second one returns 0.
isa (fco, fc). The function checks whether a function, a class, or an
object given by the first parameter fco uses declarations (through a
use-clause) of a function or a class given by the second parameter
fc; in other words, whether the first is a subtype of the second (or
a sub-class of the class). If this is true, the function returns 1;
otherwise it returns 0. If the parameter types are wrong, the
function generates the exception partype. The following example
illustrates the usage of isa.
class c () {}
class subc () { use c;}
isa (subc, c);
isa (subc (), c);
The calls of isa in the example return 1.
subv (vect, index, length = -1). The function is used to extract a
sub-vector. The first parameter value should be a vector after an
implicit string conversion. The second and third parameter values
should be integers after an implicit integer conversion.
The function extracts only the element or the part of the sub-vector
that exists in the vector (so you can use any values of the index and
the length). If the index is negative, it refers to an element
analogously to a slice bound. In other words, -1 corresponds to the
vector length, -2 corresponds to the vector length-1, -3 corresponds
to the vector length-2, and so on. If the length is negative, the
sub-vector finishes at the vector end. The function returns a new
vector which is the sub-vector. The result vector is immutable only
when the original vector is immutable.
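A short sketch of subv with non-negative indexes follows. The results in the comments are derived from the description above and have not been checked against an interpreter.

```dino
var v = [10, 20, 30, 40, 50];
println (subv (v, 1, 3));        // [20, 30, 40]
println (subv (v, 3));           // [40, 50]: default length -1 means "to the end"
println (subv (v, 3, 100));      // [40, 50]: only the existing part is taken
println (subv ("hello", 1, 3));  // "ell"
```

Because only the existing part is extracted, the calls need no prior check of the vector length.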
del (vect, index, length = 1) or del (tab, key). The first form of
the function is used to remove a vector element or a sub-vector from
the mutable vector vect. The second and the third parameter values
should be integers after an implicit integer conversion. The function
removes only the element or the part of the sub-vector that exists in
the vector (so you can use any values of the index and the length). A
negative index has the same meaning as in subv. If the length is
negative, the sub-vector finishes at the vector end.
The second form of the function is used to remove an element (if it exists) with the given key from a mutable table.
The function generates the exception immutable if we are trying to
remove from an immutable vector or table. The function returns the
modified vector/table.
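Both forms of del can be sketched as follows. The table constructor syntax tab [...] and the length operator # are assumptions taken from the section on expressions. The results in the comments follow from the description above.

```dino
var v = [1, 2, 3, 4, 5];
del (v, 1);          // removes the element with index 1
println (v);         // [1, 3, 4, 5]
del (v, 2, -1);      // negative length: removes everything from index 2 on
println (v);         // [1, 3]

var t = tab ["a" : 1, "b" : 2];
del (t, "a");        // removes the element with the key "a"
del (t, "zzz");      // no such key: nothing is removed
println (#t);        // 1
```

The function changes its first argument in place, so the returned value can be ignored.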
ins (vect, el, index = -1). The function inserts an element given by
the second parameter into a vector given by the first parameter at
the place given by the third parameter. The third parameter should be
an integer after an implicit integer conversion. A negative index has
the same meaning as in subv. The function generates the exception
immutable if we are trying to insert into an immutable vector. The
function returns the modified vector.
insv (vect, vect, index = -1). The function is analogous to the
function ins but it is used for the insertion of all elements of a
vector into the vector given as the first parameter. So the second
parameter value should be a vector. The function returns the modified
vector.
rev (vect). The function returns the reverse of the given vector.
cmpv (vect, vect). The function makes an implicit string conversion
of the parameter values. After that, the parameter values should be
vectors whose first corresponding equal elements should have the same
type (character, integer, or floating point type). The first
corresponding unequal elements should have the same type too (the
remaining elements can have different types). As usual, if this is
not true, the exception partype is generated.
The function returns 1 if the first unequal element value of the first vector is greater than the corresponding element in the second vector, -1 if it is less, and 0 if all the corresponding vector elements are equal. If the first vector is a prefix of the second vector, the function returns -1. If the second vector is a prefix of the first vector, the function returns 1. So the function in fact uses a generalized lexicographical order.
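The ordering rules of cmpv can be sketched with a few calls. The results in the comments are derived from the rules above.

```dino
println (cmpv ("abc", "abd"));    // -1: 'c' is less than 'd'
println (cmpv ("abd", "abc"));    // 1
println (cmpv ([1, 2], [1, 2]));  // 0: all corresponding elements are equal
println (cmpv ("ab", "abc"));     // -1: the first vector is a prefix of the second
```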
filter (f, v, d = 1). The function expects a function f, a vector v,
and an optional integer d (after an implicit integer conversion).
Otherwise the exception partype is generated.
The function processes the elements of v if d is equal to 1, the
elements of the vectors which are elements of v if d is equal to 2,
and so on. In other words, d is the level on which the vector
elements are processed. If v does not have the structure necessary
for processing, the exception vecform is generated. If d is zero or
negative, the function just returns v. Otherwise the function creates
a new mutable vector having the same structure as v but containing
only those elements on level d for which the function f returns a
nonzero value after an integer conversion.
If the result of a call of the function f is not an integer after the
integer conversion, the exception invresult is generated. The
following example illustrates a usage of filter.
var i, v = [0, 1, -2, 3, -4];
println (filter (fun (a) {a > 0;}, v));
v = [[0, 1, -2, 3, -4], [5, -6, 7, -8, 9]];
println (filter (fun (a) {a > 0;}, v, 2));
map (f, v, d = 1). The meaning of the function parameters and the
constraints on their values are analogous to those of the function
filter, except that the function f can return any value. The elements
processed by the function f are replaced by the results of the calls
of f. The following example illustrates the usage of map.
var i, v = [[0, 1, -2, 3, -4], [5, -6, 7, -8, 9]];
println (map (fun (a) {a < 0 ? nil : a;}, v, 2));
fold (f, v, init, d = 1). The meaning of the function parameters f,
v, and d and the constraints on their values are analogous to those
of the function filter. The function processes all elements of the
vectors on level d and returns the value
f (f (f (f (init, el0), el1), ...) , eln)
where el0, ..., eln are the vector elements on level d taken from
left to right. If d is zero or negative or the vectors are empty, the
function returns init. The following example illustrates the usage of
fold.
var v = [1,2,3,4];
println (fold (fun (a, b) {a + b;}, v, 0));
sort (vect[, compare_function]). The function returns a new sorted
vector. The original vector given as the first parameter value should
be a homogeneous vector whose elements are of character, integer,
long integer, or floating point type. If the second parameter is not
given, the standard arithmetic order (see the comparison operators)
is used. To use a special ordering, give the second parameter, which
should be a function that compares two elements of the vector and
returns a negative integer if the first parameter value (element) is
less than the second one, a positive integer if the first parameter
value is greater than the second one, and zero if they are equal.
transpose (m). The function expects a matrix m. This means that m
should be a vector (each element is a matrix row) of vectors of equal
length. If m is not a vector, the exception partype is generated. If
the elements of m are not vectors of the same length, the exception
matrixform is generated. The function returns a new mutable vector of
mutable vectors which is the matrix transposition of m.
gc (). The function forces a garbage collection and heap compaction.
Usually the Dino interpreter itself invokes a garbage collection when
it believes that it needs to do this.
exit (code). The function finishes the work of the interpreter with
the given code, which should be an integer value after an implicit
integer conversion.
io
The space contains functions for input and output and for work with files and directories. All declarations of the space are always exposed.
io
The following classes are declared in the space io as sub-classes of
invcall:
invinput. This class describes that the file input is not of the
required format. Usually the exception is generated by the function
scan etc.
invfmt. This class describes that the format of a formatted output
function is wrong (see the function putf).
eof. This class describes that the end of file is encountered.
Usually the exception is generated by functions reading files (get,
scan etc).
invencoding. This class describes different exceptions with the used
encodings, e.g. a file contains bytes not corresponding to the
expected encoding or in some cases the encoding should contain ASCII
characters.
file
Dino has a predeclared final class file. Work with files in a Dino
program is done through objects of the class. All declarations inside
the class are private. The objects of the class can be created only
by the predeclared functions open or popen. If you create an object
of the class by calling the class, the exception callop will be
generated. The file encoding is defined by the current Dino encoding
at the file creation time (see the functions set_encoding,
set_file_encoding). If you want to work with files on the byte level
without any encoding/decoding, you can use an encoding called "RAW".
To output something into the standard output streams or to input something from the standard input stream, the following variables can be used:
stdin. The variable value is an object of the class file which
corresponds to the standard input stream.
stdout. The variable value is an object of the class file which
corresponds to the standard output stream.
stderr. The variable value is an object of the class file which
corresponds to the standard error stream.
All these variables are final, so you cannot change their values.
The encoding of these files is the current Dino encoding at the
program start (see the function set_encoding).
The following functions (besides the input/output functions) work
with OS files. The functions may generate an exception declared in
the class syserror (e.g. eaccess, enametoolong, eisdir, and so on)
besides the standard partype and parnumber. The function rename can
be used for renaming a directory, not only a file.
rename (old_path, new_path). The function renames the file
(directory) given by its path name. The old and new names are given
by the parameter values, which should be strings after an implicit
string conversion.
remove (file_path). The function removes the OS file given by its
path name. The file path name should be a string after an implicit
string conversion.
open (file_path, mode). The function opens the file for work in the
given mode, creates a new class file instance, associates the opened
file with the instance, and returns the instance. The parameter
values should be strings after an implicit string conversion. The
first parameter value is a string representing the file path. The
second parameter value is a string representing the mode for work
with the file (for all possible modes see the documentation of the
ANSI C function fopen). All work with the opened file is done through
the file instance.
close (fileinstance). The function closes a file opened by the
function open. The file is given by the class file instance. The
function also removes all association of the instance with the file.
flush (fileinstance). The function flushes any output that has been
buffered for the opened file given by the class file instance.
popen (command, mode). The function starts a shell command given by
the first parameter value (which should be a string after an implicit
string conversion), creates a pipe, creates a new class file
instance, associates the pipe with the instance, and returns the
instance. Writing to such a pipe (through the class file instance)
writes to the standard input of the command. Conversely, reading from
the pipe reads the command's standard output. After an implicit
string conversion the second parameter value should be the string "r"
(for reading from the pipe) or "w" (for writing to the pipe). The
pipe should be closed by the function pclose.
pclose (fileinstance). The function waits for the command connected
to a pipe to terminate. The pipe is given by the class file instance
returned by the function popen. The function also removes the
association of the instance with the pipe.
tell (fileinstance). The function returns the current value of the
file position indicator for the file (opened by the function open)
given by the class file instance.
seek (fileinstance, offset, whence). The function sets up the current
file position indicator for the file (opened by the function open)
given by the class file instance. The position is given by offset,
which should be an integer after an implicit arithmetic conversion,
and whence, which should be a string after an implicit string
conversion. The first character of the string should be 's', 'c', or
'e' (these characters mean that the offset is relative to the start
of the file, the current position indicator, or the end-of-file,
respectively).
get_file_encoding (fileinstance). The function returns a new mutable
string which is the name of the current file encoding.
set_file_encoding (fileinstance, name). The function accepts a file
and a string and changes the current file encoding. If the name
represents an unknown encoding name, the function generates the
exception parvalue.
The following functions are used to output something into opened
files. All the function return values are undefined. The functions
may generate an exception declared in the class syserror (e.g. eio,
enospc, and so on) besides the standard partype and parnumber.
put (...). All parameters should be strings after an implicit string
conversion. The function outputs all strings into the standard output
stream.
putln (...). The function is analogous to the function put except for
the fact that it additionally outputs a new line character after the
output of all the strings.
fput (fileinstance, ...). The function is analogous to the function
put except for the fact that it outputs the strings into an opened
file associated with a class file instance which is the first
parameter value.
fputln (fileinstance, ...). The function is analogous to the function
fput except for the fact that it additionally outputs a new line
character after the output of all the strings.
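The output functions above can be combined with the file functions described earlier in this space, as in the following sketch. The file name dino_example.txt is hypothetical, and the code assumes that the current directory is writable.

```dino
putln ("total: ", 42);     // 42 is implicitly converted to a string
fputln (stderr, "warning: this line goes to the standard error stream");

var name = "dino_example.txt";   // hypothetical file name
var f = open (name, "w");
fputln (f, "first line");
fputln (f, "second ", "line");
close (f);

f = open (name, "r");
seek (f, 0, "e");                // offset 0 relative to the end-of-file
putln ("position at the end of the file: ", tell (f));
close (f);
remove (name);
```

Each opened file is closed explicitly with close. Files returned by popen should be closed by pclose instead.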
putf (format, ...). The first parameter should be a string after an
implicit string conversion. The function outputs the rest of the
parameters according to the format. The number of the remaining
parameters should be exactly equal to the number of conversions
(including parameterized widths and precisions) in the format.
Otherwise, the exception parnumber will be generated. The type of
each parameter should correspond to its conversion specifier (or be
an integer for parameterized widths and precisions). If this is not
true, the exception partype will be generated. The format is mostly a
subset of that of the standard C function printf, but it can also
deal with multi-precision integers (of the Dino type long). The
format has the following syntax:
format : <any character except %>
| '%' flags [width] [precision]
conversion_specifier
flags :
| flag
flag : '#' | '0' | '-' | ' ' | '+'
width : '*' | <a decimal number starting with non-zero>
precision : '.' ['*' | <decimal number>]
conversion_specifier : 'd' | 'o' | 'x' | 'X'
| 'e' | 'E' | 'f' | 'g'
| 'G' | 'c' | 's' | '%'
If the format syntax is wrong, the exception invfmt is generated.
The flag '#' means that the value should be converted into an alternative form. It can be present only for the conversion specifiers 'o', 'x', 'X', 'e', 'E', 'f', 'g', and 'G'. If the flag is used for the conversion specifier 'o', the output will be prefixed by '0'. For 'x' and 'X' the output will be prefixed by '0x' and '0X' correspondingly. For the conversions 'e', 'E', 'f', 'g', and 'G' the output will always contain a decimal point. For the conversions 'g' and 'G' it also means that trailing zeros are not removed from the output as they would be without the flag. The following code using the flag '#' in a format
putf ("->%#o %#x %#x %#.0e %#.0f %#g<-\n",
8, 10, 16l, 2., 3., 4.);
will output
->010 0xa 0x10 2.e+00 3. 4.00000<-
The flag '0' means that the output value will be zero padded on the left. If both flags '0' and '-' appear, the flag '0' is ignored. It is also ignored for the conversions 'd', 'o', 'x', and 'X' if a precision is given. The flag is prohibited for the conversions 'c' and 's'. The following code using the flag '0' in a format
putf ("->%04d %04x %04x %09.2e %05.2f %05.2g<-\n",
8, 10, 16l, 2., 3., 4.);
will output
->0008 000a 0010 02.00e+00 03.00 00004<-
The flag '-' means that the output will be left adjusted on the field boundary (the default is justification to the right). The flag '-' overrides the flag '0' if both are given. The following code using the flag '-' in a format
putf ("->%-04d %-04x %-04x %-09.2e %-05.2f %-05.2g<-\n",
8, 10, 16l, 2., 3., 4.);
will output
->8    a    10   2.00e+00  3.00  4    <-
The flag ' ' means that the output of a signed number will start with a blank for a positive number. The flag can be used only for the conversions 'd', 'e', 'E', 'f', 'g', and 'G'. If both flags ' ' and '+' appear, the flag ' ' is ignored. The following code using the flag ' ' in a format
putf ("->% d % d % .2e % .2f % .2g<-\n",
8, 16l, 2., 3., 4.);
will output
-> 8  16  2.00e+00  3.00  4<-
The flag '+' means that the output of a signed number will start with a plus for a positive number. The flag can be used only for the conversions 'd', 'e', 'E', 'f', 'g', and 'G'. The flag '+' overrides the flag ' ' if both are given. The following code using the flag '+' in a format
putf ("->%+d %+d %+.2e %+.2f %+.2g<-\n",
8, 16l, 2., 3., 4.);
will output
->+8 +16 +2.00e+00 +3.00 +4<-
The width defines the minimum width of the output value. If the
output is shorter, it is padded with spaces (or zeros, see the flag
'0') on the left, or with spaces on the right if the flag '-' is
used. The output is never truncated. The width should be no more than
the maximal integer value, otherwise the exception invfmt is
generated. The width can be given as a parameter of the integer type
if '*' is used. If the value of the width given by the parameter is
negative, the flag '-' is believed to be given and the width is
believed to be equal to zero. The following code using the width in a
format
putf ("->%5d %05d %-5d %5d %*d %*d<-\n",
8, 9, 10, 16l, 5, 8, -5, 10);
will output
->    8 00009 10       16     8 10   <-
The precision is prohibited for the conversion 'c'. If the number after the period is absent, its value will be zero. The precision can be given as a parameter of the integer type if '*' is used after the period. If the value of precision given by the parameter is negative, its value is believed to be zero too. For the conversions 'd', 'o', 'x', and 'X' the precision means a minimum number of the output digits. For the conversions 'e', 'E', and 'f' it means the number of the digits to appear after the decimal point. For 'g' and 'G' it means the maximum number of significant digits. For 's' it means the maximum number of characters to be output from a string. The following code using precisions in a format
putf ("->%.d %.0d %.5d %.d %.0f %.0e %.2g<-\n",
8, 8, 9, 16l, 2.3, 2.3, 3.53);
putf ("->%.2s %.0d %.*d %.*d %.*d<-\n",
"long", 0, 5, 8, -5, 8, 5, 16l);
will output
->8 8 00009 16 2 2e+00 3.5<-
->lo  00008 8 00016<-
The conversion 'd' should be used to output integer or long integer. The default precision is 1. When 0 is output with an explicit precision equal to 0, the output is empty.
The conversions 'o', 'x', and 'X' should be used to output an integer
or long integer value as an unsigned number in the octal and
hexadecimal form. The lower case letters abcdef are used for 'x' and
the upper case letters ABCDEF are used for 'X'. The precision gives
the minimum number of digits that must appear. If the output value
requires fewer digits, it is padded on the left with zeros. The
default precision is 1. When 0 is output with an explicit precision
equal to 0, the output is empty.
The conversion 'f' should be used to output floating point values.
The output value has the form [-]ddd.ddd where the number of digits
after the decimal point is given by the precision specification. The
default precision value is 6. If the precision is explicitly zero, no
decimal-point character appears.
The conversions 'e' and 'E' should be used to output floating point
values with an exponent in the form [-]d.ddd[e|E][+|-]dd. There is
always one digit before the decimal point. The number of digits after
the decimal point is defined by the precision. The default precision
value is 6. If the precision is zero, no decimal point appears. The
conversion 'E' uses the letter E (rather than e) to introduce the
exponent. The exponent always contains at least two digits. If the
exponent value is zero, the exponent is output as 00.
The conversions 'g' and 'G' should be used to output floating point values in the style 'f' or 'e' (or 'E' for the conversion 'G'). The precision defines the number of significant digits. The default value of the precision is 6. If the precision is zero, it is treated as 1. The conversion 'e' is used if the exponent from the conversion is less than -4 or not less than the precision. Trailing zeros are removed from the fractional part of the output. If the whole fractional part is zero, the decimal point is removed too.
The conversion 'c' should be used to output a character value.
The conversion 's' should be used to output strings.
The conversion '%' should be used to output %.
The following code using different conversions in a format
putf ("->%% %c %s %d %o %x %X %d %o %x %X<-\n",
'c', "string", 7, 8, 20, 20, 8l, 9l, 21l, 21l);
putf ("->%f<-\n", 1.5);
putf ("->%e %E %g %G %g %G<-\n",
2.8, 2.8, 3.7, 3.7, 455555555.555, 5.9e-5);
will output
->% c string 7 10 14 14 8 11 15 15<-
->1.500000<-
->2.800000e+00 2.800000E+00 3.7 3.7 4.55556e+08 5.9E-05<-
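The conversions described above closely resemble the C-style conversions that Python's % operator also implements. Assuming that correspondence holds for these cases (this is an assumption for illustration, not a statement about the Dino implementation), the following Python sketch, which is not Dino code, reproduces the three output lines. The Dino long integers 8l, 9l, and 21l are written as plain Python integers.

```python
# Python's % operator stands in for putf here (an assumption, not Dino code).
line1 = "->%% %c %s %d %o %x %X %d %o %x %X<-" % (
    "c", "string", 7, 8, 20, 20, 8, 9, 21, 21)
line2 = "->%f<-" % 1.5
line3 = "->%e %E %g %G %g %G<-" % (
    2.8, 2.8, 3.7, 3.7, 455555555.555, 5.9e-5)

print(line1)
print(line2)
print(line3)
```

Note how 'g' switches to the exponent style for 455555555.555 (exponent 8 is not less than the precision 6) and for 5.9e-5 (exponent -5 is less than -4).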
fputf (fileinstance, format, ...)
. The function is
analogous to the function putf
except for the fact
that it outputs the operands into an opened file associated
with a class file
instance which is the first
parameter value.
print (...)
. The function outputs all parameter
values into the standard output stream. The function never
makes implicit conversions of the parameter values. The
parameter values are output as they could be represented in
Dino itself (e.g. character 'c'
is output
as 'c'
, vector ['a', 'b', 'c']
is output
as "abc"
, vector
[10, 20]
as [10, 20]
and so on). Some values
(functions, classes, block instances, class
instances, threads) cannot be represented fully in Dino. Such
values are represented schematically. For example, the output
fun f {}.g(unique_number)
would mean the function
f
in the call of function (or class) g
with
the given unique number and the function g is in the instance of
the implicit block covering the whole program. For the
function g
, output would look simply like fun
g
because there is only one instance of the implicit block
covering the whole program. Output for an instance of the
class c
in the function f
looks like
instance {}.f(unique_number).c(unique_number)
. Output
for a block instance of the function f
looks like
stack {}.f(unique_number)
. Output for a thread whose
fiber t
is declared in the function
f
would look like thread unique_number
{}.f(unique_number).t(unique_number)
.
println (...)
. The function is analogous to the
function print
except for the fact that it
additionally outputs a new line character after output of all
parameters.
fprint (fileinstance, ...)
. The function is
analogous to the function print
except for the fact
that it outputs the parameters into an opened file associated
with a class file
instance which is the value of the first
parameter.
fprintln (fileinstance, ...)
. The function is
analogous to function fprint
except for the fact that
it additionally outputs a new line character after the output
of all the parameters.
The following functions are used to read input from opened files.
The functions may generate an exception declared in the
class syserror
(e.g. eio
, enospc
and so on)
or eof
besides the standard partype
and parnumber
.
get ()
. The function reads one character from the
standard input stream and returns it. The function generates
the exception eof
if the function tries to read the
end of file.
getln ()
. The function reads one line from the
standard input stream and returns it as a new string. The end
of line is the newline character or end of file. The returned
string does not contain the newline character. The function
generates the exception eof
only when the file
position indicator before the function call stands exactly at
the end of file.
getf ([ln_flag])
. The function reads the whole
standard input stream and returns it as a new string. The
function generates the exception eof
only when the
file position indicator before the function call stands exactly
at the end of file. The function has an optional parameter
which should be an integer after an implicit integer conversion.
If the parameter value is nonzero, the function returns a
vector of strings instead of a single string. Each
string is a line of the input stream and does not
contain the newline character. If the parameter is omitted or
zero, the function returns the whole input as one string.
fget (fileinstance)
. The function is analogous to
the function get
except for the fact that it reads
from an opened file associated with the class file
instance which is the parameter's value.
fgetln (fileinstance)
. The function is analogous
to the function getln
except for the fact that it
reads from an opened file associated with a class file
instance which is the parameter value.
fgetf (fileinstance [, ln_flag])
. The function is
analogous to the function getf
except for the fact
that it reads from an opened file associated with a class
file
instance which is the parameter's value.
scan ()
. The function reads a character,
integer, floating point number, string, vector, or table and
returns it as the result. The input values should be
represented in the file as the ones in the Dino language
(except for the fact that there should be no identifiers in the
input values and there should be no operators in the values,
although the signs +
and -
are possible in an
integer or floating point representation). The table or vector
should contain only values of the types mentioned above. The
values in the file can be separated by whitespace characters. If
there is an error (e.g. unbalanced brackets in a vector value)
in the read value representation the function generates the
exception invinput
. The function generates the
exception eof
if only whitespace characters are still
unread in the file.
scanln ()
. The function is analogous to the
function scan
except for the fact that it skips all
characters until the end of line or the end of file after
reading the value. Skipping is made even if the exception
invinput
is generated.
fscan (fileinstance)
. The function is analogous
to the function scan
except for the fact that it reads
from an opened file associated with a class file
instance which is the parameter's value.
fscanln (fileinstance)
. The function is analogous
to the function scanln
except for the fact that it reads from
an opened file associated with a class file
instance
which is the parameter value.
Dino internally uses Unicode for characters. To communicate with the rest of the world, it can use different encodings. The default encoding is UTF-8. Dino has two functions to get and change the current encoding:
get_encoding ()
. The function returns a new
mutable string which is the name of the current encoding.
set_encoding (name)
. The function accepts a
string and changes the current encoding. If the name
represents an unknown encoding name, the function generates
the exception parvalue
.
Examples:
putln (get_encoding ());
set_encoding ("KOI8-R");
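To illustrate why the current encoding matters, here is a small Python sketch (not Dino code) showing that the same Unicode text is represented by different byte sequences under UTF-8 and KOI8-R. The sample word and the Python codec name koi8_r are choices made for this illustration only.

```python
text = "привет"  # six Cyrillic letters, held internally as Unicode

utf8_bytes = text.encode("utf-8")   # two bytes per Cyrillic letter
koi8_bytes = text.encode("koi8_r")  # one byte per Cyrillic letter

print(len(utf8_bytes), len(koi8_bytes))

# Decoding with the matching encoding restores the original text.
round_trip = koi8_bytes.decode("koi8_r")
```

Reading bytes with an encoding other than the one used to write them generally gives wrong characters, which is why a program should set the encoding to the one its input and output actually use.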
The following functions work with directories. The functions may
generate an exception declared in the class syserror
(e.g. eaccess
, enametoolong
, enotdir
and so
on) besides the standard partype
and parnumber
.
readdir (dirpath)
. The function makes an implicit
string conversion of the parameter value which should be a
string (representing a directory path). The function returns a
new mutable vector with elements which are strings representing
the names of all files and sub-directories
(including "."
and ".."
for the current and
parent directory respectively) in the given directory.
mkdir (dirpath)
. The function creates a directory
with the given name represented by a string (the parameter
value after an implicit string conversion). The directory has
read/write/execute rights for all. You can change it with the
aid of the functions ch*mod
.
rmdir (dirpath)
. The function removes the
directory given by a string which is a parameter value after
an implicit string conversion.
getcwd ()
. The function returns a new string
representing the full path of the current directory.
chdir (dirpath)
. The function makes the directory
given by dirpath
(which should be a string after
an implicit string conversion) the current directory.
The following predeclared functions can be used for accessing file or
directory information. The functions may generate an exception
declared in the class syserror
(e.g. eaccess
,
enametoolong
, enfile
and so on) besides the standard
partype
and parnumber
. The functions expect one
parameter which should be a file instance (see the predeclared class
file
) or the path name of a file represented by a string (the
functions make an implicit string conversion of the parameter value).
The single exception to this is isatty
which expects a file
instance.
ftype (fileinstance_or_filename)
. The function
returns one of the following characters:
'f'. A regular file.
'd'. A directory.
'L'. A symbolic link.
'c'. A character device.
'b'. A block device.
'p'. A fifo.
'S'. A socket.
Under some OSes the function never returns some of the characters (e.g. 'c' or 'b'). The function may return nil if it cannot categorize the file as above.
fuidn (fileinstance_or_filename)
. The function
returns a new string representing a name of the owner of the
file (directory). Under some OSes the function may return the
new string "Unknown"
if there is no notion of "owner" in
the OS file system.
fgrpn (fileinstance_or_filename)
. Analogous to
the previous function except that it returns a new string
representing the name of the group of the file (directory).
Under some OSes the function may return the new string
"Unknown"
if there is no notion of "group" in the OS file
system.
fsize (fileinstance_or_filename)
. The function
returns an integer value which is the length of the file in
bytes.
fatime (fileinstance_or_filename)
. The function
returns an integer value which is the time of the last access to the
file (directory). The time is measured in seconds since the
fixed time (usually since January 1, 1970). See also the time
functions.
fmtime (fileinstance_or_filename)
. Analogous to
the previous function but it returns the time of the last
modification.
fctime (fileinstance_or_filename)
. Analogous to
the previous function but it returns the time of the last
change. Here `change' usually means changing the file
attributes (owner, modes and so on), while `modification' usually
means changing the file itself.
fumode (fileinstance_or_filename)
. The function
returns a new string representing the rights of the owner of
the file (directory). The string may contain the following
characters (in the following order if the string contains more
than one character):
's'. Sticky bit of the file (directory).
'r'. Right to read.
'w'. Right to write.
'x'. Right to execute.
fgmode (fileinstance_or_filename)
. Analogous to
the previous function except for the fact that it returns
information about the file (directory) group user rights and
that the function never returns a string containing the
character 's'
.
fomode (fileinstance_or_filename)
. Analogous to
the previous function except for the fact that it returns
information about the rights of all other users.
isatty (fileinstance)
. The function returns 1 if
the file instance given as the parameter is an open file
connected to a terminal and 0 otherwise.
The following functions can be used to change the access rights of
the file (directory) for different users. Each function expects two
strings (after an implicit string conversion). The first one is the
path name of the file (directory). The second one is the rights. For
instance, if the string contains the character 'r', this is a right to
read (see characters used to denote different rights in the
description of the function fumode
). The function return
values are always undefined.
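As an illustration of the rights strings, the following Python sketch (not Dino code) translates such a string into POSIX owner permission bits. The helper name owner_mode_bits and the mapping to POSIX bits are assumptions made for this example; the text above does not say how the Dino interpreter performs the translation.

```python
import stat

# Hypothetical mapping from rights characters to POSIX owner bits.
OWNER_BITS = {
    "s": stat.S_ISVTX,  # sticky bit
    "r": stat.S_IRUSR,  # right to read
    "w": stat.S_IWUSR,  # right to write
    "x": stat.S_IXUSR,  # right to execute
}

def owner_mode_bits(mode):
    """Combine the bits for every known rights character in the string."""
    bits = 0
    for ch in mode:
        bits |= OWNER_BITS.get(ch, 0)  # unknown characters are ignored here
    return bits

print(oct(owner_mode_bits("rw")))
```

The group and other-user variants would use the corresponding group and other bits and, as described for those functions, ignore the character 's'.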
chumod (path, mode)
. The function sets up rights
for the file (directory) owner according to the given mode.
chgmod (path, mode)
. Analogous to the previous
function except for the fact that it sets up rights for the
file (directory) group users and that the function ignores the
character 's'
.
chomod (path, mode)
. Analogous to the previous
function except for the fact that it sets up rights for all
other users.
There are the following miscellaneous functions in space io
:
sput (...), sputln (...), sputf (format, ...)
. The functions are analogous to the functions put
, putln
, and putf
but they return the result
string instead of outputting the formed string into the standard
output stream.
sprint (...), sprintln (...)
. The functions are
analogous to the functions print
and println
but they return the result string instead of outputting the
formed string into the standard output stream.
sys
This space contains declarations to work with the underlying execution environment (OS) and related exceptions.
sys
The space contains a lot of exceptions:
signal
. This class is a sub-class of the
class error
. Sub-classes of the class signal
describe exceptions from receiving a signal from other OS
processes. They are
sigint. This class describes the exception generated by the user's interrupt from the keyboard.
sigill. This class describes the exception generated by illegal execution of an instruction.
sigabrt. This class describes the exception generated by the signal abort.
sigfpe. This class describes a floating point exception.
sigterm. This class describes the exception generated by the termination signal.
sigsegv. This class describes the exception generated by an invalid memory reference.
invenv. This class is a sub-class of the class error. The class invenv describes a corruption of the Dino program environment (see the predeclared variable env).
syserror
. This class is a sub-class of the
class invcall
. Sub-classes of the
class syserror
describe exceptions in predeclared
functions which call OS system functions. Some exceptions are
never generated but may be generated in the future on some
OSes.
eaccess. This describes the system error "Permission denied".
eagain. This describes the system error "Resource temporarily unavailable".
ebadf. This describes the system error "Bad file descriptor".
ebusy. This describes the system error "Resource busy".
echild. This describes the system error "No child processes".
edeadlk. This describes the system error "Resource deadlock avoided".
edom. This describes the system error "Domain error".
eexist. This describes the system error "File exists".
efault. This describes the system error "Bad address".
efbig. This describes the system error "File too large".
eintr. This describes the system error "Interrupted function call".
einval. This describes the system error "Invalid argument".
eio. This describes the system error "Input/output error".
eisdir. This describes the system error "Is a directory".
emfile. This describes the system error "Too many open files".
emlink. This describes the system error "Too many links".
enametoolong. This describes the system error "Filename too long".
enfile. This describes the system error "Too many open files in system".
enodev. This describes the system error "No such device".
enoent. This describes the system error "No such file or directory".
enoexec. This describes the system error "Exec format error".
enolck. This describes the system error "No locks available".
enomem. This describes the system error "Not enough space".
enospc. This describes the system error "No space left on device".
enosys. This describes the system error "Function not implemented".
enotdir. This describes the system error "Not a directory".
enotempty. This describes the system error "Directory not empty".
enotty. This describes the system error "Inappropriate I/O control operation".
enxio. This describes the system error "No such device or address".
eperm. This describes the system error "Operation not permitted".
epipe. This describes the system error "Broken pipe".
erange. This describes the system error "Result too large".
erofs. This describes the system error "Read-only file system".
espipe. This describes the system error "Invalid seek".
esrch. This describes the system error "No such process".
exdev. This describes the system error "Improper link".
systemcall
. This is a sub-class of the
class invcall
. Sub-classes of the
class systemcall
describe exceptions in calling the
predeclared function system
.
noshell. This class describes the exception that the function system cannot find the OS command interpreter (the shell).
systemfail. This class describes all remaining exceptions in calling the OS function system.
invextern
. This is a sub-class of the
class invcall
. Sub-classes of the
class invextern
describe exceptions in calling
external functions or in accessing an external variable.
noextern. This class describes the exception that the given external cannot be found.
libclose. This class describes the exception that there is an error in closing a shared library.
noexternsupp. This class describes an exception in the usage of external objects when they are not implemented under this OS.
compile. This class describes an exception in a compilation of C code or in loading the resulting shared object file.
invenvar
. This is a sub-class of the
class invcall
. The class invenvar
describes
corruption in the type of variables split_regex
and
time_format
(e.g. their values are not strings).
time_format
The variable value is a string which is the output format of time used
by the function strtime
when it is called without parameters.
The initial value of the variable is the string "%a %b %d %H:%M:%S
%Z %Y"
.
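Since the format follows the C library function strftime, the effect of the initial time_format value can be previewed with any strftime implementation. The Python sketch below (not Dino code) formats the usual fixed time, January 1, 1970 UTC, first with the initial format and then with a purely numeric format. The weekday, month, and time zone names in the first line depend on the locale and platform, so no output is shown for it.

```python
import time

default_format = "%a %b %d %H:%M:%S %Z %Y"  # initial value of time_format
epoch = time.gmtime(0)  # the usual fixed time, broken down in UTC

# Locale-dependent names appear here, so the exact text varies.
print(time.strftime(default_format, epoch))

# A numeric-only format gives the same text everywhere.
numeric = time.strftime("%Y-%m-%d %H:%M:%S", epoch)
print(numeric)  # 1970-01-01 00:00:00
```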
The following functions from the space sys
can be used to get
information about real time.
time ()
. The function returns the time in seconds
since the fixed time (usually since January 1, 1970).
strtime ([format [, time]])
. The function returns
a string representing the time
(an integer
representing time in seconds since the fixed time) according to
the format
(a string). If the format is not given,
the value of the variable time_format
is used. In
this case if the value of time_format
is corrupted (it
is not a string), the function generates the
exception invenvar
. If the time is not given, the
current time is used. The format is the same as in the C library
function strftime
. Here is an extract from the OS
function documentation. The following format specifiers can be
used in the format:
%a - the abbreviated weekday name according to the current locale.
%A - the full weekday name according to the current locale.
%b - the abbreviated month name according to the current locale.
%B - the full month name according to the current locale.
%c - the preferred date and time representation for the current locale.
%d - the day of the month as a decimal number (range 01 to 31).
%H - the hour as a decimal number using a 24-hour clock (range 00 to 23).
%I - the hour as a decimal number using a 12-hour clock (range 01 to 12).
%j - the day of the year as a decimal number (range 001 to 366).
%m - the month as a decimal number (range 01 to 12).
%M - the minute as a decimal number.
%p - either `am' or `pm' according to the given time value, or the corresponding strings for the current locale.
%S - the second as a decimal number.
%U - the week number of the current year as a decimal number, starting with the first Sunday as the first day of the first week.
%W - the week number of the current year as a decimal number, starting with the first Monday as the first day of the first week.
%w - the day of the week as a decimal number, Sunday being 0.
%x - the preferred date representation for the current locale without the time.
%X - the preferred time representation for the current locale without the date.
%y - the year as a decimal number without a century (range 00 to 99).
%Y - the year as a decimal number including the century.
%Z - the time zone name or abbreviation.
%% - the character '%'.
The space sys
contains predeclared functions which are used
to get information about the current OS process (the Dino interpreter
which executes the program). Each OS process has a unique identifier.
OS processes are usually started by a particular user and group
and are executed on behalf of a particular user and group (the so-called
effective identifiers). The following functions return such
information. On some OSes the functions may return the string "Unknown" as
a name if there are no notions of user and group identifiers.
getpid (). The function returns an integer value which is the process ID of the current OS process.
getun (). The function returns a new string which is the user name for the current OS process.
geteun (). The function returns a new string which is the effective user name for the current OS process.
getgn (). The function returns a new string which is the group name for the current OS process.
getegn (). The function returns a new string which is the effective group name for the current OS process.
getgroups (). The function returns a new vector of strings (possibly the empty vector) representing supplementary group names for the current OS process.
system (command)
The function executes the command given by a string (the parameter
value) in the OS command interpreter. Besides the standard exceptions
parnumber
and partype
the function may generate the
exceptions noshell
and systemfail
.
re
This space contains declarations which can be useful for working with regular expressions and for pattern matching -- see also the match-statements.
invregex
This class describes exceptions specific for executing the pmatch-statement and for calling predeclared functions implementing regular expression pattern matching. Although there is only one class for this, the messages which are in the class parameter can be different and explain more details.
split_regex
The variable value is a string which represents a regular expression
which is used by the predeclared function split
when the
second parameter is not given. The initial value of the variable is
the string "[ \t]+"
.
The space re
contains predeclared functions which are used
for pattern matching. The pattern is described
by a regular expression (regex), which is in effect a small
program describing how to match a string. The pattern
has the default syntax of the ONIGURUMA package for Unicode. It is
hard to describe the pattern syntax formally. Here is an incomplete
but strict description. For the full reference, please see the ONIGURUMA
package documentation. The regular expressions have the following
syntax:
Regex = Branch {"|" Branch}
The regex matches anything that matches one of the branches.
Branch = {Piece}
The branch matches the first piece, followed by the second piece, etc. If the pieces are omitted, the branch matches the null string.
Piece = Anchor | Unit
Unit = Atom
| Unit Quantifier
Quantifier = Greedy
| Reluctant
| Possessive
Greedy = "?" // 0 or 1 times
| "*" // 0 or more times
| "+" // 1 or more times
| Bound
Bound = "{" Min "," Max "}" // from Min to Max times
| "{" Min "," "}" // at least Min times
| "{" "," Max "}" // equivalent to {0, Max}
| "{" Min "}" // given number times
Reluctant = "??"
| "*?"
| "+?"
| Bound "?"
Possessive = "?+"
| "*+"
| "++"
Min = <unsigned integer>
Max = <unsigned integer>
A unit followed by *
matches a sequence of 0 or
more matches of the unit. A unit followed by +
matches a
sequence of 1 or more matches of the unit. A unit followed by
?
matches a sequence of 0 or 1 matches of the unit.
There is a more general construction (a bound) for describing
repetitions of a unit. A unit followed by a bound containing only
one integer Min
matches a sequence of exactly Min
matches of the unit. A unit followed by a bound containing one
integer Min
and a comma matches a sequence of Min
or
more matches of the unit. A unit followed by a bound containing a
comma and one integer Max
matches at most Max
repetitions of the unit. A unit followed by a bound containing two
integers Min
and Max
matches a sequence of
Min
through Max
(inclusive) matches of the unit.
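The four forms of a bound can be tried out in any regex engine that supports them. The following Python sketch (not Dino code) uses Python's re module as a stand-in for the ONIGURUMA engine, on the assumption that the two agree for these simple bounds. The helper name full is introduced only for this example.

```python
import re

def full(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# {Min}: exactly Min repetitions
exactly_two = [full(r"a{2}", s) for s in ("a", "aa", "aaa")]
# {Min,}: Min or more repetitions
at_least_two = [full(r"a{2,}", s) for s in ("a", "aa", "aaa")]
# {,Max}: at most Max repetitions (including none)
at_most_two = [full(r"a{,2}", s) for s in ("", "aa", "aaa")]
# {Min,Max}: from Min through Max repetitions
one_to_two = [full(r"a{1,2}", s) for s in ("", "a", "aaa")]

print(exactly_two, at_least_two, at_most_two, one_to_two)
```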
The quantifiers described above are greedy. A greedy
quantifier first matches as much as possible and, if the rest of
the regex fails to match, back-tracks to try shorter sequences.
There are reluctant quantifiers too. They have the additional
suffix ?
and first match as little as possible. The
last type of quantifier is possessive. Such quantifiers have
the additional suffix +
and behave like the corresponding greedy
ones, but they do not back-track.
Examples:
`.?foo` // matches first "xfoo" in "xfooxxxxfoo"
`.*foo` // matches all "xfooxxxxfoo"
`.+foo` // matches all "xfooxxxxfoo"
`.{1,8}foo` // matches all "xfooxxxxfoo"
`.*?foo` // matches first "xfoo" in "xfooxxxxfoo"
`.+?foo` // Ditto
`.{1,8}?foo` // Ditto
`.*+foo` // fails to match in "xfooxxxxfoo"
`.++foo` // fails to match in "xfooxxxxfoo"
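The greedy and reluctant examples above can be reproduced with Python's re module, used here as a stand-in for the ONIGURUMA engine (an assumption; this is not Dino code). The possessive forms are left out because Python accepts them only from version 3.11 on. The helper name found is introduced only for this sketch.

```python
import re

text = "xfooxxxxfoo"

def found(pattern):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Greedy quantifiers take as much as they can.
greedy = [found(p) for p in (r".?foo", r".*foo", r".+foo", r".{1,8}foo")]
# Reluctant quantifiers take as little as they can.
reluctant = [found(p) for p in (r".*?foo", r".+?foo", r".{1,8}?foo")]

print(greedy)
print(reluctant)
```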
Atom = Anchors
| Character
| CharacterType
| CharacterProperty
| CharacterClass
| Group
| BackReference
| SubexpCall
Character = "\t" // horizontal tab (0x09)
| "\v" // vertical tab (0x0B)
| "\n" // newline (0x0A)
| "\r" // return (0x0D)
| "\f" // form feed (0x0C)
| "\a" // bell (0x07)
| "\e" // escape (0x1B)
| "\" OctalCode // char with given octal code
| "\x" HexCode // char with given hexadecimal code
| <any but special character \ ? * + ^ $ [ ( ) >
| "\" <special character>
OctalCode = <3 octal digits>
HexCode = <2 hexadecimal digits>
CharacterType = '.' // any character but newline
| "\w" // Unicode Letter, Mark, Number, or
// Connector_Punctuation
| "\W" // opposite to the above
| "\s" // Unicode Line_Separator,
// Paragraph_Separator, or
// Space_Separator
| "\S" // opposite to the above
| "\d" // Unicode decimal number
| "\D" // opposite to the above
| "\h" // hexadecimal digit char [0-9a-fA-F]
| "\H" // opposite to the above
CharacterProperty = "\p{" PropertyName "}"
| "\p{^" PropertyName "}"
| "\P{" PropertyName "}"
PropertyName = "Alnum" | "Alpha" | "Blank" | "Cntrl"
| "Digit" | "Graph" | "Lower" | "Print"
| "Punct" | "Space" | "Upper" | "XDigit"
| "Word" | "ASCII"
| "Any" | "Assigned" | "C" | "Cc" | "Cf"
| "Cn" | "Co" | "Cs" | "L" | "Ll" | "Lm"
| "Lo" | "Lt" | "Lu" | "M" | "Mc" | "Me"
| "Mn" | "N" | "Nd" | "Nl" | "No" | "P"
| "Pc" | "Pd" | "Pe" | "Pf" | "Pi" | "Po"
| "Ps" | "S" | "Sc" | "Sk" | "Sm" | "So"
| "Z" | "Zl" | "Zp" | "Zs" | "Arabic"
| "Armenian" | "Bengali" | "Bopomofo"
| "Braille" | "Buginese" | "Buhid"
| "Canadian_Aboriginal" | "Cherokee"
| "Common" | "Coptic" | "Cypriot"
| "Cyrillic" | "Deseret" | "Devanagari"
| "Ethiopic" | "Georgian" | "Glagolitic"
| "Gothic" | "Greek" | "Gujarati"
| "Gurmukhi" | "Han" | "Hangul" | "Hanunoo"
| "Hebrew" | "Hiragana" | "Inherited"
| "Kannada" | "Katakana" | "Kharoshthi"
| "Khmer" | "Lao" | "Latin" | "Limbu"
| "Linear_B" | "Malayalam" | "Mongolian"
| "Myanmar" | "New_Tai_Lue" | "Ogham"
| "Old_Italic" | "Old_Persian" | "Oriya"
| "Osmanya" | "Runic" | "Shavian" | "Sinhala"
| "Syloti_Nagri" | "Syriac" | "Tagalog"
| "Tagbanwa" | "Tai_Le" | "Tamil" | "Telugu"
| "Thaana" | "Thai" | "Tibetan" | "Tifinagh"
| "Ugaritic" | "Yi"
Anchors = "^" // beginning of the line
| "$" // end of the line
| "\b" // word boundary
| "\B" // not word boundary
| "\A" // beginning of string
| "\Z" // end of string, or before newline
// at the end
| "\z" // end of string
The atom can be a character. Some characters have a special meaning in
regex (see comments in the character syntax). The remaining characters
match the same character in the string being matched. To match a special
character, use \
before the character. Some characters can
be represented by a sequence starting with \
(see the syntax
comments).
Examples:
`\t` // matches "\t"
`\x65` // matches "e"
`\p{Alpha}` // matches "a"
`\w` // matches "a"
`b$` // matches "b" in "b\na"
The atom can be an anchor. Matching an anchor succeeds only if its position corresponds to a specific place in the string being matched (see comments in the anchor syntax).
Examples:
`b$` // matches "b" in "b\na"
`abc\Z` // matches "abc" in "abc"
`abc\Z` // matches "abc" in "abc\n"
The atom which is a character type matches a specific class of character (see comments in the character type syntax).
The atom which is a character property matches a specific class of
characters. For the meaning of Alnum
through ASCII
, please see
the corresponding BracketClass
. For the meaning of C
through Zs
, please see the Unicode categories. For
the meaning of Arabic
through Yi
, please see the Unicode
scripts (alphabets). If the property contains p
with ^
or P
, the match succeeds when the matching
character is not of the class.
Examples:
`\p{Alpha}` // matches "a"
`\p{ASCII}` // matches ";"
CharacterClass = "[" Intersections "]"
| "[^" Intersections "]"
Intersections = Set
| Intersections "&&" Set
Set = SetElement
| Set SetElement
SetElement = ElementChar ["-" ElementChar]
| "[:" BracketClass ":]"
| "[:^" BracketClass ":]"
| CharacterClass
ElementChar = Character
| "\b" // backspace 0x08
BracketClass = "alnum" // Unicode letter, mark,
// or decimal number
| "alpha" // Unicode letter or mark
| "ascii" // character in range 0 - 0x7f
| "blank" // Unicode space separator
// or \t (0x09)
| "cntrl" // Unicode control, format,
// unassigned, private use,
// or surrogate
| "digit" // Unicode decimal number
| "graph" // not a space class and not an
// Unicode control, unassigned,
// or surrogate
| "lower" // Unicode lower case letter
| "print" // graph or space class
| "punct" // any Unicode punctuation
| "space" // any Unicode separator,
// \t (0x09), \n (0x0A), \v (0x0B),
// \f (0x0C), \r (0x0D),
// or 0x85 (next line)
| "upper" // Unicode upper case letter
| "xdigit" // ascii 0-9, a-f, or A-F
| "word" // Unicode letter, mark, decimal
// number or punctuation connector
The atom can be a bracket expression which is a list of
intersections of character sets separated by &&
and enclosed
in []
. If the character class contains ^
right
after [
, it matches any character which does not match the
corresponding character class without ^
. A set is a sequence
of set elements.
The element given by a character denotes the character itself. An
element given by two characters in the list separated by -
is
shorthand for the full range of characters between those two
(inclusive) in the sequence of Unicode code points, e.g. [0-9]
matches any decimal digit. Besides the usual character representation
you can use here also \b
which is a backspace representation.
The element given by a bracket class enclosed in [[::]]
matches a
character from this class (see comments in BracketClass). If
character ^
is present right after [[:
, the match
succeeds if the character is not in this class.
The element can be given by a character class; in other words, character classes can be nested.
If you need to use [
, -
, or ]
as a normal
character in a character class, you can use prefix \
for
this.
Examples:
`[[:alpha:]]` // matches "a"
`[[[:lower:]]&&[^a-x]]` // matches "y" or "z"
The atom can be a group, a regular expression enclosed in ()
.
There are several types of groups:
Group = CapturedGroup
| NonCapturedGroup
| "(?#" <any characters but )> ")" // a comment
| "(?" Options ")"
| Context
Options =
| Options Option
Option = "-" | "i" | "m" | "x"
CapturedGroup = "(" [Regex] ")"
| "(?<" Name ">" [Regex] ")"
Name = <one or more word characters>
NonCapturedGroup = "(?" Options ":" [Regex] ")"
| "(?>" [Regex] ")" /* Atomic group */
Context = "(?=" [Regex] ")" // look ahead
| "(?!" [Regex] ")" // negative look ahead
| "(?<=" [Regex] ")" // look behind
| "(?<!" [Regex] ")" // negative look behind
BackReference = "\" Number // back ref. by group
// number
| "\k<" Number ">" // back ref. by group
// number
| "\k<-" Number ">" // back ref. by relative
// group number
| "\k<" Name ">" // back ref. by group name
// back ref. by group name and nest level:
| "\k<" Name "+" | "-" Number ">"
Number = <any integer >= 0>
Some groups are captured groups. This means that you can refer to the substrings they match (see the back references) or get the start and the end positions of the matched substrings by calling the Dino regex match functions. A captured group may have a name which can be used in the back references or in the subexp calls.
You can place comments not containing )
in regex
between (?#
and )
.
Options without a regex always match. They just change how matching
works. The option i
switches on ignoring letter case
during the match. The option m
makes .
match a
newline too. The option x
switches on ignoring white
space as a character atom and permits comments starting
with #
and ending at the end of line. The
character -
after the corresponding ?
has an
opposite effect, e.g. it makes a letter case important in matching
again etc.
You can define the options in non-captured groups. These options affect only this group. Another form of non-captured group is an atomic group. Once the regex in an atomic group matches something, the match stays the same during back-tracking.
Examples:
`(?i:ab)` // matches "Ab"
`(?x: a a a)` // matches "aaa"
`(?>.*)c` // can not match "abc"
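The group forms above can be tried from Dino itself. The following is a minimal sketch, assuming the function re.match described later in this section (it returns nil when nothing matches). The helper name try_match and the sample strings are only illustrative.

```dino
// Hypothetical helper: report whether regex matches somewhere in str.
fun try_match (regex, str) {
  putln (regex, " on \"", str, "\": ",
         re.match (regex, str) == nil ? "no match" : "match");
}
try_match (`(?i:ab)`, "Ab");      // option i ignores letter case
try_match (`(?x: a a a)`, "aaa"); // option x ignores the spaces in the regex
try_match (`(?>.*)c`, "abc");     // the atomic group keeps all of "abc", so c fails
```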
The atom can be a context. A context match does not advance the current position in a matching string. A look ahead context succeeds if the corresponding regex matches a sub-string starting from the current position. A look behind context succeeds if the corresponding regex matches a sub-string finishing right before the current position. There are negative forms of the context atom. They succeed when the corresponding regex does not match.
Examples:
`(?=bcd)bc` // matches "bc" in "aabcd"
`(?<=aa)bc` // matches "bc" in "aabc"
The atom can be a back reference. It refers to the matched string of the corresponding captured group. The captured groups are numbered by their left parentheses, starting from one and going from left to right. A negative number denotes a relative group number, in other words, the groups are counted starting from the back reference and going from right to left. If the captured group has a name, its matched string can be referenced by that name. If several groups have the same name, the name in the back reference corresponds to the last such group. You can add a nest level to the name. A nest level of zero is the same as a named back reference without a nest level. A back reference with a non-zero nest level never matches.
Examples:
`(a)\k<1>` // matches "aa"
`(?<p>a)\k<p>` // Ditto
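A back reference can also be checked with re.match, described later in this section. This is a small sketch with an arbitrary sample string: the group captures one letter and the back reference requires the same letter to follow immediately.

```dino
// (a|b) captures one letter and \1 requires the same letter again.
// In "abba" the first such pair is "bb" at indexes 1 and 2.
println (re.match (`(a|b)\1`, "abba")); // expected: [1, 3, 1, 2]
```

The first pair of numbers describes the whole match and the second pair describes the captured group.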
The atom can be a subexp call:
SubexpCall = "\g<" Name ">"
The subexp call is actually another occurrence of the group it refers to. If the call is inside the group it refers to, the description is recursive. Left recursion is not permitted because it would never terminate.
Examples:
`(?<p>cd)\g<p>` // matches "cdcd"
`(?<p>a|b\g<p>c)` // matches "a", "bac", "bbacc" etc
`(?<p>a|\g<p>c)` // wrong: left recursion
There are the following pattern matching functions in
the space re
:
match (regex, string)
. The function searches for
matching the regular expression regex
in the
string
. Both parameters should be strings after
their implicit string conversions.
The matching is made according to the standard POSIX 1003.2: The regular expression matches the substring starting earliest in the string. If the regular expression could match more than one substring starting at that point, it matches the longest. Subexpressions also match the longest possible substrings, subject to the constraint that the whole match be as long as possible, with subexpressions starting earlier in the regular expression taking priority over ones starting later. In other words, higher-level subexpressions take priority over their component subexpressions. Match lengths are measured in characters, not the collating elements. A null string is considered longer than no match at all.
If there is no matching, the function returns the
value nil
. Otherwise, the function returns a new
mutable vector of integers. The length of the vector is 2
* (N + 1)
where N
is the number of the captured
groups. The first two elements are the index of the first
character of the substring corresponding to the whole regular
expression and the index of the last character matched plus
one. The subsequent two elements are the index of the first
character of the substring corresponding to the first captured
group in the regular expression and the index of the last
character plus one, and so on. If there is no matching with a
captured group, the corresponding vector elements will have
negative values.
Example: The program
println (re.match (`\n()(a)((a)(a))`, "b\naaab"));
outputs
[1, 5, 2, 2, 2, 3, 3, 5, 3, 4, 4, 5]
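The returned vector is usually walked in pairs. The sketch below prints the span of the whole match and of each captured group. The sample regex and string are made up for illustration, and the counter g is just a local helper.

```dino
var m = re.match (`([a-z]+)=([a-z]+)`, "key=value");
if (m != nil) {
  var i, g = 0;
  // Elements 2*g and 2*g+1 hold the start index and the end index plus one.
  for (i = 0; i < #m; i += 2) {
    putln ("group ", g, ": [", m[i], ", ", m[i + 1], ")");
    g++;
  }
}
```

With this input the spans should be [0, 9) for the whole match, [0, 3) for the first group, and [4, 9) for the second one.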
gmatch (regex, string[, flag])
. The function
searches for different occurrences of the regular expression
regex
in string
. Both parameters should be
strings after their implicit string conversion. The third
parameter is optional. If it is present, it should be integer
after an implicit integer conversion. If its value is nonzero,
the substrings matched by regex can be overlapped. Otherwise,
the substrings are never overlapped. If the parameter is
absent, the function behaves as if its value were zero. The
function returns a new mutable vector of integers. The length
of the vector is 2 * N
where N
is the number of
the found occurrences. Pairs of the vector elements correspond
to the occurrences. The first element of each pair is the index
of the first character of the substring matching the whole
regular expression in the corresponding occurrence and the
second element is the index of the last character plus one. If
there are no occurrences, the function returns nil
.
Example: The program
println (re.gmatch (`aa`, "aaaaa"));
println (re.gmatch (`aa`, "aaaaa", 1));
outputs
[0, 2, 2, 4]
[0, 2, 1, 3, 2, 4, 3, 5]
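Because gmatch returns nil rather than an empty vector when nothing is found, callers normally test the result before indexing it. This is a minimal sketch with an arbitrary sample string:

```dino
var v = re.gmatch (`ab`, "ab-ab-ab");
var i, count = 0;
if (v != nil)
  for (i = 0; i < #v; i += 2) {
    // Each pair is the start index and the end index plus one.
    putln ("occurrence at [", v[i], ", ", v[i + 1], ")");
    count++;
  }
putln (count, " occurrences");
```

For this input the loop should report three occurrences, at [0, 2), [3, 5), and [6, 8).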
sub (regex, string, subst)
. The function searches
for substrings matching the regular expression regex
in string
. All parameters should be strings after an
implicit string conversion.
If there is no matching, the function returns the
value nil
. Otherwise, the function returns a new
mutable vector of characters in which the first substring
matched has been changed to the string subst
. Within
the replacement string subst
, the
sequence \n
, where n
is a digit from 1 to 9,
may be used to indicate the text that matched the
n
'th captured group of the regex. The sequence
\0
represents the entire matched text, as does the
character &
.
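The replacement sequences can be illustrated with two small calls. This is a sketch with arbitrary sample strings. The first replacement string is written in back quotes on the assumption that, as in the regex examples of this section, back-quoted strings pass the backslashes through unchanged.

```dino
// Swap the text of the two captured groups: "xaby" becomes "xbay".
putln (re.sub (`(a)(b)`, "xaby", `\2\1`));
// & stands for the whole matched text: "abbbc" becomes "a[bbb]c".
putln (re.sub (`b+`, "abbbc", "[&]"));
```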
gsub (regex, string, subst)
. The function is
analogous to the function sub
except that it
searches for all non-overlapping substrings matching the
regular expression and returns a new mutable vector of
characters in which all matched substrings have been changed to
the string subst
.
split (string [, regex])
. The function splits
string
into non-overlapped substrings separated in the
input string by strings matching the regular expression. All
parameters should be strings after an implicit string
conversion. If the second parameter is omitted the value of
the predeclared variable split_regex
is used instead
of the second parameter value. In this case the function may
generate the exception invenvar
(corrupted value of a
predeclared variable).
The function returns a new mutable vector with the elements which are the separated substrings. If the regular expression is the null string, the function returns a new mutable vector with the elements which are strings each containing one character of string.
Examples: The program
println (re.split ("aaa bbb ccc ddd"));
outputs
["aaa", "bbb", "ccc", "ddd"]
The program
println (re.split ("abcdef", ``));
outputs
["a", "b", "c", "d", "e", "f"]
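An explicit separator regex is useful when the fields are not separated by white space. The following sketch, with an arbitrary sample string, splits on a comma followed by any number of spaces:

```dino
println (re.split ("a, b,c", `, *`)); // expected: ["a", "b", "c"]
```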
If the regular expression is incorrect, the functions generate the
exception invregex
with a message explaining the error.
math
The space contains mostly mathematical functions.
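As a brief orientation, here is a minimal sketch of calling a few of the functions described below. It assumes they are reached through the space name in the same way as re.match elsewhere in this document. The values are arbitrary.

```dino
// Length of the hypotenuse of a right triangle with legs 3 and 4.
putln (math.sqrt (3.0 * 3.0 + 4.0 * 4.0));
// The largest of several integers.
putln (math.max (1, 7, 3));
// Reseeding with the same value should repeat the pseudo-random sequence.
math.srand (42);
var first = math.rand ();
math.srand (42);
var second = math.rand ();
putln (first == second ? "repeatable" : "not repeatable");
```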
The following functions make an implicit arithmetic conversion of the parameters. After the conversions the parameters are expected to be of integer, long integer, or floating point type. The result is always a floating point number.
sqrt (x)
. The function returns the square root of
x
. The function generates the exception edom
if x
is negative.
exp (x)
. The function returns e
(the
base of the natural logarithm) raised to the power of
x
.
log (x)
. The function returns the natural
logarithm of x
. The function generates the exception
edom
if x
is negative or may generate
erange
if the value is zero.
log10 (x)
. The function returns the decimal
logarithm of x
. The function generates the exception
edom
if x
is negative or may generate
erange
if the value is zero.
pow (x, y)
. The function returns x
raised to the power of y
. The function generates
the exception edom
if x is negative and y is not of
integral value.
sin (x)
. The function returns the sine of
x
.
cos (x)
. The function returns the cosine of
x
.
atan2 (x, y)
. The function returns the arc
tangent of the two variables x
and y
. It is
similar to calculating the arc tangent of y / x
,
except that the signs of both arguments are used to determine
the quadrant of the result.
math functions
There are the following miscellaneous functions:
max (v1, v2, ...)
. The function searches for and
returns the maximal value in all of its parameters. The
parameters should be of integer, long integer, or floating
point type after an implicit arithmetic conversion. So the
function can return an integer, a long integer, or floating
point number depending on the type of the first maximal value
after the conversion.
min (v1, v2, ...)
. The function is analogous to
the previous function, but searches for and returns the minimal
value.
srand ([seed])
. The function sets the parameter
value (after an implicit integer conversion) as a seed for a
new sequence of pseudo-random integers to be returned by
rand
. These sequences are repeatable by calling
srand
with the same seed value. If the parameter is
not given, the seed will be the result of calling function
time
.
rand ()
. The function returns a pseudo-random
floating point value between 0 and 1. If the function
srand
was not called before, 1 will be used as the
seed value.
yaep
This space contains declarations to work with Yet Another Earley Parser (YAEP). YAEP is a very powerful tool to implement language compilers, processors, or translators. The implementation of the Earley parser used in Dino has the following features:
yaep
The space yaep
contains the class invparser
which is
a sub-class of invcall
. The following sub-classes of the
class invparser
describe exceptions specific for the work with
YAEP.
invgrammar
. This class describes an exception
that the Earley parser got a bad grammar, e.g. without rules,
with loops in rules, with nonterminals unachievable from the
axiom, with nonterminals not deriving any terminal string, etc.
invtoken
. This class describes an exception that
the parser got an input token with an unknown (undeclared) code.
pmemory
. This class describes an exception that
there is not enough memory for the internal parser data.
The space yaep
has the predeclared final class parser
which
implements the Earley parser. The following public functions and
variables are declared in the class parser
:
ambiguous_p
. This public variable stores
information about the last parsing. A nonzero variable value
means that during the last parsing of a given input, the parser
found that the grammar is ambiguous. The parser can find this
even if you asked for only one parse tree (see the function
set_one_parse
).
set_grammar (descr, strict_p)
. This function
tunes the parser to the given grammar. The grammar is given by
the string descr
. A nonzero value of the parameter
strict_p
(after an implicit integer conversion) means
stricter grammar checking. In this case, all nonterminals
will be checked on their ability to derive a terminal string
instead of only checking the axiom for this. The function can
generate the exceptions partype
(if the parameters
have wrong types) or invgrammar
if the description is
a bad grammar. The function can also generate the
exception pmemory
if there is no memory for the
internal parser data.
The description is similar to the YACC one. It has
the following syntax:
file : file terms [';']
| file rule
| terms [';']
| rule
terms : terms IDENTIFIER ['=' NUMBER]
| TERM
rule : IDENTIFIER ':' rhs [';']
rhs : rhs '|' sequence [translation]
| sequence [translation]
sequence :
| sequence IDENTIFIER
| sequence C_CHARACTER_CONSTANT
translation : '#'
| '#' NUMBER
| '#' '-'
| '#' IDENTIFIER [NUMBER] '(' numbers ')'
numbers :
| numbers NUMBER
| numbers '-'
So the description consists of terminal declaration and rule sections.
The terminal declaration section describes the names of terminals and their codes. The terminal code is optional. If it is omitted, the terminal code will be the next free code starting with 256. You can declare a terminal several times (the only condition is that its code should be the same each time).
A character constant present in the rules is a terminal described by default. Its code is always the ASCII code of the character constant.
The rules syntax is the same as YACC rule syntax. The
single difference is an optional translation construction
starting with #
right after each alternative. The
translation part could be a single number which means that the
translation of the alternative will be the translation of the
symbol with the given number (symbol numbers in the alternative
start with 0). Or the translation can be empty or
`-
' which designates the value of the variable
nil_anode
. Or the translation can be an abstract node
with the given name, optional cost, and with the fields whose
values are the translations of the alternative symbols with
numbers given in parentheses after the abstract node name. You
can use `-
' in an abstract node to show that the empty
node should be used in this place. If the cost is absent it is
believed to be 1. The cost of the terminal, error node, and
empty node is always zero.
There is a reserved terminal error
which marks the
start point of an error recovery. The translation of the
terminal is the value of the variable error_anode
.
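To make the description syntax more concrete, here is a minimal sketch of a grammar with a terminal declaration section. The grammar itself, the terminal names NUMBER and IDENT, the code 300, and the node name sum are all hypothetical and chosen only for illustration. It also assumes that the token TERM of the syntax above is written literally as the keyword TERM.

```dino
expose yaep.*;
// Terminal section: NUMBER has an explicit code, IDENT gets one automatically.
// Rule section: each alternative ends with its translation after #.
var descr = "TERM NUMBER = 300 IDENT;\n\
S : S '+' A # sum (0 2)\n\
  | A       # 0\n\
  ;\n\
A : NUMBER  # 0\n\
  | IDENT   # 0\n\
  ;\n";
var p = parser ();
p.set_grammar (descr, 1); // strict check of all nonterminals
```

In the first alternative, sum (0 2) builds an abstract node whose fields are the translations of symbols 0 and 2, skipping the '+' terminal at position 1.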
set_debug (level)
. This function sets up the
level of debugging information output to stderr
. The
higher the level, the more information is output. The default
value is 0 (no output). The debugging information includes
statistics, the result translation tree, the grammar, parser
sets, parser sets with all situations, situations with
contexts. The function returns the previously set up debug
level. Setting up a negative debug level results in output of
the translation for the utility dot of the graphic
visualization package graphviz. The parameter should
be an integer after an implicit integer conversion. The
function will generate the exception partype
if it is
not true.
set_one_parse (flag)
. This function sets up a
flag whose nonzero value means building only one translation
tree (without any alternative nodes). For an unambiguous
grammar the flag does not affect the result. The function
returns the previously set up flag value. The default value of
the flag is 1. The parameter should be an integer after an
implicit integer conversion. The function will generate
the exception partype
if it is not true.
set_lookahead (flag)
. This function sets up a
flag controlling the use of lookahead in the parser. Using
the lookahead gives the best results from the point of view of
space and speed. The default value is 1 (the lookahead is used).
The function returns the previously set up flag. Switching
the lookahead off is sometimes useful to get more understandable
debug output of the parser work (see the function
set_debug
). The parameter should be an integer after
an implicit integer conversion. The function will generate the
exception partype
if it is not true.
set_cost (flag)
. This function sets up building
the only translation tree (trees if we set up one_parse_flag to
0) with minimal cost. For an unambiguous grammar the flag does
not affect the result. The default value is 0. The function
returns the previously set up flag value. The parameter should
be an integer after an implicit integer conversion. The
function will generate the exception partype
if it is
not true.
set_recovery (flag)
. This function sets up a flag
whose nonzero value means making error recovery if a syntax
error occurred. Otherwise, a syntax error results in finishing
parsing (although the syntax error function passed to
parse
will still be called once). The function returns the
previously set up flag value. The default value of the flag is
1. The parameter should be an integer after an implicit
integer conversion. The function will generate the exception
partype
if it is not true.
set_recovery_match (n_toks)
. This function sets
up an internal parser parameter meaning how many subsequent
tokens should be successfully shifted to finish the error
recovery. The default value is 3. The function returns the
previously set up value. The parameter should be an integer
after an implicit integer conversion. The function will
generate the exception partype
if it is not true.
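Since each of these functions returns the previously set up value, a setting can be changed temporarily and restored afterwards. The sketch below illustrates this. It assumes the functions may be called before set_grammar, which the text above does not state.

```dino
expose yaep.*;
var p = parser ();
// Each setter returns the previous value, so it can be restored later.
var old_one_parse = p.set_one_parse (0);  // build all translations
var old_match = p.set_recovery_match (5); // require 5 shifted tokens
putln ("previous one_parse flag: ", old_one_parse);
putln ("previous recovery match: ", old_match);
p.set_one_parse (old_one_parse);          // restore the earlier settings
p.set_recovery_match (old_match);
```

According to the defaults listed above, the two saved values should be 1 and 3 for a freshly created parser.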
parse (tokens, error_func)
. This function is the
major function of the class. It makes the translation
according to the previously set up grammar of input given by
the parameter tokens
whose value should be an array of
objects of predeclared class token
or of its subtype.
If the parser recognizes a syntax error it calls the function
given through parameter error_func
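with six
parameters:
the index of the token (in the array tokens) on which the syntax error occurred;
the error token itself, which may be nil for end of file;
the index of the first token (in the array tokens) ignored due to error recovery;
the first ignored token itself, which may be nil for end of file;
the index of the first token (in the array tokens) which is not ignored after the error recovery;
the first token not ignored after the error recovery, which may be nil for end of file.
If the parser works with switched off error recovery (see the
function set_recovery), the third and fifth parameters
will be negative and the fourth and sixth parameters will be
nil
.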
The function returns an object of the predeclared class
anode
which is the root of the abstract tree
representing the translation of the parser input. The function
returns nil
only if a syntax error occurred and
the error recovery was switched off. The function can generate
the exception partype
if the parameter types are wrong
or the exception invtoken_decl
if any of the input
tokens have a wrong code. The function can also generate the
exception pmemory
if there is no memory for the
internal parser data.
The call of the class parser
itself can generate the
exception pmemory
if there is no memory for the internal
parser data.
token
The space yaep
has a predeclared class token
.
Objects of this class should be the input of the Earley parser (see
the function parse
in the class parser
). The result
abstract tree representing the translation will have input tokens as
leaves. The class token
has one public
variable code
whose value should be the code of the
corresponding terminal described in the grammar. You could extend the
class description e.g. by adding variables whose values could be
attributes of the token (e.g. a source line number, the name of an
identifier, or the value for a number).
anode
The space yaep
has a predeclared class anode
whose
objects are nodes of the abstract tree representing the translation
(see the function parse
of class parser
). Objects
of this class are generated by the Earley parser. The class has two
public variables: name
whose value is a string representing the
name of the abstract node as it is given in the grammar
and transl
whose value is an array with abstract node fields
as the array elements. There are a few node types which have special
meaning:
$term
.
The value of the public variable transl
for this node
is an object of the class token
representing the
corresponding input token which was an element of the array
passed as a parameter of the function parse
.
$error
.
This node exists in one exemplar (see description of the
variable error_anode
) and represents the translation
of the reserved grammar symbol error
. The value in
the public variable transl
will be nil
in
this case.
$nil
.
This node also exists in one exemplar (see description of the
variable nil_anode
) and represents the translation of
a grammar symbol for which we did not describe a translation.
For example, in a grammar rule an abstract node refers to the
translation of a nonterminal for which we do not produce a
translation. The value of the public variable transl of such an
object will be nil
in this case.
$alt
.
It represents all possible alternatives in the translation of
the grammar nonterminal. The value of the public variable
transl
will be an array with elements whose values are
objects of the class anode
which represent all
possible translations. Such nodes can be generated by the
parser only if the grammar is ambiguous and we did not ask it
to produce only one translation.
nil_anode
and error_anode
There is only one instance of anode
which represents empty
(nil) nodes. The same is true for the error nodes. The final
variables nil_anode
and error_anode
correspondingly
refer to these nodes.
Let us write a program which transforms an expression into the postfix
Polish form. Please read the program comments to understand what the
code does. The program should output the string "abcda*+*+"
which is the postfix Polish form of the input string
"a+b*(c+d*a)"
.
expose yaep.*;
// The following is the expression grammar:
var grammar = "E : E '+' T # plus (0 2)\n\
| T # 0\n\
| error # 0\n\
T : T '*' F # mult (0 2)\n\
| F # 0\n\
F : 'a' # 0\n\
| 'b' # 0\n\
| 'c' # 0\n\
| 'd' # 0\n\
| '(' E ')' # 1";
// Create the parser and set up the grammar.
var p = parser ();
p.set_grammar (grammar, 1);
// Add attribute repr to the token:
class our_token (code) { use token former code; var repr; }
// The following code forms input tokens from the string:
var str = "a+b*(c+d*a)";
var i, inp = [#str : nil];
for (i = 0; i < #str; i++) {
  inp [i] = our_token (str[i] + 0);
  inp [i].repr = str[i];
}
// The following function outputs messages about the syntax errors
// and the syntax error recovery:
fun error (err_start, err_tok,
           start_ignored_num, start_ignored_tok_attr,
           start_recovered_num, start_recovered_tok) {
  put ("syntax error on token #", err_start,
       " (" @ err_tok.code @ ")");
  putln (" -- ignore ", start_recovered_num - start_ignored_num,
         " tokens starting with token #", start_ignored_num);
}
var root = p.parse (inp, error); // parse
// Output the translation in the polish inverse form
fun pr (r) {
  var i, n = r.name;
  if (n == "$term")
    put (r.transl.repr);
  else if (n == "mult" || n == "plus") {
    for (i = 0; i < #r.transl; i++)
      pr (r.transl [i]);
    put (n == "mult" ? "*" : "+");
  }
  else if (n != "$error") {
    putln ("internal error");
    exit (1);
  }
}
pr (root);
putln ();