The Programming Language DINO: Vocabulary and Representation

3. Vocabulary and Representation

Wherever possible, we use EBNF to describe lexical symbols in terms of ASCII characters. Otherwise, we use natural-language descriptions enclosed in < and >.

Lexical symbols are identifiers, numbers, character constants, strings, operators, delimiters, and comments. Whitespace characters (blanks and line breaks) must not occur within a symbol, except in comments and as blanks in strings. Whitespace is ignored unless it is needed to separate two consecutive lexical symbols. Upper- and lower-case letters are considered distinct.

  1. An identifier is a sequence of letters and digits starting with a letter. The underscore is treated as a letter in an identifier. A single underscore on its own is reserved for another use (see wildcards in the section "Patterns").
              Ident = Letter {Letter | Digit}
    
              Letter = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j"
                     | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t"
                     | "u" | "v" | "w" | "x" | "y" | "z"
                     | "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J"
                     | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T"
                     | "U" | "V" | "W" | "X" | "Y" | "Z"
                     | "_"
    
              OctalDigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
    
              Digit = OctalDigit | "8" | "9"
    
              HexDigit = Digit | "a" | "A" | "b" | "B" | "c" | "C"
                       | "d" | "D" | "e" | "E" | "f" | "F"
                        
    

    Examples:

              line  line2  next_line  NextLine
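
The Ident rule above can be checked with a short regular expression. The following is a minimal sketch in Python (used here only as an illustration language, not Dino); the names IDENT_RE and is_identifier are hypothetical. It rejects the lone underscore, which is reserved, but it does not exclude keywords.

```python
import re

# Ident = Letter {Letter | Digit}, where "_" counts as a letter.
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_identifier(text: str) -> bool:
    """Return True if text matches the Ident rule and is not the lone "_"."""
    return IDENT_RE.fullmatch(text) is not None and text != "_"
```

With this sketch, line2 and next_line are accepted, while 2line (starts with a digit) and _ (the wildcard) are rejected.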
    

  2. Numbers are (unsigned) integer or floating point numbers. A number starts with a digit. Numbers starting with the prefix 0x or 0X are hexadecimal integers. Otherwise, integers starting with 0 are octal integers. Octal integers must not contain 8 or 9. An integer with the suffix l or L is a long number (see long values in the section "Types and Values"). Floating point numbers are distinguished from decimal integers by the presence of a decimal point . or an exponent. An underscore _ may appear anywhere in a digit sequence except as its first character, which can improve the readability of long digit sequences.
              Number = Integer | Long | FloatingPointNumber
    
              DigitSeq = Digit { Digit | "_" }
    
              HexDigitSeq = HexDigit { HexDigit | "_" }
              
              Integer = DigitSeq | "0" ("x" | "X") HexDigitSeq
    
              Long = Integer ("l" | "L")
    
              FloatingPointNumber = DigitSeq "." [ DigitSeq ] [Exponent]
                                  | DigitSeq Exponent
    
              Exponent = ("e" | "E") [ "+" | "-" ] DigitSeq
    

    Examples:

              10
              10L
              222_222_222_222_222_222_222_222_222_222_222_222_222_222_222_222l
              100.
              1e2
              1000.000_1E+0
              1___000__000_000
              0xafad_1f34_17ff_
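
The number rules can be sketched as a small classifier. This is an illustrative Python sketch under stated assumptions, not the interpreter's scanner: classify_number is a hypothetical name, a literal counts as floating point only when it has a decimal point or an exponent (as the prose above says), and any integer starting with 0 without the 0x prefix is treated as octal.

```python
import re

DIGIT_SEQ = r"[0-9][0-9_]*"                  # DigitSeq
HEX_DIGIT_SEQ = r"[0-9a-fA-F][0-9a-fA-F_]*"  # HexDigitSeq
EXPONENT = rf"[eE][+-]?{DIGIT_SEQ}"          # Exponent

HEX_RE = re.compile(rf"0[xX]{HEX_DIGIT_SEQ}")
DEC_RE = re.compile(DIGIT_SEQ)
# A float needs a decimal point or an exponent (or both).
FLOAT_RE = re.compile(
    rf"{DIGIT_SEQ}(?:\.(?:{DIGIT_SEQ})?(?:{EXPONENT})?|{EXPONENT})"
)


def classify_number(text: str):
    """Return (kind, base) for a number literal, or None if it is not one."""
    if FLOAT_RE.fullmatch(text):
        return ("float", 10)
    kind = "long" if text[-1:] in ("l", "L") else "int"
    body = text[:-1] if kind == "long" else text
    if HEX_RE.fullmatch(body):
        return (kind, 16)
    if DEC_RE.fullmatch(body):
        if body.startswith("0"):
            # Octal numbers must not contain 8 or 9.
            return None if re.search(r"[89]", body) else (kind, 8)
        return (kind, 10)
    return None
```

For instance, classify_number("10L") yields ("long", 10), classify_number("100.") yields ("float", 10), and classify_number("08") yields None because 8 is not an octal digit.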
    

  3. A Dino character constant denotes a Unicode character. The following sequences starting with the ASCII backslash have a special meaning inside a Dino character constant:
    • \a - ASCII character alert
    • \b - ASCII character backspace
    • \f - ASCII character form feed
    • \n - ASCII character new line
    • \r - ASCII character carriage return
    • \t - ASCII character horizontal tab
    • \v - ASCII character vertical tab
    • \code - A character with the code given by up to three octal digits
    • \xcode - A character with the code given by two hexadecimal digits
    • \ucode - A character with the code given by four hexadecimal digits
    • \Ucode - A character with the code given by eight hexadecimal digits
    • \char - The character char for all remaining characters

    To denote a single quote mark use the sequence \'. The double quote mark can be represented either by \" or simply by ". To represent a backslash inside the character constant, use two consecutive ASCII backslashes.

              Character = "'" Char "'"
    
              Char = <any character except for the single quote ',
                      backslash \, or line break>
                   | SimpleEscapeSeq
                   | OctalEscapeSeq
    
              SimpleEscapeSeq = <one of  \'  \"  \\  \a  \b  \f  \n  \r  \t  \v>
    
              OctalEscapeSeq = "\" OctalDigit [ OctalDigit [ OctalDigit ] ]
    

    Examples:

              'a'  '\''  '\\'  '\12'  '"'
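
The escape sequences listed above can be decoded as follows. This is a hedged Python sketch; decode_escapes and decode_char_constant are hypothetical names, and the code only decodes: it does not reject every literal that the grammar forbids (for example, an unescaped single quote in the body).

```python
import re

SIMPLE = {"a": "\a", "b": "\b", "f": "\f", "n": "\n",
          "r": "\r", "t": "\t", "v": "\v"}

ESCAPE_RE = re.compile(
    r"\\(?:"
    r"([0-7]{1,3})"            # \code: up to three octal digits
    r"|x([0-9a-fA-F]{2})"      # \xcode: two hexadecimal digits
    r"|u([0-9a-fA-F]{4})"      # \ucode: four hexadecimal digits
    r"|U([0-9a-fA-F]{8})"      # \Ucode: eight hexadecimal digits
    r"|(.)"                    # \char: any remaining character
    r")",
    re.DOTALL,
)


def decode_escapes(body: str) -> str:
    """Replace every backslash sequence in body with the character it denotes."""
    def replace(match):
        octal, hex2, hex4, hex8, char = match.groups()
        if octal is not None:
            return chr(int(octal, 8))
        for digits in (hex2, hex4, hex8):
            if digits is not None:
                # chr raises ValueError for codes above 0x10FFFF.
                return chr(int(digits, 16))
        return SIMPLE.get(char, char)
    return ESCAPE_RE.sub(replace, body)


def decode_char_constant(literal: str) -> str:
    """Decode a quoted character constant such as '\\12' into one character."""
    if len(literal) < 3 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a character constant")
    value = decode_escapes(literal[1:-1])
    if len(value) != 1:
        raise ValueError("a character constant must denote one character")
    return value
```

Applied to the examples above, '\12' decodes to the new line character (octal 12 is decimal 10) and '\'' decodes to a single quote.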
    

  4. A string is a sequence of characters enclosed in double quotes. The same character sequences have a special meaning as in a character constant. To denote a double quote mark use the sequence \". The single quote mark can be represented either by \' or simply by '. To represent a backslash inside a string, use two consecutive ASCII backslashes.
              String = '"' {Char} '"'
    

    Examples:

              "This is Dino"  "Don't worry\n"
    

    Another variant of string representation uses back-quotes `. Characters inside the back-quotes appear in the string as they are; in other words, escape sequences are not processed in this representation. To denote a back-quote in such a string, use two consecutive back-quotes. A newline may not appear in such a string, so a back-quoted string must fit on a single program line.

              String = '`' {Char} '`'
    

    Examples:

              `\p{Greek}+`  `back quote `` is here`
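
The back-quote rules can be sketched in a few lines. This is an illustrative Python sketch; decode_backquoted is a hypothetical name, and the checks follow only what the paragraph above states: no escape processing, doubled back-quotes for a literal back-quote, and no line break.

```python
def decode_backquoted(literal: str) -> str:
    """Return the value of a back-quoted string literal."""
    if len(literal) < 2 or literal[0] != "`" or literal[-1] != "`":
        raise ValueError("not a back-quoted string")
    body = literal[1:-1]
    if "\n" in body:
        raise ValueError("a back-quoted string must fit on one line")
    # Every back-quote inside the body must be part of a doubled pair.
    if "`" in body.replace("``", ""):
        raise ValueError("unpaired back-quote inside the string")
    # Backslashes are kept as they are; only doubled back-quotes collapse.
    return body.replace("``", "`")
```

Because backslashes are left untouched, this form is convenient for text that is itself full of backslashes, such as the regular expression in the first example.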
    

  5. C code is a sequence of characters enclosed in the special brackets %{ and %}. It represents a C code fragment that is compiled and loaded into the Dino interpreter during program execution.
              C_CODE = "%{" <any char sequence not containing pair %}> "%}"
    

    An example:

              %{ static val_t dino_var; %}
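
As a rough illustration of the C_CODE rule, the fragments between %{ and %} can be located with a non-greedy match, so each fragment ends at the first %} that follows it. This Python sketch uses the hypothetical name extract_c_fragments and assumes the brackets do not occur inside Dino strings or comments, which a real scanner would have to handle.

```python
import re

# "%{" <any char sequence not containing pair %}> "%}"
C_CODE_RE = re.compile(r"%\{(.*?)%\}", re.DOTALL)


def extract_c_fragments(source: str) -> list:
    """Return the text of every %{ ... %} fragment, without the brackets."""
    return [match.group(1) for match in C_CODE_RE.finditer(source)]
```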
    

  6. The remaining lexical symbols are called operators and delimiters. Operators are used to form expressions; delimiters are used to form syntactic constructions. A special kind of operators and delimiters look like identifiers containing only lower-case letters. These are reserved identifiers (keywords). A keyword cannot be used in place of an identifier.
              OperatorOrDelimeter = "?" | ":" | "|" | "||" | "&" | "&&" | "^"
                                  | "==" | "!=" | "===" | "!==" | "<" | ">"
                                  | "<=" | ">=" | "<<" | ">>" | ">>>" | "@"
                                  | "+" | "-" | "/" | "*" | "%" | "!" | "~"
                                  | "#" | ".+" | ".*" | ".&" | ".^" | ".|"
                                  | "(" | ")" | "[" | "]" | "{" | "}"
                                  | "." | "," | ";" | "=" | "*=" | "/="
                                  | "%=" | "+=" | "-=" | "@=" | "<<=" | ">>="
                                  | ">>>=" | "&=" | "^=" | "|=" | "++" | "--"
                                  | "..." | Keyword
    
              Keyword = "_" | "break" | "case" | "catch" | "char" | "class"
                      | "continue" | "else" | "expose" | "extern" | "final"
                      | "fiber" | "float" | "for" | "former" | "friend" | "fun"
                      | "hide" | "hideblock" | "if" | "in" | "int"
                      | "later" | "long" | "new" | "nil" | "obj"
                      | "pmatch" | "priv" | "pub" | "return" | "rmatch" 
                      | "tab" | "thread" | "this" | "throw" | "try" | "type"
                      | "use" | "val" | "var" | "vec" | "wait"
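
Since keywords look like identifiers, a scanner typically reads an identifier-shaped word first and then looks it up in the keyword list. A minimal Python sketch of that lookup, with the hypothetical names KEYWORDS and is_keyword, is:

```python
# The Keyword alternatives from the grammar above.
KEYWORDS = frozenset("""
    _ break case catch char class continue else expose extern final
    fiber float for former friend fun hide hideblock if in int
    later long new nil obj pmatch priv pub return rmatch
    tab thread this throw try type use val var vec wait
""".split())


def is_keyword(word: str) -> bool:
    """Return True if word is reserved; the comparison is case-sensitive."""
    return word in KEYWORDS
```

Because upper- and lower-case letters are distinct, fun is a keyword under this check while Fun is an ordinary identifier.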
    
  7. Comments are treated like blanks at the syntax level of the program. There are two types of comments. The first type is an arbitrary character sequence starting with /* and ending with */. The second type starts with // and ends at the first line break or at the end of the file.
              Comment = "/*" <arbitrary char. sequence not containing pair */> "*/"
                      | "//" <arbitrary char. sequence finishing on line break>
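
Treating comments as blanks can be sketched as a pass that replaces each comment with a single space while leaving strings, character constants, and C code fragments untouched. The Python below is an illustration under stated assumptions, not the interpreter's scanner; strip_comments is a hypothetical name, and the literal patterns are simplified approximations of the rules in this section.

```python
import re

TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\\n])*"'     # double-quoted string: kept
    r"|'(?:\\.|[^'\\\n])*'"    # character constant: kept
    r"|`(?:``|[^`\n])*`"       # back-quoted string: kept
    r"|%\{.*?%\}"              # C code fragment: kept
    r"|/\*.*?\*/"              # comment ending at the first */
    r"|//[^\n]*",              # comment ending at line break or end of file
    re.DOTALL,
)


def strip_comments(source: str) -> str:
    """Replace each comment with one blank; copy everything else unchanged."""
    def replace(match):
        text = match.group(0)
        return " " if text.startswith(("/*", "//")) else text
    return TOKEN_RE.sub(replace, source)
```

The line break that ends a // comment is not part of the match, so line numbering is preserved for single-line comments. A /* inside a /* ... */ comment has no special meaning, because the comment ends at the first */.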
    

