The compilation process has six tools or phases:
Lexical analyzer: decomposes the source code into tokens.
Syntax analyzer: groups the tokens into statements and then into a program, verifying the syntax along the way.
Semantic analyzer: checks the meaning of the source code.
Intermediate code generator: produces an internal form of the program. For example, a = b + c becomes (+, b, c, L1) (=, L1, a).
Code optimizer: improves the object program for performance and reliability.
Code generator: translates the intermediate representation into the target language.
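The intermediate form mentioned in the list above can be sketched in a few lines of Python. This is only an illustration, not a real code generator: the function name to_quadruples is hypothetical, it handles a single assignment of the form "target = left op right" with whitespace between the operands and the operator, and it always names the temporary L1 as in the example.

```python
def to_quadruples(statement):
    """Translate 'target = left op right' into a list of quadruples."""
    # Split once on '=' to separate the target from the expression.
    target, expr = [part.strip() for part in statement.split("=", 1)]
    # Assumption: the expression is exactly 'left op right', space separated.
    left, op, right = expr.split()
    temp = "L1"  # name of the temporary, as in the example above
    return [(op, left, right, temp), ("=", temp, target)]


print(to_quadruples("a = b + c"))
# [('+', 'b', 'c', 'L1'), ('=', 'L1', 'a')]
```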
Assignment
Read this interesting paper and write your own thoughts and opinions on it:
Wallace, D. R. and Fujii, R. U.: Software verification and validation: an overview. IEEE Software, 6(3), 10-17. (1989)
The Syntax
A language is a set of strings of characters. Combinations of these strings form sentences or statements. The syntax supplies the rules that the sentences must follow.
The syntax of a programming language is the form of its expressions, statements and program units. For instance, the syntax of the Java while statement is
while (<boolean_expr>) <statement>
The lowest-level syntactic units of a statement are called lexemes. Lexemes are described by a lexical specification, which is kept separate from the syntactic description.
Lexemes are grouped into categories called tokens.
For example, an identifier is a token that can have lexemes, or instances, such as amount and total. In some cases, a token has only a single possible lexeme. For example, the token for the arithmetic operator symbol +, which might have the name plus_op, has just one possible lexeme. Consider the following Java statement:
index = 2 * count + 17;
The lexemes and tokens of this statement are:
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
Figure 1: Lexemes and Tokens
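The split into lexemes and tokens shown in Figure 1 can be reproduced with a small regular-expression scanner. This is a minimal sketch, not a full Java lexer: the names TOKEN_SPEC, MASTER and tokenize are made up for the illustration, only the token kinds from Figure 1 are covered, and the token for = is called equal_sign.

```python
import re

# One (token name, pattern) pair per token kind in Figure 1.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (lexeme, token) pairs for the source string."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(lexeme, token)
```

Each lexeme is matched by exactly one named group, and the name of that group is used as the token.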
Language Recognizers
A language recognizer is a device that accepts the language of the source code. Its input is a string of characters, and the device determines whether the string belongs to the language. For example, if the string is a sentence of the Java language, the recognizer accepts it and enters an accept state. If not, it rejects the string.
Language Generators
A language generator is a device that generates the sentences of a language. It relies on a constructive description of the language. If the generator is able to produce the given string as a sentence of the language (Java, for example), the compiler proceeds to the next phase.
Quiz
What is the difference between syntax and semantics?
Answer: Syntax is the set of formal rules governing the structure of valid statements in a language. Semantics is the set of rules that give the meaning of a statement.
Formal Methods of Describing Syntax
The syntax of programming languages is described formally with grammars, most commonly context-free grammars written in BNF.
Context-free Grammars
Context-free grammars originate from formal language theory. In computer science, a context-free grammar describes the syntax of a programming language in terms of its tokens.
Backus-Naur Form (BNF)
In computer science, BNF (Backus Normal Form or Backus-Naur Form) is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols (Wikipedia).
Basics of Metalanguage
A metalanguage is a language used to describe another language. For programming languages, BNF is the metalanguage.
BNF uses abstractions for syntactic structures. A simple Java assignment statement, for example, might be represented by the abstraction <assign>. The actual definition of <assign> may be given by
<assign> → <var> = <expression>
LHS - the abstraction being defined, here <assign>
The whole statement is a rule
RHS - the definition of the LHS, consisting of tokens, lexemes and references to other abstractions
(Note: LHS is left-hand side, RHS is right-hand side)
Figure 2 : metalanguage
This particular rule specifies that the abstraction <assign> is defined as an instance of the abstraction <var>, followed by the lexeme =, followed by an instance of the abstraction <expression>.
Nonterminal symbols are the abstractions in a BNF description.
Terminal symbols are the lexemes and tokens of the rules.
A grammar is a collection of rules.
A nonterminal symbol can have two or more definitions, representing two or more possible syntactic forms in the language. Multiple definitions can be written as separate rules or combined into a single rule, with the alternatives separated by the symbol |. For example, for the Java if-else statement
<if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>
<if_stmt> → if ( <logic_expr> ) <stmt> else if ( <logic_expr> ) <stmt> else <stmt>
Or, by combining them into a single rule,
<if_stmt> → if ( <logic_expr> ) <stmt> else <stmt>
| if ( <logic_expr> ) <stmt> else if ( <logic_expr> ) <stmt> else <stmt>
Lists
BNF uses recursion in rules to describe lists. A rule is recursive if its LHS also appears in its RHS. For instance:
<ident_list> → identifier
| identifier , <ident_list>
This defines <ident_list> as either a single token (identifier) or an identifier followed by a comma followed by another instance of <ident_list>.
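A recursive rule maps naturally onto a recursive function. The sketch below checks whether a list of lexemes is a comma-separated identifier list: one branch handles a single identifier, the other handles an identifier followed by a comma and a shorter list. The function name is_ident_list and the identifier pattern are assumptions made for the illustration.

```python
import re

# Assumed shape of an identifier lexeme: a letter or underscore, then word characters.
IDENTIFIER = re.compile(r"[A-Za-z_]\w*")


def is_ident_list(tokens):
    """Return True if tokens form 'identifier' or 'identifier , <list>'."""
    if not tokens or not IDENTIFIER.fullmatch(tokens[0]):
        return False
    if len(tokens) == 1:
        return True  # first alternative: a single identifier
    # Second alternative: identifier, comma, then another list (the recursion).
    return tokens[1] == "," and is_ident_list(tokens[2:])


print(is_ident_list(["a", ",", "b", ",", "c"]))  # True
print(is_ident_list(["a", ","]))                 # False
```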
Grammars and Derivations
The sentences of a language are generated by applying the rules of the grammar, beginning with a special nonterminal called the start symbol. The sequence of rule applications is called a derivation. In a grammar for a complete programming language, the start symbol is usually named <program>. For instance,
A Grammar for a Small Language
<program> → begin <stmt_list> end
<stmt_list> → <stmt>
| <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var>
| <var> - <var>
| <var>
Figure 3 : Small language's grammar
A program in this language starts with begin and ends with end. The statements are separated by semicolons. The language has only one kind of statement, which is assignment. An assignment assigns the value of an expression to a variable, <var>. An expression is a single variable, or two variables joined by + or -. There are only three variable names in the language: A, B and C.
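A recognizer for this small language can be sketched as a recursive-descent checker with one function per nonterminal. The sketch assumes the grammar of Figure 3 has the rules <program> → begin <stmt_list> end, <stmt_list> → <stmt> | <stmt> ; <stmt_list>, <stmt> → <var> = <expression>, <var> → A | B | C and <expression> → <var> + <var> | <var> - <var> | <var>. It also assumes that lexemes are separated by spaces. The name accepts is hypothetical.

```python
VARIABLES = {"A", "B", "C"}


def accepts(text):
    """Return True if text is a sentence of the small language."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {peek()!r}")
        pos += 1

    def var():
        nonlocal pos
        if peek() not in VARIABLES:
            raise SyntaxError(f"expected a variable, found {peek()!r}")
        pos += 1

    def expression():
        nonlocal pos
        var()
        if peek() in ("+", "-"):  # optional operator and second variable
            pos += 1
            var()

    def stmt():
        var()
        expect("=")
        expression()

    def stmt_list():
        stmt()
        if peek() == ";":  # a semicolon must be followed by another statement
            expect(";")
            stmt_list()

    try:
        expect("begin")
        stmt_list()
        expect("end")
        return pos == len(tokens)
    except SyntaxError:
        return False


print(accepts("begin A = B + C ; B = C end"))  # True
print(accepts("begin A = B * C end"))          # False
```

The recognizer only answers yes or no. It does not build a parse tree or report where the error is.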
A derivation of a program in this language is:
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
=> begin A = B + C ; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end
This derivation begins with the start symbol <program>. The symbol => is read as "derives". Each successive string in the sequence is derived from the previous string by replacing one of the nonterminals with one of that nonterminal's definitions. Another example:
A Grammar for Simple Assignment Statements
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <id> + <expr>
| <id> * <expr>
| ( <expr> )
| <id>
Figure 4 : Assignment's grammar
The grammar above describes only assignment statements. It uses the nonterminals <id> and <expr>. An <id> is one of the variables A, B or C. Expressions are limited to addition, multiplication and parentheses. For example, the statement
A = B * ( A + C )
is generated by the leftmost derivation:
<assign> => <id> = <expr>
=> A = <expr>
=> A = <id> * <expr>
=> A = B * <expr>
=> A = B * ( <expr> )
=> A = B * ( <id> + <expr> )
=> A = B * ( A + <expr> )
=> A = B * ( A + <id> )
=> A = B * ( A + C )
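A leftmost derivation can be replayed mechanically: at every step, find the leftmost nonterminal and replace it with one of its definitions. The sketch below assumes the grammar of Figure 4 has the rules <assign> → <id> = <expr>, <id> → A | B | C and <expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>. The names GRAMMAR and leftmost_derivation are made up for the illustration, and the list of choices (which alternative to use at each step, counted from 0) is supplied by hand rather than found by a parser.

```python
GRAMMAR = {
    "<assign>": [["<id>", "=", "<expr>"]],
    "<id>": [["A"], ["B"], ["C"]],
    "<expr>": [
        ["<id>", "+", "<expr>"],
        ["<id>", "*", "<expr>"],
        ["(", "<expr>", ")"],
        ["<id>"],
    ],
}


def leftmost_derivation(start, choices):
    """Return the sentential forms produced by applying the chosen alternatives."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Index of the leftmost symbol that is still a nonterminal.
        index = next(i for i, symbol in enumerate(form) if symbol in GRAMMAR)
        # Replace that nonterminal with the chosen right-hand side.
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


for step in leftmost_derivation("<assign>", [0, 0, 1, 1, 2, 0, 0, 3, 2]):
    print("=>", step)
# The last line printed is: => A = B * ( A + C )
```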
Parse Tree
A parse tree is a hierarchical representation of the syntactic structure of a sentence. Every internal node of a parse tree is labeled with a nonterminal symbol; every leaf is labeled with a terminal symbol. Every subtree of a parse tree describes one instance of an abstraction in the sentence.
For the statement above, A = B * ( A + C ), the parse tree is shown below, with each level of indentation representing one level of the tree:
<assign>
  <id>
    A
  =
  <expr>
    <id>
      B
    *
    <expr>
      (
      <expr>
        <id>
          A
        +
        <expr>
          <id>
            C
      )
Figure 5 : parse tree
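The same parse tree can be written down as nested tuples, which makes the rule about leaves easy to check: reading the leaves from left to right gives back the original statement. This is a sketch under the assumption that the tree uses the nonterminals <assign>, <id> and <expr>. The names TREE and leaves are hypothetical.

```python
# Each internal node is (label, child, child, ...); each leaf is a plain string.
TREE = (
    "<assign>",
    ("<id>", "A"),
    "=",
    (
        "<expr>",
        ("<id>", "B"),
        "*",
        (
            "<expr>",
            "(",
            ("<expr>", ("<id>", "A"), "+", ("<expr>", ("<id>", "C"))),
            ")",
        ),
    ),
)


def leaves(node):
    """Return the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    result = []
    for child in children:
        result.extend(leaves(child))
    return result


print(" ".join(leaves(TREE)))  # A = B * ( A + C )
```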
Ambiguity
Ambiguity occurs when a grammar generates a sentence that has two or more distinct parse trees. Suppose we have the statement
A = B + C * A
and the following grammar for assignment statements:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <id>
This grammar produces two distinct parse trees for the statement, so the compiler cannot determine which structure is intended.
Figure 6 : An ambiguous parse tree
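The ambiguity can be confirmed by counting parse trees. The sketch below assumes the ambiguous expression rules are <expr> → <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>, with <id> being A, B or C, and counts how many different trees derive the expression on the right-hand side of the assignment. The function name count_parse_trees is made up for the illustration.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the <expr> parse trees for a list of lexemes."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from <expr>.
        total = 0
        if hi - lo == 1 and tokens[lo] in ("A", "B", "C"):
            total += 1  # <expr> -> <id>
        if hi - lo >= 3 and tokens[lo] == "(" and tokens[hi - 1] == ")":
            total += count(lo + 1, hi - 1)  # <expr> -> ( <expr> )
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] in ("+", "*"):
                # <expr> -> <expr> op <expr>, with the operator at position mid
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("B + C * A".split()))  # 2
```

A result greater than 1 means the sentence has more than one parse tree, which is exactly the definition of ambiguity given above.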
Operator Precedence
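The ambiguity can be resolved by using separate rules for addition and multiplication. Once separated, the operators keep a fixed higher-to-lower ordering in the parse tree: addition sits higher and multiplication lower, so multiplication is grouped first. Below is the new grammar:
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term>
| <term>
<term> → <term> * <factor>
| <factor>
<factor> → ( <expr> )
| <id>
Using the grammar above for A = B + C * A, we get the following leftmost derivation:
<assign> => <id> = <expr>
=> A = <expr>
=> A = <expr> + <term>
=> A = <term> + <term>
=> A = <factor> + <term>
=> A = <id> + <term>
=> A = B + <term>
=> A = B + <term> * <factor>
=> A = B + <factor> * <factor>
=> A = B + <id> * <factor>
=> A = B + C * <factor>
=> A = B + C * <id>
=> A = B + C * A
The parse tree using the unambiguous grammar is below:
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <term>
        <factor>
          <id>
            B
    +
    <term>
      <term>
        <factor>
          <id>
            C
      *
      <factor>
        <id>
          A
Figure 8 : parse tree of operator precedence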
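The effect of splitting the rules by precedence can be seen in a small recursive-descent parser with one function per level: expr for +, term for * and factor for parentheses and variables. This is a sketch, not the only way to parse such a grammar. It assumes the three-level structure described above, replaces left recursion with loops (a common technique, since a recursive-descent function cannot call itself first), and returns a simplified tree that keeps only operators and operands rather than the full parse tree. The name parse_expr is hypothetical.

```python
def parse_expr(tokens):
    """Parse a list of lexemes into nested (operator, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        token = peek()
        if token == "(":
            pos += 1
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected )")
            pos += 1
            return node
        if token in ("A", "B", "C"):
            pos += 1
            return token
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        nonlocal pos
        node = factor()
        while peek() == "*":  # multiplication is handled at the lower level
            pos += 1
            node = ("*", node, factor())
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":  # addition is handled at the higher level
            pos += 1
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse_expr("B + C * A".split()))
# ('+', 'B', ('*', 'C', 'A'))
```

Only one tree is produced for B + C * A, and the multiplication ends up below the addition, matching the parse tree above.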
Test
What are the categories of programming languages? State them.
Answer: Two: high-level and low-level languages.
State the six tools of the compilation process.
Answer:
Lexical analyzer: decomposes the source code into tokens.
Syntax analyzer: groups the tokens into statements and then into a program, verifying the syntax along the way.
Semantic analyzer: checks the meaning of the source code.
Intermediate code generator: produces an internal form of the program. For example, a = b + c becomes (+, b, c, L1) (=, L1, a).
Code optimizer: improves the object program for performance and reliability.
Code generator: translates the intermediate representation into the target language.
Write BNF descriptions for a Java class definition header statement.
Answer (may vary):
(class definition)
(field definition)
(method definition)
(body statement)
(return type)
Using the grammar in Figure 4, show a parse tree and a leftmost derivation for each of the following statements:
A = A * (B+(C*A))
B = C * (A*C+B)