Three address code (TAC) has the advantage of being easily modified to optimize the code. Code optimization is where the newly generated intermediate code is processed to remove redundancy in order to produce efficient TAC that will eventually be used to generate object code.
Our previous example is not sufficient to illustrate code optimization; we will use the following source code:
#include <stdio.h>
int main()
{
int a=1;
int b=2;
int c=3;
a= a+b;
a= a+c; //a is now a+b+c
c= a-b; //in the end,c=a+c
return 0;
}
In TAC, here’s a comparison between the newly generated intermediate code and the optimized code :
Newly generated intermediate code (TAC)
int main() {
int a = 1;
int b = 2;
int c = 3;
t1 = a + b; // t1 = 1 + 2
a = t1; // a = t1 (a is now a+b)
t2 = a + c; // t2 = (a+b) + 3
a = t2; // a = t2 (a is now a+b+c)
t3 = a - b; // t3 = (a+b+c) - 2
c = t3; // c = t3 (c is now a+b+c)
return 0;
}
Optimized (TAC )
int main() {
int a = 6; // Optimized: a = 6 (1 + 2 + 3)
// (Constant Folding)
int c = a; // Optimized: c = a (c is now a+b+c)
// (Algebraic simplification)
return 0;
}
From here, we move onto the final stage, Code Generation, which produces object code from our optimized TAC
After all structures are generated without any errors, a simplified intermediate code is generated. In c compilation, the intermediate code is three address code (TAC).
TAC :
consists of simplified instructions, and operations only take 3 operands , a destination , 2 source operands and an operation.
TAC uses multiple intermediate variables to arrive at the value for complex expressions.
For example, the following operation,
D
=
b
*
b
–
4
*
a
*
c
;
Could be represented as:
t1 = b * b // Compute the square of b and store
//it in temporary variable t1
t2 = 4 * a // Compute 4 times the value of a and
//store it in temporary variable t2
t3 = t2 * c // Multiply the result of t2 by c
//and store it in temporary variable t3
t4 = t1 - t3 // Subtract the value of t3 from t1
//and store it in temporary variable t4
D = t4 // Assign the value in t4 to variable D
Note that the above TAC code is not exact, it has been simplified for the purposes of demonstration.
The actual source code and corresponding TAC code is shown here:
int radius = 3;
float Pi = 3.14;
float diameter = 2*radius*Pi;
/*radius will be typecast to float
during the calculation, i.e 3.00
(This result is reflected in the next step,
intermediate code generation
as TAC)
*/
the TAC code generated from the above example:
t1 = 2.0 // Represents the float constant 2.0
t2 = (float)radius // Type casting 'radius' to float
t3 = t1 * t2 // Multiplying 2.0 by the type-casted 'radius'
t4 = t3 * Pi // Multiplying the result by 'Pi'
Variable Declaration checks :
Ensuring all variables are declared. Eg,
int x = 10;
int y = 10;
sum = x+y; //sum was not declared
Reserved identifier / keyword use. E.g,
int char = '1';//char is a reserved identifier
int char_2;
int if; //if is a reserved keyword
int if_2;
Variable scope check
A unique variable declared within a function cannot be used by any other structures or functions outside of the original function block. Eg,
void demoFunction()
{
int uniqueX = 10;
}
int main()
{
printf("%d\n", uniqueX); // Error: 'uniqueX' is out of scope
return 0;
}
Function Declaration and Invocation checks:
Function declaration check – missed function declaration. e.g
int main() {
// Calling a function before it is declared or defined
hello(); // Error: 'hello' is called before declaration/definition
return 0;
}
/*the declaration was omitted,
this function declaration
is required:
void hello();
*/
void hello() {
// Function definition for 'hello'
printf("Hello, world!\n");
}
Function declaration check – missing function implementation when a function prototype (function declaration) was declared. E.g,
// Function prototype (declaration)
int multiply(int a, int b);
int main() {
int result = multiply(3, 7); /* Error: Function
'multiply'
is declared but not
defined/implemented*/
return 0;
}
Function invocation checks.
(NB, function invocation simply means to call a function correctly)
int add(int a, int b);
int add(int a, int b) {
return a + b;
}
int main() {
int sum1 = add(5, 10, 15); /* Error: Too many arguments
in the function call*/
int sum2 = add(5);/*Error: Missing arguments
in the function call*/
int sum3 = add(1,2) /*proper invocation of add,
no error here*/
return 0;
}
Control Flow Checks -Verify the correct structure and use of control structures: if, while, for etc
Other checks exist , we have only examined some of the main semantic checks.
Recall that once all semantic checks are passed, Intermediate code generation occurs next.
Examples of Semantic errors in C
Example
Description/Explanation
int a = “hello”;
Type mismatch: assigning a string to an integer variable.
int a; return a;
Using uninitialized variable a.
int a = 5 / 0;
Division by zero.
char *str = malloc(10); free(str); str[0] = ‘A’;
Using a pointer str after it has been freed.
int* ptr; *ptr = 10;
Dereferencing an uninitialized pointer ptr.
int arr[5]; arr[10] = 50;
Array index out of bounds.
double result = sqrt(-1);
Passing invalid argument to a function (e.g., square root of a negative number).
Semantic analysis checks that the code makes programmatic sense by making sure that variables , constructs, and expression are used in a correct way according with the programming language’s rules.
This is accomplished by using the parse tree ( generated from syntax analysis) and a symbol table to perform semantic checks.
Symbol Tables
A symbol (identifier) table is generated during semantic analysis which stores the attributes of each identifier including:
name,
datatype ,
Scope: where in the source code it was declared (Global or within functions)
For example, for the line of code below,
x = 150;
the following table entry may be generated:
Identifier
Type
Value
Memory Location
Scope
x
int
150
0x1001
Global
Note that in practice, symbol tables may store more metadata – we have simplified the concept for easy understanding.
Symbol Table Example With A Full Program
Code
#include <stdio.h>
float PI = 3.14159265359; // Define PI as a variable
//function declaration was omitted for simplicity.
float calculateArea(float radius) {
return PI * radius * radius;
}
int main() {
float radius;
printf("Enter the radius of the circle: ");
scanf("%f", &radius);
float area = calculateArea(radius);
printf("The area of the circle with radius %.2f is %.2f\n", radius, area);
return 0;
}
Symbol Table
The following simplified symbol table is generated:
Identifier
Type
Value
Memory Location
Scope
PI
float
3.14159265359
0x1001
Global
calculateArea
float()
Global
main
int()
Global
radius
float
0x1002
main
area
float
0x1003
main
Taking an overview in order to perform these checks, we observer some facts of compiler design:
All parse trees and symbol tables are checked to see if they obey the rules of the language.
There are rules for the structure of the parse tree;
There are rules for the symbol table.
Together, these rules enforce the semantic rules of the language.
In this set of examples , we consider a hypothetical compiler of our own design.
Recall the parse tree generated from example 1:
Line 1: Real D,b,a,c;
Line 2: D= b*b – 4*a*c;
The tokens making up the line 2 (shown below)….
D
=
b
*
b
–
4
*
a
*
c
;
…will generate the following tree:
Example 2 – Check Symbol Tables for consistency
It is worth noting that another possible semantic check would be to see if all token identifiers have been declared. In our case for example 1:
Line 1: Real D,b,a,c;
Line 2: D= bb – 4*a*c;
we see that our typo could have seen our tokens read as
D
=
bb
–
4
*
a
*
c
;
At this stage, we should be able to check that each token was declared. In this case , when we check our symbol table and see that it was not declared. The compiler can now record an error for this line stating that bb was not declared.
Example 3 – Check Parse tree for invalid operands (Type Mismatch)
Consider the following code:
Line 1: Real D,b,a,c
Line 2: D= “Fred”
Line 2 produces tokens:
D
=
“Fred”
Which produces the tree :
Note that D is of type real. We could implement a new rule for our compiler:
All child nodes on the same level must have the same data type found in the symbol table.
Thus after checking our tree , we can record a type mismatch error for our current line.
Symbol Table and Parse Tree Analysis Summary
All symbol tables and trees are checked for rules which identify violation of rules in our source language.
Every time an error is found, its line location is recorded with an appropriate message.
After all symbols and trees are checked, generate a list of errors and stop compilation.
If no errors were recorded, move on to the next step: Intermediate code generation
This converts the source code into tokens. The individual characters of the source file must be grouped into tokens.
e.g. x = 150; will be broken into 4 tokens:
x
=
150
;
Note that tokens are not just the characters that make up the line, but the parts that make up the entire line. In the above example , 150 is recognized as a single token and not 3 tokens of 1, 5 and 0.
Next, a symbol (identifier) table is generated which stores the attributes of each identifier including:
name,
datatype ,
Scope: where in the source code it was declared (Globa , within functions)
For our line of code above, the following table entry may be generated:
Identifier
Type
Value
Memory Location
Scope
x
int
150
0x1001
Global
Note that in practice, symbol tables may store more metadata – we have simplified the concept for easy understanding.
Symbol Table Example With A Full Program
Code
#include <stdio.h> float PI = 3.14159265359; // Define PI as a variable float calculateArea(float radius){return PI * radius * radius;} int main(){ float radius; printf(“Enter the radius of the circle: “); scanf(“%f”,&radius); float area = calculateArea(radius); printf(“The area of the circle with radius %.2f is %.2f\n”, radius, area);return 0;}
Symbol Table
The following simplified symbol table is generated:
Identifier
Type
Value
Memory Location
Scope
PI
float
3.14159265359
0x1001
Global
calculateArea
float()
Global
main
int()
Global
radius
float
0x1002
main
area
float
0x1003
main
Error Detection (Lexical Error Handling)
Errors in the tokens are detected.
Common errors include detecting :
invalid characters not specified in the language , e.g
int café = 42; // The ‘é’ character is not valid in C.
invalid identifier declaration, e.g, int $var = 10; // Invalid character ‘$’
The misspelling of keywords eg innt x; //should be int:
Examples of lexical Errors In C
Example
Description/Explanation
int n%mber = 5;
Illegal character % in identifier.
floar x = 5.0;
Misspelling of keyword float as floar.
int 1stNumber = 1;
Identifiers cannot start with a digit.
#incldue <stdio.h>
Misspelling of directive #include.
double price = 45.6L;
Incorrect use of float literal suffix (L is for long int).
char* str = ‘Hello’;
Single quotes used instead of double quotes for string literal.
After the preprocessing produces a single source file in high level language , Translation of this source code can now occur.
Translation is the process of converting the High Level Language’s source code to object code.
It involves:
Analysis stage
Description
Lexical analysis
Breaking the source code correctly into tokens
Syntax Analysis
Ensuring that the grammatical rules of the language are obeyed
Semantic analysis
Ensuring that the program statements make logical sense by adhering to the rules of the language.
After the steps above are completed, we are left with some output: data structures and used in analysing the source code for errors, and if no errors were found, intermediate object code.
Compilers are tools used to produce executable code from source code.
Compilers process code through many stages, but it can be thought of as 3 main steps:
Step
Description
Preprocessing
Combining all the source code from multiple files into 1 source code file
Translation
Checking the source code (produced from the preprocessing stage) for errors. If errors are found, the entire compilation process is halted and error messages generated, otherwise the source code is translated into object code.
Linking
Combining the generated object file to other dependencies to form a stand alone executable. (e.g for a “hello world” c program at the console, the main object code of a would be combined with object code for output to the console to produce a single executable file.)