Continuing from our previous example, The linker can now takes over, it searches for the actual object code implementation containing scanf and printf, (along with any other dependencies), and combines it with the original main.o code from the example above.
In combining the object files, the references are resolved so that scanf and printf implementation can be accessed by the main.o machine code in the same way that find_sum can be accessed from the original object code file. The result is a stand alone executable file which is essentially our program.
In summary, Linkers combine the generated object code from the translation process to other dependencies or references needed to execute the program, it converts reference placeholders to actual addresses, thus linking the main program to its dependencies to create fully functioning executable code.
The optimized TAC is now used to generate assembly code.
The assembly code is converted into object code in a process known as assembly.
This object code contains machine instructions corresponding to the source code, but may not be fully functional for execution as yet.
Consider the following example in source code in a file “main.c” :
#include <stdio.h>
//function prototype omitted for simplicity
int find_sum(int num1, int num2) {
return num1 + num2;
}
int main() {
int num1, num2;
printf("Enter the first number: ");
scanf("%d", &num1);
printf("Enter the second number: ");
scanf("%d", &num2);
int result = find_sum(num1, num2);
printf("The sum of %d and %d is %d\n", num1, num2, result);
return 0;
}
The machine code for the main and find_sum function is generated, and stored in the object file, main.o . Since scanf and printf is used but exist in other libraries, the compiler creates code know as references (placeholders) to call scanf and printf, but doesn’t include the implementation of of scanf and printf in in the main.o object file. The creation of main.o was the final step of the code generation.
Now we move on to the last phase to use a linkers combine various bits of object code to create one executable file.
Three address code (TAC) has the advantage of being easily modified to optimize the code. Code optimization is where the newly generated intermediate code is processed to remove redundancy in order to produce efficient TAC that will eventually be used to generate object code.
Our previous example is not sufficient to illustrate code optimization; we will use the following source code:
#include <stdio.h>
int main()
{
int a=1;
int b=2;
int c=3;
a= a+b;
a= a+c; //a is now a+b+c
c= a-b; //in the end,c=a+c
return 0;
}
In TAC, here’s a comparison between the newly generated intermediate code and the optimized code :
Newly generated intermediate code (TAC)
int main() {
int a = 1;
int b = 2;
int c = 3;
t1 = a + b; // t1 = 1 + 2
a = t1; // a = t1 (a is now a+b)
t2 = a + c; // t2 = (a+b) + 3
a = t2; // a = t2 (a is now a+b+c)
t3 = a - b; // t3 = (a+b+c) - 2
c = t3; // c = t3 (c is now a+b+c)
return 0;
}
Optimized (TAC )
int main() {
int a = 6; // Optimized: a = 6 (1 + 2 + 3)
// (Constant Folding)
int c = a; // Optimized: c = a (c is now a+b+c)
// (Algebraic simplification)
return 0;
}
From here, we move onto the final stage, Code Generation, which produces object code from our optimized TAC
After all structures are generated without any errors, a simplified intermediate code is generated. In c compilation, the intermediate code is three address code (TAC).
TAC :
consists of simplified instructions, and operations only take 3 operands , a destination , 2 source operands and an operation.
TAC uses multiple intermediate variables to arrive at the value for complex expressions.
For example, the following operation,
D
=
b
*
b
–
4
*
a
*
c
;
Could be represented as:
t1 = b * b // Compute the square of b and store
//it in temporary variable t1
t2 = 4 * a // Compute 4 times the value of a and
//store it in temporary variable t2
t3 = t2 * c // Multiply the result of t2 by c
//and store it in temporary variable t3
t4 = t1 - t3 // Subtract the value of t3 from t1
//and store it in temporary variable t4
D = t4 // Assign the value in t4 to variable D
Note that the above TAC code is not exact, it has been simplified for the purposes of demonstration.
The actual source code and corresponding TAC code is shown here:
int radius = 3;
float Pi = 3.14;
float diameter = 2*radius*Pi;
/*radius will be typecast to float
during the calculation, i.e 3.00
(This result is reflected in the next step,
intermediate code generation
as TAC)
*/
the TAC code generated from the above example:
t1 = 2.0 // Represents the float constant 2.0
t2 = (float)radius // Type casting 'radius' to float
t3 = t1 * t2 // Multiplying 2.0 by the type-casted 'radius'
t4 = t3 * Pi // Multiplying the result by 'Pi'
Variable Declaration checks :
Ensuring all variables are declared. Eg,
int x = 10;
int y = 10;
sum = x+y; //sum was not declared
Reserved identifier / keyword use. E.g,
int char = '1';//char is a reserved identifier
int char_2;
int if; //if is a reserved keyword
int if_2;
Variable scope check
A unique variable declared within a function cannot be used by any other structures or functions outside of the original function block. Eg,
void demoFunction()
{
int uniqueX = 10;
}
int main()
{
printf("%d\n", uniqueX); // Error: 'uniqueX' is out of scope
return 0;
}
Function Declaration and Invocation checks:
Function declaration check – missed function declaration. e.g
int main() {
// Calling a function before it is declared or defined
hello(); // Error: 'hello' is called before declaration/definition
return 0;
}
/*the declaration was omitted,
this function declaration
is required:
void hello();
*/
void hello() {
// Function definition for 'hello'
printf("Hello, world!\n");
}
Function declaration check – missing function implementation when a function prototype (function declaration) was declared. E.g,
// Function prototype (declaration)
int multiply(int a, int b);
int main() {
int result = multiply(3, 7); /* Error: Function
'multiply'
is declared but not
defined/implemented*/
return 0;
}
Function invocation checks.
(NB, function invocation simply means to call a function correctly)
int add(int a, int b);
int add(int a, int b) {
return a + b;
}
int main() {
int sum1 = add(5, 10, 15); /* Error: Too many arguments
in the function call*/
int sum2 = add(5);/*Error: Missing arguments
in the function call*/
int sum3 = add(1,2) /*proper invocation of add,
no error here*/
return 0;
}
Control Flow Checks -Verify the correct structure and use of control structures: if, while, for etc
Other checks exist , we have only examined some of the main semantic checks.
Recall that once all semantic checks are passed, Intermediate code generation occurs next.
Examples of Semantic errors in C
Example
Description/Explanation
int a = “hello”;
Type mismatch: assigning a string to an integer variable.
int a; return a;
Using uninitialized variable a.
int a = 5 / 0;
Division by zero.
char *str = malloc(10); free(str); str[0] = ‘A’;
Using a pointer str after it has been freed.
int* ptr; *ptr = 10;
Dereferencing an uninitialized pointer ptr.
int arr[5]; arr[10] = 50;
Array index out of bounds.
double result = sqrt(-1);
Passing invalid argument to a function (e.g., square root of a negative number).
Semantic analysis checks that the code makes programmatic sense by making sure that variables , constructs, and expression are used in a correct way according with the programming language’s rules.
This is accomplished by using the parse tree ( generated from syntax analysis) and a symbol table to perform semantic checks.
Symbol Tables
A symbol (identifier) table is generated during semantic analysis which stores the attributes of each identifier including:
name,
datatype ,
Scope: where in the source code it was declared (Global or within functions)
For example, for the line of code below,
x = 150;
the following table entry may be generated:
Identifier
Type
Value
Memory Location
Scope
x
int
150
0x1001
Global
Note that in practice, symbol tables may store more metadata – we have simplified the concept for easy understanding.
Symbol Table Example With A Full Program
Code
#include <stdio.h>
float PI = 3.14159265359; // Define PI as a variable
//function declaration was omitted for simplicity.
float calculateArea(float radius) {
return PI * radius * radius;
}
int main() {
float radius;
printf("Enter the radius of the circle: ");
scanf("%f", &radius);
float area = calculateArea(radius);
printf("The area of the circle with radius %.2f is %.2f\n", radius, area);
return 0;
}
Symbol Table
The following simplified symbol table is generated:
Identifier
Type
Value
Memory Location
Scope
PI
float
3.14159265359
0x1001
Global
calculateArea
float()
Global
main
int()
Global
radius
float
0x1002
main
area
float
0x1003
main
Taking an overview in order to perform these checks, we observer some facts of compiler design:
All parse trees and symbol tables are checked to see if they obey the rules of the language.
There are rules for the structure of the parse tree;
There are rules for the symbol table.
Together, these rules enforce the semantic rules of the language.
In this set of examples , we consider a hypothetical compiler of our own design.
Recall the parse tree generated from example 1:
Line 1: Real D,b,a,c;
Line 2: D= b*b – 4*a*c;
The tokens making up the line 2 (shown below)….
D
=
b
*
b
–
4
*
a
*
c
;
…will generate the following tree:
Example 2 – Check Symbol Tables for consistency
It is worth noting that another possible semantic check would be to see if all token identifiers have been declared. In our case for example 1:
Line 1: Real D,b,a,c;
Line 2: D= bb – 4*a*c;
we see that our typo could have seen our tokens read as
D
=
bb
–
4
*
a
*
c
;
At this stage, we should be able to check that each token was declared. In this case , when we check our symbol table and see that it was not declared. The compiler can now record an error for this line stating that bb was not declared.
Example 3 – Check Parse tree for invalid operands (Type Mismatch)
Consider the following code:
Line 1: Real D,b,a,c
Line 2: D= “Fred”
Line 2 produces tokens:
D
=
“Fred”
Which produces the tree :
Note that D is of type real. We could implement a new rule for our compiler:
All child nodes on the same level must have the same data type found in the symbol table.
Thus after checking our tree , we can record a type mismatch error for our current line.
Symbol Table and Parse Tree Analysis Summary
All symbol tables and trees are checked for rules which identify violation of rules in our source language.
Every time an error is found, its line location is recorded with an appropriate message.
After all symbols and trees are checked, generate a list of errors and stop compilation.
If no errors were recorded, move on to the next step: Intermediate code generation