Code Optimization (Intermediate)

Three address code (TAC) has the advantage of being easily modified  to optimize the code. Code optimization is where the newly generated intermediate code is processed to remove redundancy in order to produce efficient TAC that will eventually be used to generate object code.

Our previous example is not sufficient to illustrate code optimization; we will use the following source code:

#include <stdio.h>
int main()
{
    int a=1;
    int b=2;
    int c=3;
    
    a= a+b;
    a= a+c; //a is now a+b+c
    c= a-b; //in the end,c=a+c
    return 0;
} 

In TAC, here’s a comparison between the newly generated intermediate code and the optimized  code :

Newly generated intermediate code (TAC)

int main() {
    int a = 1;
    int b = 2;
    int c = 3;

    t1 = a + b; // t1 = 1 + 2
    a = t1;     // a = t1 (a is now a+b)

    t2 = a + c; // t2 = (a+b) + 3
    a = t2;     // a = t2 (a is now a+b+c)

    t3 = a - b; // t3 = (a+b+c) - 2
    c = t3;     // c = t3 (c is now a+b+c)

    return 0;
}

Optimized (TAC )

int main() {
    int a = 6; // Optimized: a = 6 (1 + 2 + 3)
               // (Constant Folding)
    int c = a; // Optimized: c = a (c is now a+b+c)
               // (Algebraic simplification)
    return 0;
}


From here, we move onto the final stage, Code Generation, which produces object code from our optimized TAC

© 2024  Vedesh Kungebeharry. All rights reserved. 

Intermediate Code Generation

After all structures are generated without any errors, a simplified intermediate code is generated. In c compilation, the intermediate code is three address code (TAC).

TAC :

  • consists of simplified instructions, and operations only take 3 operands , a destination , 2 source operands and an operation.  

  • TAC uses multiple intermediate variables to arrive at the value for complex expressions.

For example, the following operation,

D=b*b4*a*c;

Could be represented as:

t1 = b * b      // Compute the square of b and store 
                //it in temporary variable t1
t2 = 4 * a      // Compute 4 times the value of a and 
                //store it in temporary variable t2
t3 = t2 * c     // Multiply the result of t2 by c 
                //and store it in temporary variable t3
t4 = t1 - t3    // Subtract the value of t3 from t1 
                //and store it in temporary variable t4
D = t4          // Assign the value in t4 to variable D

Note that the above TAC code is not exact, it has been simplified for the purposes of demonstration.

The actual source code and corresponding TAC code is shown here:

C Source (source.c)

#include <stdio.h>
int main()
{   
    float D,b,a,c;
    D= b*b - 4*a*c; 
    return 0;
}

Intermediate TAC (source.c.005t.gimple)

main ()
{
  int D.3251;

  {
    float D;
    float b;
    float a;
    float c;

    _1 = b * b;
    _2 = a * 4.0e+0;
    _3 = c * _2;
    D = _1 - _3;
    D.3251 = 0;
    return D.3251;
  }
  D.3251 = 0;
  return D.3251;
}


The TAC was obtained by running the command “gcc -fdump-tree-all -o program.exe source.c”

See all files here: https://drive.google.com/drive/folders/1dDYCaNddQFpAF2oDknWrEFkPpj692xQ7?usp=sharing

© 2024  Vedesh Kungebeharry. All rights reserved. 

Examples of Errors During Compilation Stages (Lexical, Syntax, Semantic)

Lexical Errors

ExampleDescription/Explanation
int n%mber = 5;Illegal character % in identifier.
floar x = 5.0;Misspelling of keyword float as floar.
int 1stNumber = 1;Identifiers cannot start with a digit.
#incldue <stdio.h>Misspelling of directive #include.
double price = 45.6L;Incorrect use of float literal suffix (L is for long int).
char* str = 'Hello';Single quotes used instead of double quotes for string literal.

Syntax Errors

ExampleDescription/Explanation
int a = 5Missing semicolon at the end of the statement.
if (a < 5) { int b = 0;Missing closing brace } for the if statement block.
for (int i = 0 i < 10; i++)Missing semicolon in the for loop declaration.
printf("%d", a, b);Mismatch in the number of arguments in printf.
int array[5 = {1, 2, 3, 4, 5};Syntax error in array initialization (should be array[5] = {1, 2, 3, 4, 5};).
while(a < 5)Missing opening or closing braces for the while loop body.
int x = (2, 3);Incorrect use of the comma operator in assignment.

Semantic Errors

ExampleDescription/Explanation
int a = "hello";Type mismatch: assigning a string to an integer variable.
int a; return a;Using uninitialized variable a.
int a = 5 / 0;Division by zero.
char *str = malloc(10); free(str); str[0] = 'A';Using a pointer str after it has been freed.
int* ptr; *ptr = 10;Dereferencing an uninitialized pointer ptr.
int arr[5]; arr[10] = 50;Array index out of bounds.
double result = sqrt(-1);Passing invalid argument to a function (e.g., square root of a negative number).
FILE *fp = fopen("nonexistent.txt", "r"); if(fp) { /* ... */ }Incorrect error checking on fopen. Should be if(!fp) for error checking.

© 2024  Vedesh Kungebeharry. All rights reserved. 

Semantic Analysis Checks using Coded Examples

Checks in semantic analysis include:

  • Type checking :
    • Type mismatch checking. E.g,
  int x = “Hello”;  
  • Type cast checking. E.g,
int radius = 3;
float Pi = 3.14;
float diameter = 2*radius*Pi; 
/*radius will be typecast to float
during the calculation, i.e 3.00 
(This result is reflected in the next step, 
intermediate code generation
 as  TAC)
*/


the TAC code generated from the above example:

t1 = 2.0  // Represents the float constant 2.0
t2 = (float)radius  // Type casting 'radius' to float
t3 = t1 * t2  // Multiplying 2.0 by the type-casted 'radius'
t4 = t3 * Pi  // Multiplying the result by 'Pi'

  • Variable Declaration checks :

    • Ensuring all variables are declared. Eg,
int x = 10;
int y = 10;
sum = x+y; //sum was not declared

  • Reserved identifier / keyword use. E.g,
int char = '1';//char is a reserved identifier
int char_2;

int if; //if is a reserved keyword
int if_2; 

  • Variable scope check

    A unique variable declared within a function cannot be used by any other structures or functions outside of the original function block. Eg,
void demoFunction() 
{
    int uniqueX = 10;
}

int main() 
{
    printf("%d\n", uniqueX); // Error: 'uniqueX' is out of scope
    return 0;
}

  • Function Declaration and Invocation checks:
    • Function declaration check – missed function declaration. e.g
int main() {
    // Calling a function before it is declared or defined
    hello(); // Error: 'hello' is called before declaration/definition
    
    return 0;
}
/*the declaration was omitted, 
this function declaration
is required: 

void hello();

*/
void hello() {
    // Function definition for 'hello'
    printf("Hello, world!\n");
}

  •  Function declaration check – missing function implementation when a function prototype (function declaration) was declared. E.g,
// Function prototype (declaration)
int multiply(int a, int b);

int main() {
    int result = multiply(3, 7); /* Error: Function 
                                'multiply' 
                                is declared but not 
                                defined/implemented*/    
    return 0;
}

  • Function invocation checks.

(NB, function invocation simply means to call a function correctly)

int add(int a, int b);

int add(int a, int b) {
    return a + b;
}

int main() {
    int sum1 = add(5, 10, 15); /* Error: Too many arguments
                                in the function call*/
                                
    int sum2 = add(5);/*Error: Missing arguments 
                        in the function call*/
                        
    int sum3 = add(1,2) /*proper invocation of add,
                        no error here*/
    return 0;
}

  • Control Flow Checks -Verify the correct structure and use of control structures: if, while, for etc

Other checks exist , we have only examined some of the main semantic checks.

Recall that once all semantic checks are passed, Intermediate code generation occurs next.

Examples of Semantic errors in C

ExampleDescription/Explanation
int a = “hello”;Type mismatch: assigning a string to an integer variable.
int a; return a;Using uninitialized variable a.
int a = 5 / 0;Division by zero.
char *str = malloc(10); free(str); str[0] = ‘A’;Using a pointer str after it has been freed.
int* ptr; *ptr = 10;Dereferencing an uninitialized pointer ptr.
int arr[5]; arr[10] = 50;Array index out of bounds.
double result = sqrt(-1);Passing invalid argument to a function (e.g., square root of a negative number).
FILE *fp = fopen(“nonexistent.txt”, “r”); if(fp) { /* … */ }Incorrect error checking on fopen. Should be if(!fp) for error checking.

© 2024  Vedesh Kungebeharry. All rights reserved. 

Semantic analysis

Semantic analysis checks that the code makes programmatic sense by making sure that variables , constructs, and expression are used in a correct way according with the programming language’s rules.

This is accomplished by using  the parse tree ( generated from syntax analysis)  and a  symbol  table  to perform semantic checks.

Symbol Tables

A symbol (identifier) table is generated during semantic analysis which stores the attributes of each identifier including:

  • name,
    • datatype ,
    • Scope: where in the source code it was declared (Global or within functions)

For example, for the line of code below,

            x = 150;

the following table entry may be generated:

IdentifierTypeValueMemory LocationScope
xint1500x1001Global

Note that  in practice, symbol tables may store more metadata – we have simplified the concept for easy understanding.

Symbol Table Example With A Full Program

Code
#include <stdio.h>

float PI = 3.14159265359;  // Define PI as a variable
//function declaration was omitted for simplicity.
float calculateArea(float radius) {
    return PI * radius * radius;
}

int main() {
    float radius;

    printf("Enter the radius of the circle: ");
    scanf("%f", &radius);

    float area = calculateArea(radius);
    
    printf("The area of the circle with radius %.2f is %.2f\n", radius, area);

    return 0;
}

Symbol Table

The following simplified symbol table is generated:

IdentifierTypeValueMemory LocationScope
PIfloat3.141592653590x1001Global
calculateAreafloat()Global
mainint()Global
radiusfloat0x1002main
areafloat0x1003main

Taking an overview in order to perform these checks, we observer some facts of compiler design:

  • All parse trees and symbol tables are checked to see if they obey the rules of the language.
  • There are rules for the structure of the parse tree;
  • There are rules for the symbol table.
  • Together, these rules enforce the semantic rules of the language.

In this set of examples , we consider a hypothetical compiler of our own design.

Recall the parse tree generated from example 1:

Line 1: Real D,b,a,c;

Line 2: D= b*b – 4*a*c;  

The tokens making up the line 2 (shown below)….

D=b*b4*a*c;



…will generate the following tree:

Example 2 – Check Symbol Tables for consistency

It is worth noting that another possible semantic check would be to see if all token identifiers have been declared. In our case for example 1:

Line 1: Real D,b,a,c;

Line 2: D= bb – 4*a*c;  

we see that our typo could have seen our tokens read as 

D=bb4*a*c;

At this stage, we should be able to check that each token was declared.  In this case , when we check our symbol table and see that it was not declared.  The compiler can now record an error for this line stating that bb was not declared.
 

Example 3 – Check Parse tree for invalid operands (Type Mismatch)

Consider the following code:

Line 1: Real D,b,a,c

Line 2: D= “Fred”

 Line 2 produces tokens:

D=“Fred”

Which produces the tree :

Note that D is of type real.  We could implement a new rule for our compiler:

All child nodes on the same level must have the same data type found in the symbol table.

Thus after checking our tree , we can record a type mismatch error for our current line.

Symbol Table and Parse Tree Analysis Summary

  1. All symbol tables and trees are checked for rules which identify violation of rules in our source language.
  2. Every time an error is found, its line location is recorded with an appropriate message.
  3. After all symbols and trees are checked, generate a list of errors and stop compilation.
  4. If no errors were recorded, move on to the next step: Intermediate code generation

© 2024  Vedesh Kungebeharry. All rights reserved. 

Syntax analysis (parsing)

Parse trees are generated for each expression.

e.g  for the  code

Line 1: Real D,b,a,c;

Line 2: D= b*b – 4*a*c;  

The tokens making up the line 2 (shown below)….

D=b*b4*a*c;



…will generate the following tree:

(Note that “” is a null left child of C)

This tree can now be checked for structural errors related to the grammatical rules of the language.

This tree is stored for use in generating the intermediate code and for error checking in semantic parsing.

Demonstration: Teacher uses an inorder traversal to regenerate the original code statement from the tree.

Examples of Syntax errors in C

ExampleDescription/Explanation
int a = 5Missing semicolon at the end of the statement.
if (a < 5) { int b = 0;Missing closing brace } for the if statement block.
for (int i = 0 i < 10; i++)Missing semicolon in the for loop declaration.
printf("%d", a, b);Mismatch in the number of arguments in printf.
int array[5 = {1, 2, 3, 4, 5};Syntax error in array initialization (should be array[5] = {1, 2, 3, 4, 5};).
while(a < 5)Missing opening or closing braces for the while loop body.
int x = (2, 3);Incorrect use of the comma operator in assignment.

© 2024  Vedesh Kungebeharry. All rights reserved. 

Lexical Analysis

  • This converts the source code into tokens. The individual characters of the source file must be grouped into tokens.

    e.g. x = 150;  will be broken into  4 tokens:
x=150;

Note that tokens are not just the characters that make up the line, but the parts that make up the entire line.  In the above example , 150 is recognized as a single token and not 3 tokens of 1, 5 and 0.

  • Next, a symbol (identifier) table is generated which stores the attributes of each identifier including:
    • name,
    • datatype ,
    • Scope: where in the source code it was declared (Globa , within functions)

For our line of code above, the following table entry may be generated:

IdentifierTypeValueMemory LocationScope
xint1500x1001Global

Note that  in practice, symbol tables may store more metadata – we have simplified the concept for easy understanding.

Symbol Table Example With A Full Program

Code

#include <stdio.h>   float PI = 3.14159265359;  // Define PI as a variable   float calculateArea(float radius) {     return PI * radius * radius; }   int main() {     float radius;       printf(“Enter the radius of the circle: “);     scanf(“%f”, &radius);       float area = calculateArea(radius);         printf(“The area of the circle with radius %.2f is %.2f\n”, radius, area);       return 0; }  

Symbol Table

The following simplified symbol table is generated:

IdentifierTypeValueMemory LocationScope
PIfloat3.141592653590x1001Global
calculateAreafloat()Global
mainint()Global
radiusfloat0x1002main
areafloat0x1003main

Error Detection (Lexical Error Handling)

Errors in the tokens are detected.

 Common errors include detecting :

  • invalid characters  not specified in the language , e.g

    int café = 42; // The ‘é’ character is not valid in C.


  • invalid identifier declaration, e.g,
    int $var = 10;  // Invalid character ‘$’

  • The misspelling of keywords eg
    innt x; //should be int:

Examples of lexical Errors In C

ExampleDescription/Explanation
int n%mber = 5;Illegal character % in identifier.
floar x = 5.0;Misspelling of keyword float as floar.
int 1stNumber = 1;Identifiers cannot start with a digit.
#incldue <stdio.h>Misspelling of directive #include.
double price = 45.6L;Incorrect use of float literal suffix (L is for long int).
char* str = ‘Hello’;Single quotes used instead of double quotes for string literal.

© 2024  Vedesh Kungebeharry. All rights reserved. 

Translation (Program Translation During the Compilation process)

This note continues from: Compilers

After the preprocessing produces a single source file in high level language , Translation of this source code can now occur.

Translation is the process of converting the High Level Language’s source code to object code.

It involves:

Analysis stageDescription
Lexical analysisBreaking the source code correctly into tokens
Syntax AnalysisEnsuring that the grammatical rules of the language are obeyed
Semantic analysisEnsuring that the program statements make logical sense by adhering to the rules of the language.  

After the steps above are completed, we are left with some output:  data structures and used in analysing the source code for errors, and if no errors were found, intermediate object code.

OutputSub Stage
A symbol tableLexical analysis
A parse treeSyntax Analysis
An annotated abstract treeSemantic analysis
Unoptimized Object CodeIntermediate code generation
Optimized Object codeCode Optimization

© 2024  Vedesh Kungebeharry. All rights reserved. 

Compilers

Compilers are tools used to produce executable code from source code.

Compilers process code through many stages, but it can be thought of as 3 main steps:

StepDescription
PreprocessingCombining all the source code from multiple files into 1 source code file
TranslationChecking the source code (produced from the preprocessing stage) for errors. If errors are found, the entire compilation process is halted and error messages generated, otherwise the source code is translated into object code.
LinkingCombining the generated object file to other dependencies to form a stand alone executable.    (e.g for a “hello world” c program at the console, the main object code of a would be combined with object code for output to the console to produce a single executable file.)

© 2024  Vedesh Kungebeharry. All rights reserved.