Whenever we write code for an application, we are giving the computer instructions to perform in a specified order. To execute these instructions, the computer must first understand them, and it cannot directly understand the high-level languages we write in.
Hence, we use a compiler. It transforms our source code from a high-level language into low-level machine language without changing its meaning, and it also tries to make the resulting code efficient in terms of execution time & memory space.
Types of Compilers
Some of the different types of compilers are:
- Cross-compiler: this compiler generates executable code for a platform other than the one it is running on. E.g., we can build an application that runs on Android using a compiler running on Windows or Linux. GCC, for example, can be built as a cross-compiler.
- Bootstrap compiler: this compiler is written in the same language that it compiles, for example a C compiler written in C. This sounds weird: how can one compile a C compiler without already having a C compiler? This is often called the chicken-and-egg problem (which came first, the chicken or the egg?).
This issue was solved by first writing the compiler partly in C and partly in assembly language. As it got better, more & more of the compiler was moved to C, until the entire compiler was written in C.
- Decompiler: as its name suggests, this tool transforms low-level code back into a high-level language. E.g., Bytecode Viewer.
- Source-to-source compiler: this compiler translates a program written in one high-level language into another high-level language. E.g., the TypeScript and Dart compilers, which can both emit JavaScript.
Phases of a Compiler
Broadly, we can divide compilation into 2 phases.
- Analysis Phase: here the compiler reads the source code and breaks it into parts to check for lexical, syntax & semantic errors. It generates an intermediate representation of the source code & a symbol table, which it feeds to the next phase, i.e., the synthesis phase.
- Synthesis Phase: taking the intermediate representation & symbol table from the analysis phase as input, this phase generates the target code.
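The split between the two phases can be sketched with Python's own built-in compiler machinery. This is only an illustration: the function names `analyze` and `synthesize` are made up for this example, and CPython's target is bytecode for its virtual machine rather than native machine code.

```python
import ast

def analyze(source):
    """Analysis phase: check the source and build an intermediate tree."""
    # ast.parse raises SyntaxError if the source is malformed.
    return ast.parse(source)

def synthesize(tree):
    """Synthesis phase: turn the intermediate tree into target code."""
    # The result is a CPython code object (bytecode), not machine code.
    return compile(tree, "<sketch>", "exec")

source = "total = 2 + 3 * 4"
tree = analyze(source)      # intermediate representation
code = synthesize(tree)     # target code

namespace = {}
exec(code, namespace)       # run the generated code
print(namespace["total"])   # 14
```

Keeping the two steps separate means the same tree could, in principle, be handed to a different back-end without repeating the analysis.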
Let us now look at these phases in detail:
- Lexical Analysis: works like a text scanner; it reads the source code as a stream of characters & groups them into meaningful lexemes, each represented as a token of the form
<token-name, attribute-value>
- Syntax Analysis: it takes the tokens as input and generates a parse tree. Here the arrangement of tokens is checked against the grammar of the source language, i.e., whether the expression formed by the tokens is syntactically correct.
- Semantic Analysis: it checks whether the parse tree follows the semantic rules of the programming language, e.g., that types are compatible and identifiers are declared before use.
- Intermediate code generation: here, the compiler generates code that sits somewhere between the high-level language and the low-level language, in a form that makes it easier to generate the target/machine code.
- Code optimization: here, the intermediate code gets cleaned up by removing unnecessary lines & rearranging the sequence of statements for better performance.
- Code generation: now that the code is optimized, all that remains is to map it onto the target machine code. Hence, we get a sequence of relocatable machine code.
- Symbol table: a data structure in which all the identifiers' names are stored along with their types. It makes it easier for the compiler to search & retrieve an identifier's record.
- Error handler: this handles the errors that occur in each phase.
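To make the first phase and the symbol table more concrete, here is a minimal sketch of a lexer for a toy language. The token classes in `TOKEN_SPEC` and the function name `tokenize` are assumptions chosen for this example, not part of any real compiler; the output mimics the <token-name, attribute-value> form described above.

```python
import re

# Hypothetical token classes for a toy language. Order matters:
# the first alternative that matches wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))

def tokenize(source):
    """Return (tokens, symbol_table) for one line of toy source code."""
    tokens = []
    symbol_table = {}  # identifier name -> index of its entry
    for match in PATTERN.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace carries no meaning here
        if kind == "ERROR":
            # This is the error handler's job in the lexical phase.
            raise SyntaxError(f"unexpected character {text!r}")
        if kind == "ID":
            # Each identifier is stored once; the token refers to its entry.
            index = symbol_table.setdefault(text, len(symbol_table) + 1)
            tokens.append(("id", index))
        elif kind == "NUMBER":
            tokens.append(("number", int(text)))
        else:
            tokens.append((text, None))  # operators need no attribute
    return tokens, symbol_table

tokens, symbols = tokenize("position = initial + rate * 60")
print(tokens)
print(symbols)
```

The later phases would consume `tokens` and consult `symbols`; a real symbol table would also record each identifier's type and scope.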
Architectures of Compilers
- Single-pass: here the source code gets transformed directly into low-level language in one pass. E.g., Turbo Pascal.
- Two-pass / Multi-pass: here the conversion happens in more than one step:
  - Pass-1: known as the front-end; the analysis part; platform-independent.
  - Pass-2: known as the back-end; the synthesis part; platform-dependent.
| | Single-pass | Two-pass / multi-pass |
|---|---|---|
| Speed | fast | slow |
| Memory | more | less |
| Execution time | less | more |
| Portability | no | yes |
| Advantage | Disadvantage |
|---|---|
| Execution time is less | To change the program, you have to go back to the source code & recompile it |
Examples of compiled languages:
C, C++, C#, Scala, Java
Interpreter
Unlike the compiler, the interpreter reads the code one line at a time, translates it, and executes it immediately, rather than translating the whole program up front. This doesn't mean that interpreters are better or worse than compilers. Many language implementations and IDEs (integrated development environments) employ both compilation & interpretation to execute high-level languages.
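The read-a-line, run-a-line loop can be sketched as follows. The toy language here (assignments made of `+` terms, plus a `print` statement) and the function name `interpret` are invented for this illustration and do not correspond to any real interpreter.

```python
def interpret(program):
    """Execute a toy program one line at a time; return the printed values."""
    variables = {}
    output = []
    for line_number, line in enumerate(program.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("print "):
            name = line[len("print "):].strip()
            if name not in variables:
                # The error surfaces only when execution reaches this line.
                raise NameError(f"line {line_number}: {name!r} is not defined")
            output.append(variables[name])
        else:
            target, _, expression = line.partition("=")
            # Toy expressions: integers or variable names joined by '+'.
            total = 0
            for term in expression.split("+"):
                term = term.strip()
                total += int(term) if term.isdigit() else variables[term]
            variables[target.strip()] = total
    return output

program = """
x = 2 + 3
y = x + 10
print y
"""
print(interpret(program))  # [15]
```

Nothing is saved between runs: every time the program is executed, each line is read and translated again, which is why the interpreter must be present on the machine that runs the program.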
Examples of interpreters:
Python: PyPy, CPython, IronPython
| Advantages | Disadvantages |
|---|---|
| Easy to use; you can run & check each line of code | Does not save machine code, and the program can run only on computers that have a corresponding interpreter |
Common issues with compilers:
- Unable to locate the compiler: caused when the path to your compiler isn't set properly.
- Syntax errors: when the syntax isn't correct.
- Semantic errors: these concern the correctness of what the statements mean, e.g., int name = "John"; (wrong, since a string is assigned to an integer), or using a variable without declaring it. Depending on the language and compiler, these are reported as errors or only as warnings.
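As a rough illustration of how a compiler might catch such semantic errors with a symbol table, consider the sketch below. It assumes a hypothetical toy syntax of the form `<type> <name> = <value>` with only `int` and `string` types; the names `DECLARATION` and `check` are made up for this example.

```python
import re

# Toy declaration syntax (an assumption for this sketch): <type> <name> = <value>
DECLARATION = re.compile(r"^(int|string)\s+(\w+)\s*=\s*(.+)$")

def check(lines):
    """Return a list of semantic error messages for toy declarations."""
    symbol_table = {}  # identifier name -> declared type
    errors = []
    for number, line in enumerate(lines, start=1):
        match = DECLARATION.match(line.strip())
        if not match:
            # Malformed lines are syntax errors, which this sketch does not collect.
            raise SyntaxError(f"line {number}: unrecognised statement")
        declared_type, name, value = match.groups()
        value = value.strip()
        if value.isdigit():
            value_type = "int"
        elif len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value_type = "string"
        elif value in symbol_table:
            value_type = symbol_table[value]  # look up an earlier declaration
        else:
            errors.append(f"line {number}: '{value}' is not declared")
            continue
        if value_type != declared_type:
            errors.append(
                f"line {number}: cannot assign {value_type} to {declared_type} '{name}'"
            )
        symbol_table[name] = declared_type
    return errors

errors = check(['int age = 30', 'int name = "John"', 'int total = count'])
for message in errors:
    print(message)
```

Both problems are found without running the program, which is the main difference from the interpreter issues described next.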
Issues with Interpreters
- The main method is not defined: in Java, the interpreter (the java command) looks for a main method in the given class to start execution of the program; not defining it causes this error.
- Unable to locate class: this error generally occurs if you pass the class file name, “demo-class.class”, instead of the class name, “demo-class”.