🙅🏾 🚛 🎚️ A translator is ... Types of translators. Convert and broadcast programs 🖊️ 📞 🧑🏻‍🤝‍🧑🏻

Programs, like people, require a translator or translator to translate from one language to another.

Basic concepts

The program is a linguistic representation of the calculations: i → P → P (i). The interpreter is a program, the input of which is program P and some input data x. It performs P on x: I (P, x) = P (x). The fact that there is a single translator capable of executing all possible programs (which can be represented in a formal system) is a very deep and significant discovery of Turing.

The processor is an interpreter of machine language programs. As a rule, interpreters for high-level languages are too expensive to write, so they are translated into a form that is easier to interpret.

Some types of translators have very strange names:

Assembler translates assembler programs into machine language.
The compiler translates from a high level language to a lower language.

A translator is a program that takes a program in some S language as input and outputs a program in T in such a way that both of them have the same semantics: P → X → Q. That is, ∀x. P (x) = Q (x).

If you translate the entire program into something interpreted, then this is called compilation before execution, or AOT compilation. AOT compilers can be used sequentially, the last of which is often assembler, for example:

Source code → Compiler (translator) → Assembler code → Assembler (translator) → Machine code → CPU (interpreter).

Operational or dynamic compilation occurs if part of the program is broadcast when other previously compiled parts are executed. JIT translators remember what they have already executed so as not to repeat the source code again and again. They are even capable of adaptively compiling and recompiling based on the behavior of the program runtime.

Many languages allow you to execute code during translation and compile new code during program execution.

Broadcast Stages

Broadcasting consists of the stages of analysis and synthesis:

Source code → Analyzer → Conceptual representation → Generator (synthesizer) → Target code.

This is due to such reasons:

Any other way is not suitable. Word by word translation just doesn't work.
A good engineering solution: if you need to write translators for M source languages and N target, you only need to write M + N simple programs (semi-compilers), and not M × N complex (full translators).

However, in practice, conceptual representation is very rarely expressive and powerful enough to encompass all conceivable source and target languages. Although some were able to get close to this.

Real compilers go through many stages. When creating your own compiler, you don’t need to repeat all the hard work that people have already done when creating views and generators. You can translate your language directly into JavaScript or C and use the existing JavaScript engines and C compilers to do the rest. You can also use existing intermediate views and virtual machines.

Translator Record

A translator is a program or technical tool in which three languages are involved: source, target and basic. They can be written in a T-form by placing the source on the left, the target on the right and the base below.

There are three types of compilers:

A translator is a self-compiler if its source language matches the base one.
A compiler whose target language is equal to the base language is called self-resident.
A translator is a cross-compiler if its target and base languages are different.

Why is it important?

Even if you never make a real compiler, it's good to know about the technology of its creation, because the concepts used for this are used everywhere, for example in:

text formatting;
database query languages ;
advanced computer architectures;
generalized optimization problems ;
graphical interfaces;
scripting languages;
controllers
virtual machines;
machine translations.

In addition, if you need to write preprocessors, assemblers, loaders, debuggers, or profilers, you must go through the same steps as when writing the compiler.

You can also learn how to write programs better, since creating a translator for a language means a better understanding of its subtleties and ambiguities. Learning the general principles of translation also allows you to become a good language designer. Is it so important how cool the language is if it cannot be implemented effectively?

Comprehensive technology

Compiler technology covers many different areas of computer science:

formal language theory: grammar, parsing, computability;
computer architecture: instruction sets, RISC or CISC, pipelining, cores, clock cycles, etc .;
programming language concepts: for example, sequence control, conditional execution, iteration, recursion, functional decomposition, modularity, synchronization, metaprogramming, scope, constants, subtypes, templates, output type, prototypes, annotations, flows, monads, mailboxes, continuations , wildcards, regular expressions, transactional memory, inheritance, polymorphism, parameter modes, etc .;
abstract languages and virtual machines;
algorithms and data structures: regular expressions, parsing algorithms, graphical algorithms, dynamic programming, training;
programming languages: syntax, semantics (static and dynamic), support for paradigms (structural, OOP, functional, logical, stack, parallelism, metaprogramming);
software creation (compilers, as a rule, large and complex): localization, caching, componentization, APIs, reuse, synchronization.

Compiler design

Some problems that arise when developing a real translator:

Problems with the source language. Is it easy to compile? Is there a preprocessor? How are types handled? Are there libraries?
Compiler pass grouping: single or multi-pass?
The degree of optimization desired. Fast and unclean program translation with almost no optimization can be normal. Excessive optimization will slow down the compiler, but better code at runtime might be worth it.
The required degree of error detection. Can the translator just stop at the first error? When should he stop? Should I entrust the compiler with bug fixes?
Availability of tools. If the source language is not very small, a scanner and analyzer generator are required. There are also code generator generators, but they are not so common.
The kind of target code to generate. Choose from clean, augmented, or virtual machine code. Or simply write an input that creates popular intermediate views such as LLVM, RTL, or JVM. Or do a translation from source to source code in C or JavaScript.
Target code format. You can choose assembly language, portable machine code, machine code for a memory image.
Retargeting. With many generators, it’s good to have a common input. For the same reason, it is better to have one generator for many input parts.

Compiler Architecture: Components

These are the main functional components of the translator generating machine code (if the output program is a C program or a virtual machine, then not many steps will be required):

The input program (stream of characters) enters the scanner (lexical analyzer), which converts it into a stream of tokens.
The parser (parser) builds an abstract syntax tree from them.
The semantic analyzer decomposes the semantic information and checks the tree nodes for errors. As a result, a semantic graph is constructed - an abstract syntax tree with additional properties and established links.
The intermediate code generator builds a flow graph (tuples are grouped into main blocks).
The machine-independent code optimizer performs both local (within the base unit) and global (for all blocks) optimization, mainly remaining within the framework of subprograms. Reduces redundant code and simplifies calculations. The result is a modified flow graph.
The generator of the target code connects the base blocks into straightforward code with the transfer of control, creating an object file in assembler with virtual registers (possibly inefficient).
The machine-dependent optimizer-layout distributes memory between the registers and scheduling commands. Converts an assembler program into a real assembler with good use of pipelining.

In addition, error detection subsystems and a character table manager are used.

Lexical analysis (scan)

The scanner converts the stream of characters of the source code into the stream of tokens, removing spaces, comments and expanding macros.

Scanners are often confronted with issues such as accepting or disregarding case, indentation, line feeds, and nested comments.

Errors that may occur during scanning are called lexical and include:

characters that are not in the alphabet;
excess of the number of characters in a word or line;
not a closed character or string literal;
end of file in comment.

Parsing

The parser converts the sequence of tokens into an abstract syntax tree. Each tree node is saved as an object with named fields, many of which are tree nodes themselves. There are no cycles at this stage. When creating a parser, you need to pay attention to the level of grammar complexity (LL or LR) and find out if there are any rules for disambiguation. Some languages do require semantic analysis.

Errors encountered at this stage are called syntactic. For instance:

k = 5 * (7 - y;
j = / 5;
56 = x * 4.

Semantic analysis

During the semantic analysis, it is necessary to check the rules of admissibility and connect parts of the syntax tree (resolving name references, inserting operations for implicit type conversion, etc.) to form a semantic graph.

Obviously, the set of admissibility rules for different languages is different. If Java-like languages are compiled, translators may find:

multiple declarations of a variable within its scope;
references to a variable before its declaration;
links to an undeclared name;
violation of accessibility rules;
too many or insufficient arguments when calling the method;
type mismatch.

Generation

Intermediate code generation produces a flow graph composed of tuples grouped into base blocks.

Code generation produces real machine code. In traditional compilers for RISC machines, the first step is to create an assembler with an infinite number of virtual registers. For CISC machines, this probably won't happen.

A translator is ... Types of translators. Convert and broadcast programs