Machine code as a programming language. Assembly language

An assembly language (often loosely called assembler) is a low-level programming language for a computer or other programmable device in which there is a very strong correspondence between the statements of the language and the machine code instructions of the architecture. Each assembly language is specific to a particular computer architecture. In contrast, most high-level programming languages are portable across architectures, but require interpretation or compilation.

Assembly language can also be called symbolic machine code. Machine code itself is the set of instructions executed directly by the computer's central processor, and each program the processor executes consists of a series of such instructions. Machine code is by definition the lowest level of programming detail visible to a programmer.

Usage

Many operations require one or more operands to form a complete instruction, and many assemblers accept expressions of numbers and named constants, as well as registers and labels, as operands. This frees the machine-language programmer from tedious, repetitive calculations. Depending on the architecture, these elements can also be combined for specific instructions or addressing modes using offsets or other data, as well as fixed addresses. Many assemblers offer additional mechanisms to facilitate program development, control the assembly process, and support debugging.
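As a rough illustration of the operand expressions mentioned above, the following Python sketch evaluates a constant expression against a table of named constants and labels at assembly time. The function name `eval_operand`, the symbol names, and the supported operator set are assumptions made for this example, not the behavior of any particular assembler.

```python
import ast
import operator

# Operators this sketch accepts inside an operand expression (an assumption).
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.LShift: operator.lshift,
    ast.BitOr: operator.or_,
    ast.BitAnd: operator.and_,
}

def eval_operand(expr, symbols):
    """Evaluate a constant operand expression at assembly time."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.Name):
            return symbols[node.id]          # label or named constant
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported operand expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)

# Hypothetical symbols: a buffer address and a record length.
symbols = {"BUFFER": 0x2000, "RECORD_SIZE": 16}
address = eval_operand("BUFFER + 3 * RECORD_SIZE", symbols)
```

Here the programmer writes the address of the fourth record symbolically, and the arithmetic is done once, before any code is generated.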

Historical perspective

The first assembly language was developed in 1947 by Kathleen Booth for the ARC2 at Birkbeck, University of London, following consultation with John von Neumann and Herman Goldstine at the Institute for Advanced Study. SOAP (Symbolic Optimal Assembly Program) was an assembly language for the IBM 650 computer, written by Stan Poley in 1955.

Historically, many programs were written entirely in assembly language. Operating systems were written exclusively in assembly language until the introduction of the Burroughs MCP (1961), which was written in Executive Systems Problem Oriented Language (ESPOL). Many commercial applications were written in assembly language as well, including a large amount of the IBM mainframe software written by large corporations. COBOL and FORTRAN eventually displaced much of this work, although many large organizations retained assembly-language application infrastructures well into the 1990s.

Most early microcomputers relied on hand-coded assembly language, including most operating systems and large applications. This was because these machines had severe resource constraints, imposed idiosyncratic memory and display architectures, and provided limited, buggy system services. Perhaps more important was the lack of first-class high-level language compilers suitable for microcomputer use, which left assembly language as the practical choice.

Application area

Assembly languages eliminate much of the error-prone, tedious, and time-consuming first-generation programming required on the earliest computers. They free programmers from the routine of remembering numeric codes and calculating addresses. In the early days, assemblers were widely used for all types of programming. By the late 1980s, however, their use had largely been supplanted by higher-level languages in the search for improved programmer productivity. Today, assembly language is still used for direct hardware manipulation, access to specialized processor instructions, or to address critical performance problems. Typical applications include device drivers, low-level embedded systems, and real-time systems.

Application examples

Typical examples of large assembly language programs are the IBM PC DOS operating systems, the Turbo Pascal compiler, and early applications such as the Lotus 1-2-3 spreadsheet program.

Assembly language was the primary development language for many popular home computers of the 1980s and 1990s (such as the MSX, Sinclair ZX Spectrum, Commodore 64, Commodore Amiga, and Atari ST). This was largely because the interpreted BASIC dialects on these systems offered low execution speed and only limited means of taking full advantage of the available hardware. Some systems even had an integrated development environment (IDE) with highly advanced debugging and macro facilities. Some compilers available for the Radio Shack TRS-80 and its successors could combine inline assembly source with high-level program statements. On compilation, a built-in assembler produced the inline machine code.

Machine code for dummies. Terminology

An assembler program creates object code by translating combinations of mnemonics and syntax for operations and addressing modes into their numerical equivalents. This representation typically includes an operation code (opcode), as well as other control bits and data. The assembler also evaluates constant expressions and resolves symbolic names for memory locations and other entities.

Assemblers can also perform some simple types of optimization, depending on the instruction set. One concrete example is the popular x86 assemblers from various vendors. Most of them can perform jump-instruction replacement (long jumps replaced by short or relative jumps) in any number of passes, on request. Others can even perform simple rearrangement or insertion of instructions, such as some assemblers for RISC architectures, which can help optimize instruction scheduling to make the best use of the CPU pipeline.

Like early programming languages such as Fortran, Algol, COBOL, and Lisp, assemblers have been available since the 1950s and the first generations of text-based computer interfaces. However, assemblers came first, since they are far simpler to write than compilers for high-level languages. This is because each mnemonic, along with the addressing modes and operands of an instruction, translates rather directly into the numeric representation of that particular instruction, without much context or analysis. There have also been several classes of translators and semi-automatic code generators with properties similar to both assembly and high-level languages, with Speedcode as perhaps one of the better-known examples.
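The jump replacement described above can be sketched as follows. This is a minimal Python illustration, assuming 32-bit x86 mode, where a short jump is the byte EBh plus an 8-bit displacement and a near jump is E9h plus a 32-bit displacement. The function name `encode_jmp` and the addresses are made up for the example.

```python
import struct

def encode_jmp(instr_addr, target_addr):
    """Encode an x86 JMP in 32-bit mode, preferring the short form."""
    # Displacements are measured from the end of the jump instruction.
    short_disp = target_addr - (instr_addr + 2)   # short form is 2 bytes long
    if -128 <= short_disp <= 127:
        return bytes([0xEB]) + struct.pack("<b", short_disp)
    near_disp = target_addr - (instr_addr + 5)    # near form is 5 bytes long
    return bytes([0xE9]) + struct.pack("<i", near_disp)

short = encode_jmp(0x1000, 0x1010)   # target is close: 2-byte encoding
near = encode_jmp(0x1000, 0x2000)    # target is far: 5-byte encoding
```

A real assembler has more to do than this sketch shows: shrinking one jump moves every later address, which can bring other targets into short range, so the choice may need to be repeated over several passes.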

Number of passes

There are two types of assemblers, distinguished by the number of passes through the source (the number of times the assembler reads the source) needed to produce the object file.

One-pass assemblers go through the source code once. Any symbol used before it is defined requires "errata" at the end of the object code, telling the linker or loader to go back and overwrite the placeholder left where the undefined symbol was used.
Multi-pass assemblers build a table of all symbols and their values in the first passes, and then use the table in later passes to generate code.

The original reason for using one-pass assemblers was speed of assembly: a second pass often required rewinding and rereading the program source on tape. Later computers with much larger memories (especially disk storage) had the space to perform all the necessary processing without such rereading. The advantage of a multi-pass assembler is that the absence of errata makes the linking process (or the program load, if the assembler directly produces executable code) faster.
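The multi-pass approach can be illustrated with a small Python sketch. It assumes a hypothetical toy machine in which every instruction is one opcode byte followed by one operand byte. The mnemonics, opcode values, and the `assemble` function are inventions for this example only.

```python
# Hypothetical toy machine: every instruction is an opcode byte plus an operand byte.
OPCODES = {"HALT": 0x00, "LOAD": 0x01, "ADD": 0x02, "JMP": 0x03}

def assemble(source):
    """Two-pass assembly: collect symbols first, then generate code."""
    lines = [ln.split(";")[0].strip() for ln in source.splitlines()]
    lines = [ln for ln in lines if ln]

    # Pass 1: record the address of every label.
    symbols, address = {}, 0
    for line in lines:
        if line.endswith(":"):
            symbols[line[:-1]] = address
        else:
            address += 2

    # Pass 2: emit code, resolving names through the symbol table.
    code = bytearray()
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, *operand = line.split()
        value = 0
        if operand:
            token = operand[0]
            value = symbols[token] if token in symbols else int(token, 0)
        code += bytes([OPCODES[mnemonic], value])
    return bytes(code), symbols

program = """
start:
    LOAD 5
    JMP end      ; forward reference: 'end' is defined later
    ADD 1
end:
    HALT
"""
machine_code, table = assemble(program)
```

Because the label `end` is already in the table when the second pass reaches `JMP end`, no placeholder or later patching is needed, which is the situation a one-pass assembler has to handle with errata.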

What is binary code?

A program written in assembly language consists of a series of mnemonic processor instructions and meta-statements (known variously as directives, pseudo-instructions, and pseudo-operations), comments, and data. An assembly language instruction usually consists of an opcode mnemonic followed by a list of data, arguments, or parameters. The assembler translates these into machine language instructions that can be loaded into memory and executed.

What, then, is binary code? It is a coding system that uses the binary digits “0” and “1” to represent a letter, number, or other symbol in a computer or other electronic device.

For example, the machine code below tells an x86 / IA-32 processor to move an immediate 8-bit value into a register. The binary code for this instruction is 10110, followed by a 3-bit identifier for the register to use. The identifier for the AL register is 000, so the following code loads the AL register with the data 01100001.

Example machine code: 10110000 01100001.
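The bit layout described above can be reproduced with a few lines of Python. This is only a sketch of this one instruction form. The function name `encode_mov_r8_imm8` is an assumption for the example, while the register codes are the standard 3-bit codes for the 8-bit x86 registers.

```python
# 3-bit codes for the 8-bit x86 registers.
REG8 = {"AL": 0b000, "CL": 0b001, "DL": 0b010, "BL": 0b011,
        "AH": 0b100, "CH": 0b101, "DH": 0b110, "BH": 0b111}

def encode_mov_r8_imm8(register, value):
    """Encode `MOV r8, imm8`: opcode bits 10110, register code, then the byte."""
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in 8 bits")
    opcode = (0b10110 << 3) | REG8[register]
    return bytes([opcode, value])

encoded = encode_mov_r8_imm8("AL", 0b01100001)
bits = " ".join(f"{byte:08b}" for byte in encoded)   # "10110000 01100001"
```

In assembly language the same instruction is usually written with a mnemonic and a hexadecimal operand, for example `MOV AL, 61h`, which is far easier to read than the raw bit pattern.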

Technical features

Converting assembly language into machine code is the job of an assembler. The reverse process is performed, at least in part, by a disassembler. Unlike high-level languages, there is a one-to-one correspondence between many simple assembly statements and machine language instructions. In some cases, however, an assembler may provide pseudo-instructions (essentially macros) that expand into several machine language instructions to provide commonly needed functionality. Most full-featured assemblers also provide a rich macro language, which vendors and programmers use to generate more complex code and data sequences.

Each computer architecture has its own machine language. Computers differ in the number and types of operations they support, in the sizes and number of their registers, and in the representation of data in storage. While most general-purpose computers can carry out essentially the same functionality, the ways in which they do so differ. The corresponding assembly languages reflect these differences.

Multiple sets of mnemonics or assembly-language syntax may exist for a single instruction set, typically implemented in different assembler programs. In these cases, the most popular one is usually the one supplied by the manufacturer and used in its documentation.
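The reverse direction can be sketched in the same spirit. The Python function below decodes only the two-byte x86 `MOV r8, imm8` form (opcode bits 10110 followed by a 3-bit register code). The function name and the output format are assumptions for this example, and a real disassembler covers the whole instruction set.

```python
# Names of the 8-bit x86 registers, indexed by their 3-bit code.
REG8_NAMES = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]

def disassemble_mov_r8_imm8(code):
    """Decode a two-byte `MOV r8, imm8` back into assembly text."""
    opcode, immediate = code[0], code[1]
    if opcode >> 3 != 0b10110:
        raise ValueError("not a MOV r8, imm8 instruction")
    register = REG8_NAMES[opcode & 0b111]   # low three bits select the register
    return f"MOV {register}, 0x{immediate:02X}"

text = disassemble_mov_r8_imm8(bytes([0b10110000, 0b01100001]))
```

The decoded text contains no labels or comments: the original symbolic names are not stored in the machine code, so a disassembler cannot recover them.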

Language design

There is a great deal of diversity in how the authors of assemblers categorize statements and in the nomenclature that they use. In particular, some describe anything other than a machine mnemonic or extended mnemonic as a pseudo-operation. A typical assembly language consists of three types of statements that are used to define program operations:

opcode mnemonics;
data definitions;
assembly directives.

Opcode mnemonics and extended mnemonics

Instructions written in assembly language are generally very simple, unlike those in higher-level languages. As a rule, a mnemonic is a symbolic name for a single executable machine language instruction. Each instruction usually consists of an opcode plus zero or more operands. Most instructions refer to one or two values.

Extended mnemonics are often used to support specialized uses of instructions, often for purposes not obvious from the instruction name. For example, many processors do not have an explicit NOP instruction, but do have instructions that can be used for this purpose.

Many assemblers support simple built-in macros that generate two or more machine instructions.
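A well-known case is the 8086, where the byte 90h that assemblers emit for NOP is the encoding of `XCHG AX, AX`. The Python sketch below treats NOP as an extended mnemonic that is rewritten into that base instruction before encoding. The table and function names are assumptions for the example, and only the one-byte `XCHG AX, r16` form is handled.

```python
# 3-bit codes for the 16-bit x86 registers.
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}

# Extended mnemonics: names the assembler rewrites into a base instruction.
EXTENDED = {"NOP": ("XCHG", "AX", "AX")}

def encode(mnemonic, *operands):
    """Encode `XCHG AX, r16` (opcode 90h + register code), expanding aliases."""
    if mnemonic in EXTENDED:
        mnemonic, *operands = EXTENDED[mnemonic]
    if mnemonic == "XCHG" and operands[0] == "AX":
        return bytes([0x90 + REG16[operands[1]]])
    raise ValueError(f"unsupported instruction: {mnemonic}")

nop = encode("NOP")
same = encode("XCHG", "AX", "AX")
```

Both calls produce the same single byte, so the extended mnemonic adds readability without adding a new machine instruction.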

Data directives

Data directives are statements used to define elements that hold data and variables. They specify the type, length, and alignment of the data. They can also specify whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined. Some assemblers classify them as pseudo-operations.

Assembly directives

Assembly directives, also called pseudo-opcodes or pseudo-operations, are commands given to an assembler that direct it to perform operations other than assembling instructions. Directives affect how the assembler operates and may affect the object code, the symbol table, the listing file, and the values of internal assembler parameters. Sometimes the term pseudo-opcode is reserved for directives that generate object code.

The names of pseudo-operations often begin with a period to distinguish them from machine instructions. Another common use of pseudo-operations is to reserve storage areas for run-time data and, optionally, to initialize their contents to known values.
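To make the effect of such directives concrete, the following Python sketch builds a data section from a list of directives. The directive names `.byte`, `.word`, `.align`, and `.space` are chosen for illustration in the period-prefixed style described above. Actual names and semantics vary between assemblers, and the little-endian byte order is an assumption.

```python
import struct

def emit_data(directives):
    """Build a data section from (directive, argument) pairs."""
    section = bytearray()
    for directive, argument in directives:
        if directive == ".byte":      # one 8-bit value
            section += struct.pack("<B", argument)
        elif directive == ".word":    # one 16-bit little-endian value
            section += struct.pack("<H", argument)
        elif directive == ".align":   # pad with zeros to a multiple of `argument`
            section += bytes(-len(section) % argument)
        elif directive == ".space":   # reserve `argument` zero-filled bytes
            section += bytes(argument)
        else:
            raise ValueError(f"unknown directive: {directive}")
    return bytes(section)

data = emit_data([(".byte", 0x41), (".align", 4), (".word", 0x1234), (".space", 2)])
```

None of these directives corresponds to a machine instruction. They only control what bytes the assembler places in the output and where.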

Self-documenting code

Symbolic assemblers allow programmers to associate arbitrary names (labels or symbols) with memory locations and various constants. Usually every constant and variable is given its own name, so instructions can refer to these locations by name, which promotes self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any call to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Many assemblers support local symbols, which are lexically distinct from normal symbols.

Assemblers such as NASM provide flexible symbol management, allowing programmers to manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or to the result of simple computations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses.

Assembly languages, like most other computer languages, allow comments to be added to the program source code that are ignored during assembly. Judicious commenting is essential in assembly language programs, since the meaning and purpose of a sequence of binary machine instructions can be difficult to determine. The "raw" (uncommented) assembly language generated by compilers or disassemblers is quite difficult to read when changes must be made.
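The automatic calculation of offsets within a data structure can be sketched as a running sum of field sizes. The Python function below is a generic illustration, not NASM's actual mechanism. The function name `layout`, the field names, and the extra `SIZE` entry are assumptions for the example, and no alignment padding is applied.

```python
def layout(fields):
    """Assign each (name, size) field the offset where it starts."""
    offsets, position = {}, 0
    for name, size in fields:
        offsets[name] = position
        position += size
    offsets["SIZE"] = position      # total size, handy as a named constant
    return offsets

# Hypothetical record: two 4-byte coordinates and a 1-byte tag.
point = layout([("x", 4), ("y", 4), ("tag", 1)])
```

With the offsets available as named constants, an instruction can refer to a field symbolically instead of hard-coding a number, and the numbers update automatically when a field is added or resized.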
