According to the International Telecommunication Union, about three and a half billion people were using the Internet more or less regularly in 2016. Most of them never stop to think that every message they send from a PC or mobile device, and every text displayed on a screen, is actually a combination of 0s and 1s. This representation of information is called encoding. It makes storing, processing and transmitting information possible and much simpler. In 1963 the American ASCII encoding was developed, and it is the subject of this article.
Presentation of information on a computer
From the point of view of a computer, text is a sequence of individual characters. These include not only letters (uppercase and lowercase) but also punctuation marks and digits, as well as special characters such as "=", "&", "(" and the space.
The set of characters from which text is composed is called the alphabet, and the number of characters in it is called its size (denoted by N). It is determined by the expression N = 2 ^ b, where b is the number of bits, or information weight, of a single character.
An alphabet of 256 characters has traditionally been considered enough to represent all the necessary characters.
Since 256 represents the 8th power of two, the weight of each character is 8 bits.
A unit of 8 bits is called a byte, so it is customary to say that the binary code of each character of text stored on a computer takes up one byte of memory.
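The relation N = 2 ^ b can be checked with a few lines of Python. This is only a minimal sketch; the function name `bits_per_symbol` is an illustrative choice and does not come from any standard.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Return the smallest b such that 2 ** b >= alphabet_size."""
    if alphabet_size < 1:
        raise ValueError("the alphabet must contain at least one symbol")
    # (n - 1).bit_length() is the number of bits needed to number n symbols from 0
    return (alphabet_size - 1).bit_length()


print(bits_per_symbol(256))  # 8 bits, i.e. one byte
print(bits_per_symbol(128))  # 7 bits
```

For alphabet sizes that are not exact powers of two, the function rounds up, since a character cannot occupy a fraction of a bit.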
How encoding is done
Text is entered into a personal computer's memory from the keyboard, whose keys are labeled with digits, letters, punctuation marks and other characters. The characters reach RAM in binary form: each character is assigned a decimal code familiar to people, from 0 to 255, which corresponds to a binary code from 00000000 to 11111111.
Encoding each character in one byte lets the processor that handles the text address every character separately. At the same time, 256 characters are enough for most texts written in a single alphabetic script.
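The correspondence between a character, its decimal code and its binary code can be illustrated as follows. This is a sketch that assumes single-byte characters (codes 0 to 255); the helper name `to_codes` is hypothetical.

```python
def to_codes(text: str) -> list[tuple[str, int, str]]:
    """Return (character, decimal code, 8-bit binary string) for each character.

    Assumes every character has a code in the range 0..255.
    """
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in text]


for ch, dec, binary in to_codes("Hi!"):
    print(repr(ch), dec, binary)
# 'H' 72 01001000
# 'i' 105 01101001
# '!' 33 00100001
```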
ASCII character encoding
The acronym stands for American Standard Code for Information Interchange.
Even at the dawn of computing it was obvious that information could be encoded in many different ways. To transfer information from one computer to another, however, a single standard was needed. So in 1963 the ASCII code table appeared in the USA. In it, each symbol of the computer alphabet is assigned a serial number in binary representation. At first ASCII was used only in the United States; later it became the international standard for PCs.
Contents of the table
The ASCII table is divided into two parts. Only the first half is considered an international standard. It includes the characters with serial numbers from 0 (encoded as 00000000) to 127 (code 01111111).
Serial number N | Binary code | Characters |
0 - 31 | 0000 0000 - 0001 1111 | Characters with N from 0 to 31 are called control characters. Their function is to “guide” the output of text to a monitor or printer, sound an audible signal, etc. |
32 - 127 | 0010 0000 - 0111 1111 | Characters with N from 32 to 127 (the standard part of the table) are the uppercase and lowercase letters of the Latin alphabet, the ten decimal digits, punctuation marks, various brackets, commercial and other symbols. Code 32 is the space; code 127 (DEL) is one more control character. |
128 - 255 | 1000 0000 - 1111 1111 | Characters with N from 128 to 255 (the alternative part of the table, or code page) come in different variants, each of which has its own number. The code page is used to represent national alphabets other than Latin. In particular, this is how Russian characters are encoded. |
In the encoding table, uppercase and lowercase letters follow each other in alphabetical order, and digits in ascending order of value. The same principle largely holds for the Russian alphabet in code pages such as CP866 and Windows-1251.
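The ranges of the table and the ordering rule can be checked programmatically. The sketch below uses a hypothetical helper, `classify`; note that it treats code 127 (DEL) as a control character, which is how the ASCII standard defines it.

```python
def classify(code: int) -> str:
    """Name the part of the 8-bit table that a code belongs to."""
    if not 0 <= code <= 255:
        raise ValueError("code must be in the range 0..255")
    if code < 32 or code == 127:
        return "control"
    if code < 127:
        return "printable"
    return "code page"  # 128..255, depends on the national code page


# Letters and digits occupy consecutive codes in their natural order.
uppercase = "".join(chr(c) for c in range(ord("A"), ord("Z") + 1))
digits = "".join(chr(c) for c in range(ord("0"), ord("9") + 1))

print(classify(10), classify(65), classify(200))
print(uppercase)  # ABCDEFGHIJKLMNOPQRSTUVWXYZ
print(digits)     # 0123456789
```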
Control characters
The ASCII table was originally created for receiving and transmitting information on the teletype, a device that has long been out of use. For this reason the character set includes non-printable characters that served as commands for controlling the device. Similar commands were used in pre-computer messaging methods such as Morse code.
The best-known “teletype” character is NUL (code 00, “null”). It is still used in C and many languages derived from it to mark the end of a string.
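In C-style strings the NUL byte marks where the string ends, so anything stored after it is ignored by string functions. The effect can be observed from Python through the standard `ctypes` module; this is only an illustration, and the buffer contents are made up.

```python
import ctypes

# A C character buffer holding two pieces of text separated by a NUL byte.
buf = ctypes.create_string_buffer(b"hi\x00there")

print(buf.raw)    # every byte in the buffer, including the NUL bytes
print(buf.value)  # only the bytes before the first NUL, as C code would see them
```

Here `buf.value` is `b"hi"`, while `buf.raw` still holds the whole buffer plus the terminating NUL that `create_string_buffer` appends.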
Where to use ASCII encoding
The American standard code is needed for more than entering text from the keyboard. It is also used in graphics. In particular, programs such as ASCII Art Maker render images of various formats as a set of ASCII characters.
There are two types of such programs: graphic editors that convert an image into text, and tools that convert “drawings” into ASCII graphics. The well-known emoticon is a simple example of a picture made of encoded characters.
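The basic idea of converting an image into text can be sketched as follows. This is not how any particular program works; it assumes the image is already available as rows of grayscale values from 0 to 255, and the character ramp `RAMP` is an arbitrary choice.

```python
RAMP = " .:-=+*#%@"  # from "empty" to "dense"; an arbitrary illustrative ramp


def to_ascii_art(pixels: list[list[int]]) -> str:
    """Map each grayscale value (0..255) to a character, one text line per row."""
    lines = []
    for row in pixels:
        # Scale 0..255 to an index 0..len(RAMP) - 1.
        lines.append("".join(RAMP[value * (len(RAMP) - 1) // 255] for value in row))
    return "\n".join(lines)


gradient = [[0, 64, 128, 192, 255], [255, 192, 128, 64, 0]]
art = to_ascii_art(gradient)
print(art)
```

Whether high values should map to dense or sparse characters depends on whether the result is shown as dark text on a light background or the reverse.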
ASCII codes can also be used when writing an HTML document. You can enter a special sequence of characters containing a character's code, and when the page is viewed, the character corresponding to that code appears on the screen.
ASCII is also useful for creating multilingual sites, since characters missing from a particular national table can be replaced with such codes written in ASCII.
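In HTML such sequences are called numeric character references and have the form `&#N;` (decimal) or `&#xN;` (hexadecimal). What a browser does with them can be imitated with Python's standard `html` module; the sample string is made up for illustration.

```python
import html

# Decimal references for A, S, C, I, I and a hexadecimal one for a smiley face.
source = "&#65;&#83;&#67;&#73;&#73; &#x263A;"
decoded = html.unescape(source)
print(decoded)
```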
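The replacement of non-ASCII characters can be sketched with Python's built-in `xmlcharrefreplace` error handler, which writes every character that does not fit into ASCII as a numeric reference. The sample text is an assumption chosen for illustration.

```python
import html

text = "\u041f\u0440\u0438\u0432\u0435\u0442, ASCII"  # the Russian word for "hello", then ", ASCII"

# Characters outside ASCII become &#N; references; ASCII characters stay as they are.
ascii_only = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_only)  # &#1055;&#1088;&#1080;&#1074;&#1077;&#1090;, ASCII

# A browser (imitated here by html.unescape) restores the original text.
restored = html.unescape(ascii_only)
```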
Some features
ASCII originally used 7 bits per character (the eighth bit remained unused), but today it is normally stored as 8 bits.
An uppercase letter and its lowercase counterpart, which occupy matching positions in the table, differ by a single bit. This greatly simplifies case conversion and checking.
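Whether a sequence of bytes fits into the original 7 bits can be tested by looking at the most significant bit of each byte. A minimal sketch, with the hypothetical helper name `is_seven_bit`:

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte has its most significant bit clear (value 0..127)."""
    return all((byte & 0x80) == 0 for byte in data)


print(is_seven_bit("Hello".encode("ascii")))     # True
print(is_seven_bit("caf\u00e9".encode("latin-1")))  # False: the last byte is 0xE9
```

For text strings, Python 3.7 and later offer the built-in `str.isascii()` method, which performs an equivalent check on code points.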
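For Latin letters the single differing bit is the one with value 32 (0x20): "A" is 0100 0001 and "a" is 0110 0001. The sketch below flips that bit to switch case; the names `CASE_BIT` and `toggle_case` are illustrative.

```python
CASE_BIT = 0x20  # the only bit that differs between "A" and "a"


def toggle_case(letter: str) -> str:
    """Switch an ASCII letter between uppercase and lowercase by flipping one bit."""
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError("expected a single ASCII letter")
    return chr(ord(letter) ^ CASE_BIT)


print(format(ord("A"), "08b"), format(ord("a"), "08b"))  # 01000001 01100001
print(toggle_case("A"), toggle_case("z"))                # a Z
```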
Using ASCII in Microsoft Office
If necessary, text can be saved in this encoding from Microsoft editors such as Notepad and Office Word. Some features become unavailable in this case, however. For example, you cannot make text bold, because ASCII stores only the content of the text and ignores its appearance and formatting.
Standardization
ISO has adopted the ISO 8859 family of standards. It defines eight-bit encodings for different groups of languages. In particular, ISO 8859-1 is an extended ASCII table for the United States and Western European countries, and ISO 8859-5 is a table for the Cyrillic alphabet, including Russian.
For a number of historical reasons, the ISO 8859-5 standard was not in use for long.
The encodings actually used for the Russian language are:
- CP866 (Code Page 866), or the DOS encoding, often called the alternative GOST encoding. It was actively used until the mid-1990s. Today it is rarely used.
- KOI-8. This encoding was developed in the 1970s and 80s and is a generally accepted standard for mail messages in Runet. It is widely used in Unix-family operating systems, including Linux. The “Russian” version of KOI-8 is called KOI8-R. There are also versions for other Cyrillic languages, such as Ukrainian.
- Code Page 1251 (CP1251, Windows-1251). Developed by Microsoft to support the Russian language in the Windows environment.
The main advantage of CP866, the first of these standards, was that it kept the pseudographic characters at the same positions as in Extended ASCII. This made it possible to run foreign-made text-mode programs, such as the famous Norton Commander, without modification. Today CP866 is used by Windows programs that run in full-screen text mode or in text windows, including FAR Manager.
Texts in CP866 have become quite rare, but Windows still uses it as the OEM code page for Russian, for example in the console and for legacy file names.
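The difference between these tables is easy to see by encoding the same letter with each of them. The sketch below uses the codec names from Python's standard library (`cp866`, `koi8_r`, `cp1251`, `iso8859_5`); the choice of the letter is arbitrary.

```python
letter = "\u0410"  # Cyrillic capital letter A

codecs_to_compare = ("cp866", "koi8_r", "cp1251", "iso8859_5")

# The same letter gets a different one-byte code in every code page.
codes = {name: letter.encode(name)[0] for name in codecs_to_compare}
for name, code in codes.items():
    print(f"{name:10} 0x{code:02X}")

# The lower (ASCII) half is identical in all of them.
latin_same = all("A".encode(name) == b"A" for name in codecs_to_compare)
print(latin_same)  # True
```

This is why a text saved in one of these encodings and opened in another shows the Latin letters and digits correctly but turns the Russian letters into unreadable characters.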
Unicode
Today this is the most widely used encoding. Unicode codes are divided into ranges. The first (from U+0000 to U+007F) contains the ASCII characters with the same codes. It is followed by ranges for the characters of various national scripts, as well as punctuation marks and technical symbols. In addition, some Unicode codes are reserved in case new characters need to be added in the future.
Now you know that in ASCII each character is represented as a combination of 8 zeros and ones. To non-specialists this may seem unnecessary and uninteresting, but don’t you want to know what is happening “in the brains” of your PC?!
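That the first Unicode range repeats ASCII can be verified directly: decoding each of the 128 ASCII bytes yields a character whose Unicode code point equals the byte value. A minimal check, with illustrative variable names:

```python
ascii_bytes = bytes(range(128))          # every ASCII code from 0 to 127
as_text = ascii_bytes.decode("ascii")    # the corresponding Unicode characters

# Compare each character's code point with the original byte value.
same_codes = all(ord(ch) == byte for ch, byte in zip(as_text, ascii_bytes))
print(same_codes)            # True

code_point = f"U+{ord('A'):04X}"
print(code_point)            # U+0041
```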