Distributed databases

A distributed database is a database that spans several computers connected by a network, each of which runs its own local database. Together, this software and hardware form a single shared database. From the outside, a distributed database looks like an ordinary local database; the diversity of its hardware is hidden from users. A distributed control system monitors all database nodes and keeps the data coherent across them.

Christopher Date, a world-renowned database specialist, identified twelve main properties that every distributed database should have: 1) local autonomy, 2) continuous operation, 3) node independence, 4) transparent fragmentation, 5) location transparency, 6) distributed query processing, 7) transparent replication, 8) hardware independence, 9) distributed transaction processing, 10) network transparency, 11) operating system independence, 12) DBMS independence.

Let us consider in more detail the main qualities that, according to Date, every distributed database should have.

Local autonomy means that each node independently manages the data of its database.

Continuous operation. Here C. Date says that access to data must be available continuously, regardless of which node holds the data and of what operations the local database is performing at the moment.

Node independence. In an ideal system, all nodes are equal and do not depend on one another. Each node's database contributes data to the shared space with equal rights. Every database that makes up the distributed database is self-sufficient and protected from unauthorized access.

Transparent fragmentation. This property requires the local databases to support the distributed placement of data that logically forms a single whole, for example one table whose rows are stored on different nodes.

Location transparency. A user of a distributed database does not need to know on which nodes the required information is physically stored.
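Fragmentation and location transparency can be illustrated together with a minimal sketch. It assumes that in-memory SQLite databases stand in for the nodes and that rows are placed by id modulo the node count; the class FragmentedCustomers, the customers table, and the placement rule are all hypothetical and are not taken from any particular DBMS.

```python
import sqlite3


class FragmentedCustomers:
    """Hypothetical facade: one logical 'customers' table split across nodes."""

    def __init__(self, node_count):
        # Each in-memory SQLite database stands in for a node's local database.
        self.nodes = [sqlite3.connect(":memory:") for _ in range(node_count)]
        for node in self.nodes:
            node.execute(
                "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"
            )

    def _node_for(self, customer_id):
        # Assumed fragmentation rule: place each row by id modulo node count.
        return self.nodes[customer_id % len(self.nodes)]

    def insert(self, customer_id, name):
        node = self._node_for(customer_id)
        with node:  # commits on success, rolls back on error
            node.execute(
                "INSERT INTO customers VALUES (?, ?)", (customer_id, name)
            )

    def get_name(self, customer_id):
        # The caller never says which node to ask: location is hidden.
        row = self._node_for(customer_id).execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        return row[0] if row else None


db = FragmentedCustomers(node_count=3)
db.insert(1, "Ada")
db.insert(5, "Grace")
print(db.get_name(5))  # prints "Grace"
```

The caller works only with insert and get_name; which node holds a given row is decided inside the facade, which is the point of both properties.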

Distributed query processing. The database must be able to execute distributed SQL queries, that is, queries whose data resides on more than one node.
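One common way to picture a distributed query is scatter-gather: every node computes a partial result locally and a coordinator combines the partials. The sketch below assumes in-memory SQLite databases as nodes; the orders table and the functions make_node and distributed_total are hypothetical names used only for illustration.

```python
import sqlite3


def make_node(rows):
    """Create one stand-in node holding a fragment of the 'orders' table."""
    node = sqlite3.connect(":memory:")
    node.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    node.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return node


def distributed_total(nodes):
    """Scatter the aggregate to every node, then gather and combine."""
    # Each node sums its own fragment; only the partial results travel.
    partials = [
        node.execute(
            "SELECT COALESCE(SUM(amount), 0), COUNT(*) FROM orders"
        ).fetchone()
        for node in nodes
    ]
    total = sum(partial[0] for partial in partials)
    count = sum(partial[1] for partial in partials)
    return total, count


nodes = [
    make_node([(1, 10.0), (2, 5.0)]),
    make_node([(3, 20.0)]),
    make_node([]),  # a node whose fragment is empty
]
total, count = distributed_total(nodes)
print(total, count)
```

Pushing the aggregate down to the nodes keeps the raw rows where they are stored; a real distributed optimizer makes the same kind of decision for joins and filters.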

Transparent replication. In general, replication is the transfer of changed objects from one database to another. Here it means transferring data between nodes in a way that is invisible to the user.
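A minimal sketch of what "invisible to the user" can mean is given below. It assumes synchronous replication, where a write is applied to every copy before it returns; real systems often replicate asynchronously instead. The class ReplicatedStore and its methods are hypothetical, and plain dictionaries stand in for the nodes.

```python
class ReplicatedStore:
    """Hypothetical key-value store that keeps a full copy on every node."""

    def __init__(self, replica_count):
        # One dictionary per node stands in for that node's local database.
        self.replicas = [{} for _ in range(replica_count)]
        self._reads = 0

    def write(self, key, value):
        # Assumption: synchronous replication, every copy is updated
        # before the write returns to the caller.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads rotate over the copies; the caller cannot tell which one
        # answered, because all copies hold the same data.
        replica = self.replicas[self._reads % len(self.replicas)]
        self._reads += 1
        return replica.get(key)


store = ReplicatedStore(replica_count=3)
store.write("price", 42)
print([store.read("price") for _ in range(3)])  # prints [42, 42, 42]
```

The user calls write once and read as usual; the copying between nodes happens entirely inside the store.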

Hardware independence means that computers of any model can serve as nodes of the distributed database network.

Distributed transaction processing means updating the distributed database with the UPDATE, DELETE, and INSERT commands in such a way that the integrity and consistency of the stored information are preserved.
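The idea of an update that either takes effect on all nodes or on none can be sketched as follows. This is a simplified, two-phase-style commit under stated assumptions, not a full two-phase commit protocol: the votes are not made durable and coordinator failures are ignored. In-memory SQLite databases stand in for two distinct nodes, and the accounts table and the functions make_account_node, balance, and transfer are hypothetical.

```python
import sqlite3


def make_account_node(rows):
    """Create a stand-in node; isolation_level=None gives manual BEGIN/COMMIT."""
    node = sqlite3.connect(":memory:", isolation_level=None)
    node.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    node.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    return node


def balance(node, account_id):
    return node.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()[0]


def transfer(src_node, src_id, dst_node, dst_id, amount):
    """Move money between accounts on two different nodes, all or nothing."""
    participants = [src_node, dst_node]
    for node in participants:
        node.execute("BEGIN")
    try:
        # Phase 1: every node applies its change tentatively. A violated
        # CHECK constraint plays the role of a "no" vote.
        dst_node.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, dst_id),
        )
        src_node.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, src_id),
        )
    except sqlite3.Error:
        # Any "no" vote: undo the tentative changes on every node.
        for node in participants:
            node.execute("ROLLBACK")
        return False
    # Phase 2: all nodes voted "yes", so make the changes permanent.
    for node in participants:
        node.execute("COMMIT")
    return True


node_a = make_account_node([(1, 100)])
node_b = make_account_node([(2, 50)])
print(transfer(node_a, 1, node_b, 2, 30))   # succeeds
print(transfer(node_a, 1, node_b, 2, 500))  # fails, nothing changes
print(balance(node_a, 1), balance(node_b, 2))
```

The destination is credited before the source is debited on purpose: when the debit is rejected, the already applied credit on the other node must be rolled back as well, which is exactly the consistency guarantee described above.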

Operating system independence implies that the nodes of the system can run under any operating system.

Network transparency means that access to all elements of a distributed database requires only a network connection.

DBMS independence. This important property requires the system to work with DBMSs from different vendors, including the ability to search and update data in them.

As we can see, C. Date's definition describes a distributed database as a loosely coupled structure made up of independent nodes, each of which is a local database. These local databases are autonomous, and distributed DBMSs from different vendors provide access to them. The nodes are linked to one another by replicated data. The topology of the distributed database is determined by the geography of the information system and by its data replication flows.

Source: https://habr.com/ru/post/A10682/

