NoSQL Databases: Overview, Examples, and Applications

NoSQL databases are data stores that do not follow the relational model or its characteristics: they typically have no fixed schema, do not support joins, and do not guarantee ACID properties. NoSQL systems scale horizontally and make extensive use of main memory, which helps them handle large amounts of data.

The first NoSQL databases were proprietary systems developed by large companies to meet their own needs. Examples include Google's BigTable, which is considered the first NoSQL system, and Amazon DynamoDB. The success of these systems started the development of a number of similar open-source and proprietary databases. The most popular of these include Hypertable, Cassandra, MongoDB, and DynamoDB.

NoSQL Evolution

The scalability problems of SQL databases were recognized by Web 2.0 companies with huge, growing data and infrastructure needs, such as Google, Amazon, and Facebook. They built their own solutions: BigTable, DynamoDB, and Cassandra. Growing interest then produced a number of NoSQL database management systems (DBMSs) focused on performance, reliability, and consistency. Several existing indexing structures were reused and improved to speed up search and reads.

The term was coined by Carlo Strozzi in 1998. It was revived in 2009 by Rackspace employee Eric Evans to describe systems that solve the problems of web companies with large volumes of operations and data.


One key difference between NoSQL databases and traditional relational databases is that the former is a form of unstructured storage.


As a result, NoSQL databases do not have the fixed table structure that relational systems have. The table below gives a brief comparison of NoSQL and SQL features.

Brief feature comparison

Note that the table compares the two models at the database level. It does not compare specific DBMSs that implement them. Individual systems provide their own proprietary methods to overcome some of the problems and shortcomings of both models, and these methods can greatly improve performance and reliability.

Types of Information Storage

Key-value stores use a hash table

A NoSQL key-value database uses a hash table in which a unique key points to an item. Keys can be organized into logical groups and only need to be unique within their own group, so the same key can be used in different groups. Some implementations provide caching mechanisms that significantly improve performance.

To work with an object stored in the database, all you need is its key. Values are stored as JSON strings or as BLOBs (binary large objects). One of the biggest drawbacks of this model is that the database itself does not enforce consistency. Developers can add consistency checks in their own application code, but this takes extra effort and time because it is complex to implement. The best-known NoSQL database built on a key-value store is Amazon DynamoDB.
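The following is a minimal in-memory sketch of this model. It uses a hypothetical `KeyValueStore` class that is not the API of DynamoDB or any other product. Each logical group is its own hash table (a Python `dict`), so the same key can appear in different groups. Values are stored as JSON strings.

```python
import json
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key-value store with logical groups."""

    def __init__(self) -> None:
        # group name -> hash table mapping key -> JSON-encoded value
        self._groups: Dict[str, Dict[str, str]] = {}

    def put(self, group: str, key: str, value: Any) -> None:
        # Keys only need to be unique inside their own group.
        self._groups.setdefault(group, {})[key] = json.dumps(value)

    def get(self, group: str, key: str) -> Optional[Any]:
        raw = self._groups.get(group, {}).get(key)
        return None if raw is None else json.loads(raw)

    def delete(self, group: str, key: str) -> bool:
        # True if the key existed in the group and was removed.
        return self._groups.get(group, {}).pop(key, None) is not None


store = KeyValueStore()
store.put("users", "42", {"name": "Alice"})
store.put("orders", "42", {"total": 99.5})  # same key, different group
print(store.get("users", "42"))   # {'name': 'Alice'}
print(store.get("orders", "42"))  # {'total': 99.5}
```

Nothing in `put` stops a caller from storing a malformed order such as `{"total": "oops"}`. Checking values is left entirely to application code, which is the consistency drawback described above.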

Document stores are similar to key-value stores: they have no schema and are built on a key-value model. As a result, both types share the same advantages and disadvantages. Neither enforces consistency at the database level, which prevents applications from offering more reliable features. There are, however, some key differences between them. In document stores, the values (documents) carry a structured encoding of the stored data, such as XML, JSON, or BSON (binary JSON). The most popular document-store database is MongoDB.
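In a document store, the database can see inside each value and query it by field. In a plain key-value store, the value is opaque to the database. The sketch below illustrates this difference using a hypothetical `DocumentCollection` class. It is not MongoDB's API. Note that documents in the same collection can have different fields.

```python
import json
from typing import Any, Dict, List


class DocumentCollection:
    """Hypothetical schema-less collection of JSON documents."""

    def __init__(self) -> None:
        self._docs: Dict[str, str] = {}  # document id -> JSON text

    def insert(self, doc_id: str, document: Dict[str, Any]) -> None:
        # The whole document is encoded as JSON; its fields stay queryable.
        self._docs[doc_id] = json.dumps(document)

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        """Return documents whose top-level fields equal all given criteria."""
        results = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in criteria.items()):
                results.append(doc)
        return results


books = DocumentCollection()
books.insert("b1", {"title": "Dune", "year": 1965})
# No schema: this document has an extra "tags" field.
books.insert("b2", {"title": "Neuromancer", "year": 1984, "tags": ["cyberpunk"]})
print(books.find(year=1984))  # [{'title': 'Neuromancer', 'year': 1984, 'tags': ['cyberpunk']}]
```

This sketch uses a linear scan. A real document database would normally use an index on the queried field instead.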

In a column-family database, data is stored by column rather than by row, as most relational database management systems store it. A column store consists of one or more column families, each of which logically groups related columns. A key identifies a set of columns, and a keyspace defines the scope in which those keys apply. Each column is a name-value tuple, and columns are kept in sorted order.

Column stores provide fast read and write access to stored data. The columns of a row that belong to the same column family are stored together as a single record on disk, which speeds up reads and writes. The best-known NoSQL column-store databases are Google BigTable, HBase, and Cassandra.
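The conceptual sketch below models the keyspace, column family, row key, and sorted columns described above. The `ColumnFamily` class and its `get_slice` method are hypothetical. Real systems such as Cassandra and HBase also store timestamps and persist data to disk, and this sketch does neither. Keeping column names sorted lets the store return a contiguous range of columns with two binary searches.

```python
from bisect import bisect_left, bisect_right, insort
from typing import Dict, List, Tuple


class ColumnFamily:
    """Hypothetical column family: row key -> columns kept sorted by name."""

    def __init__(self, name: str) -> None:
        self.name = name
        # row key -> (sorted column names, column name -> value)
        self._rows: Dict[str, Tuple[List[str], Dict[str, str]]] = {}

    def insert(self, row_key: str, column: str, value: str) -> None:
        names, values = self._rows.setdefault(row_key, ([], {}))
        if column not in values:
            insort(names, column)  # keep column names ordered
        values[column] = value

    def get_slice(self, row_key: str, start: str, end: str) -> List[Tuple[str, str]]:
        """Return (name, value) tuples for columns with start <= name <= end."""
        names, values = self._rows.get(row_key, ([], {}))
        lo = bisect_left(names, start)
        hi = bisect_right(names, end)
        return [(n, values[n]) for n in names[lo:hi]]


# A keyspace groups column families and defines their scope.
keyspace: Dict[str, ColumnFamily] = {"users": ColumnFamily("users")}
users = keyspace["users"]
users.insert("alice", "email", "alice@example.com")
users.insert("alice", "city", "Paris")
users.insert("alice", "age", "30")
print(users.get_slice("alice", "a", "d"))  # [('age', '30'), ('city', 'Paris')]
```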

A graph database uses a directed graph to represent data. The graph consists of nodes and edges.
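The sketch below shows the kind of query a graph model makes natural: following edges. It stores a directed graph as adjacency lists and finds the shortest path between two nodes with breadth-first search. The `Graph` class and the example node names are hypothetical. Production graph databases use their own query languages and storage engines.

```python
from collections import deque
from typing import Dict, List, Optional


class Graph:
    """Hypothetical directed graph stored as adjacency lists."""

    def __init__(self) -> None:
        self.edges: Dict[str, List[str]] = {}

    def add_edge(self, source: str, target: str) -> None:
        self.edges.setdefault(source, []).append(target)
        self.edges.setdefault(target, [])  # register the target node too

    def shortest_path(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search that follows edge direction; None if unreachable."""
        previous: Dict[str, Optional[str]] = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                # Walk the predecessor links back to the start.
                path = []
                current: Optional[str] = node
                while current is not None:
                    path.append(current)
                    current = previous[current]
                return path[::-1]
            for neighbor in self.edges.get(node, []):
                if neighbor not in previous:
                    previous[neighbor] = node
                    queue.append(neighbor)
        return None


g = Graph()
g.add_edge("alice", "bob")  # e.g. "alice follows bob"
g.add_edge("bob", "carol")
g.add_edge("carol", "alice")
print(g.shortest_path("alice", "carol"))  # ['alice', 'bob', 'carol']
print(g.shortest_path("dave", "alice"))   # None
```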

How NoSQL Databases Work


A NoSQL database works like a file that stores all the data. It lets you handle huge amounts of information and organize it so that users can access it whenever they need it. There are currently several types of NoSQL databases, each of which works differently, and many are written in C++. NoSQL databases build their functionality around the following:

  1. Horizontal scalability. The database can grow and expand its storage capacity without compromising performance.
  2. Cloud technology. Many NoSQL databases keep their storage in the cloud to free up local space, and they use multiple nodes to replicate data.
  3. Efficient use of resources. Companies are going through a technological transition, so they practically need a database that lets them adopt new tools. NoSQL meets this need because its flexible model adapts quickly to new tools.
  4. Schema-free operation. NoSQL has no rigid schema, so programmers are free to change the data as needed. Changing a field's definition or data type is not a problem. In SQL databases, by contrast, such changes can be very difficult.
  5. Response speed. Database speed is measured by latency (response time), and NoSQL databases aim to keep latency as low as possible.
  6. Use of indexes. Both SQL and NoSQL databases need indexes, because queries over millions of records are impractical without them. In NoSQL, indexes are often built as B-trees. A B-tree's nodes are kept balanced, which keeps searches fast.
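The sketch below shows why an index matters. It builds a secondary index on an `age` field as a sorted list of `(age, record id)` pairs and answers a range query with binary search instead of scanning every record. The sorted Python list searched with the standard `bisect` module is a stand-in for a real B-tree index. Both give logarithmic lookups, but a sorted list is expensive to update, which is why databases use trees. The records and field names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple

# Hypothetical records: record id -> document
records: Dict[int, dict] = {
    1: {"name": "Alice", "age": 34},
    2: {"name": "Bob", "age": 27},
    3: {"name": "Carol", "age": 41},
    4: {"name": "Dave", "age": 27},
}

# Secondary index on "age": (age, record id) pairs kept in sorted order.
age_index: List[Tuple[int, int]] = sorted(
    (doc["age"], rec_id) for rec_id, doc in records.items()
)


def find_by_age_range(low: int, high: int) -> List[int]:
    """Return ids of records with low <= age <= high via two binary searches."""
    start = bisect_left(age_index, (low, float("-inf")))
    end = bisect_right(age_index, (high, float("inf")))
    return [rec_id for _, rec_id in age_index[start:end]]


print(find_by_age_range(25, 35))  # [2, 4, 1]
```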

Database Management Systems

The following table provides a brief comparison between various NoSQL database management systems.


MongoDB is a flexible-schema store, meaning that stored objects do not need to share the same structure or fields. It also has optimization features that distribute data collections across servers, which improves overall performance and keeps the system balanced. Other NoSQL systems, such as Apache CouchDB, are also document stores and share many features with MongoDB. The main difference is that CouchDB is accessed through a RESTful API.

REST is an architectural style: a coordinated set of architectural constraints applied to components, connectors, and data elements on the web. It relies on a stateless, cacheable client-server communication protocol, such as HTTP. RESTful applications use HTTP requests to post, read, and delete data.

Among column databases, Hypertable is a NoSQL database written in C++ and based on Google BigTable. Like MongoDB and CouchDB, Hypertable can distribute data across hosts for maximum scalability.
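The sketch below illustrates the REST style described above. It maps HTTP methods on a resource path to create, read, and delete actions on an in-memory document store. It is a plain function rather than a real HTTP server, so it runs without a network. The `/docs/<id>` path layout, the `handle` function, and the status codes chosen are hypothetical illustrations, not CouchDB's actual API.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical in-memory "database" exposed as REST resources: /docs/<id>
documents: Dict[str, dict] = {}

NOT_FOUND = (404, json.dumps({"error": "not found"}))


def handle(method: str, path: str, body: Optional[str] = None) -> Tuple[int, str]:
    """Map an HTTP method and resource path to a CRUD action.

    Each request carries everything needed to serve it, so the handler
    keeps no per-client session state (REST's statelessness constraint).
    """
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "docs":
        return NOT_FOUND
    doc_id = parts[1]
    if method == "PUT":  # create or replace the resource
        documents[doc_id] = json.loads(body or "{}")
        return 201, json.dumps({"id": doc_id})
    if method == "GET":  # read the resource
        if doc_id not in documents:
            return NOT_FOUND
        return 200, json.dumps(documents[doc_id])
    if method == "DELETE":  # delete the resource
        if doc_id not in documents:
            return NOT_FOUND
        del documents[doc_id]
        return 200, json.dumps({"deleted": doc_id})
    return 405, json.dumps({"error": "method not allowed"})


print(handle("PUT", "/docs/42", '{"title": "Dune"}'))  # (201, '{"id": "42"}')
print(handle("GET", "/docs/42"))                       # (200, '{"title": "Dune"}')
print(handle("DELETE", "/docs/42"))                    # (200, '{"deleted": "42"}')
```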

Cassandra Hybrid System


Cassandra, originally developed at Facebook, is one of the most widely used NoSQL databases. Its goal was to provide a DBMS with no single point of failure and maximum availability. Cassandra is primarily a column-store database. Some studies describe it as a hybrid of two systems: Google BigTable, a column store, and Amazon Dynamo, a key-value store. Keys in Cassandra point to a set of column families, following BigTable's data model, while its availability features (a distributed hash table) come from Dynamo.

Key features of Cassandra include:

  1. No single point of failure. To achieve this, Cassandra must run on a cluster of nodes rather than on a single machine. This does not mean that every node holds the same data. When one node fails, the data on it becomes unavailable, but the other nodes and their data remain available.
  2. Distributed (consistent) hashing. This scheme provides hash table functionality in such a way that adding or removing a single slot does not significantly change the mapping of keys to slots. This lets Cassandra distribute load across servers or nodes according to their capacity and minimize downtime.
  3. A relatively easy-to-use client interface. Cassandra uses Apache Thrift for its client interface, which provides RPC clients in several languages. However, most developers prefer open-source alternatives built on Apache Thrift, such as Hector.
  4. Data replication. Cassandra copies data to other nodes in the cluster. Replica placement can be random or targeted for maximum data protection, for example, by placing replicas in another data center.
  5. Partitioning policy. The partitioner decides on which node a key is placed. Placement can be random or ordered, and with either type of partitioner Cassandra can balance workload distribution against query performance.
  6. Consistency. Replication complicates consistency, because all replicas must hold the latest values, either at all times or by the time a read operation starts.
  7. Reads and writes. The client sends a request to a single node, which then stores the data in the cluster according to the replication policy. Each node first records the change in the commit log and then updates its in-memory table structure; both steps are performed synchronously. A read request is sent to a node that holds the data according to the partitioning (allocation) policy.
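Items 2, 4, and 5 above can be illustrated with a small consistent-hashing ring. This is a sketch in the spirit of Cassandra's hash-based partitioning, not its implementation. Several choices here are assumptions made for illustration. Each node gets a single token from an MD5 hash (Cassandra can assign several tokens per node). Replicas are placed on the next nodes clockwise around the ring. The `HashRing` class name is hypothetical.

```python
import bisect
import hashlib
from typing import List


class HashRing:
    """Hypothetical consistent-hashing ring with simple clockwise replication."""

    def __init__(self, nodes: List[str], replicas: int = 2) -> None:
        self.replicas = replicas
        # Each node owns one position (token) on the ring, kept sorted.
        self._ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (self._hash(node), node))

    def remove_node(self, node: str) -> None:
        self._ring.remove((self._hash(node), node))

    def nodes_for(self, key: str) -> List[str]:
        """Primary node is the first token clockwise from the key's hash;
        the following nodes hold replicas. Assumes a non-empty ring."""
        tokens = [token for token, _ in self._ring]
        start = bisect.bisect_right(tokens, self._hash(key)) % len(self._ring)
        count = min(self.replicas, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"], replicas=2)
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.nodes_for(k)[0] for k in keys}
ring.add_node("node-e")
after = {k: ring.nodes_for(k)[0] for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys changed their primary node")
```

When `node-e` joins, only keys in the arc of the ring it takes over change their primary node. Every other key keeps its mapping, which is the property item 2 describes. With a single token per node the load can be uneven. Assigning several tokens per node is one common way to smooth it out.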

Indexing Structures


Indexing is the process of associating a key with the location of the corresponding data record in a DBMS. NoSQL databases use many different indexing structures. The B-tree is one of the most common index structures in a DBMS. Its internal nodes can have a variable number of child nodes within a predefined range.

One of the main differences between a B-tree and other tree structures, such as AVL trees, is that a B-tree allows a variable number of child nodes. This means less frequent rebalancing, at the cost of more wasted space. The B+ tree is one of the most popular B-tree variants. Unlike a plain B-tree, it keeps all keys in the leaves.
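The sketch below shows how the variable number of children works in practice. It is a minimal in-memory B-tree that follows the classic textbook insert algorithm: full nodes are split on the way down. The minimum degree `t` bounds how many keys a node may hold. This is an illustration, not how any particular NoSQL engine implements its indexes, and deletion is omitted.

```python
from bisect import bisect_left
from typing import Any, List, Optional


class BTreeNode:
    def __init__(self, leaf: bool = True) -> None:
        self.leaf = leaf
        self.keys: List[Any] = []              # kept sorted
        self.values: List[Any] = []            # values[i] belongs to keys[i]
        self.children: List["BTreeNode"] = []  # len(keys) + 1 for internal nodes


class BTree:
    """B-tree of minimum degree t: every node except the root holds between
    t - 1 and 2t - 1 keys, so internal nodes have between t and 2t children."""

    def __init__(self, t: int = 2) -> None:
        self.t = t
        self.root = BTreeNode()

    def search(self, key: Any) -> Optional[Any]:
        node = self.root
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return node.values[i]
            if node.leaf:
                return None
            node = node.children[i]  # descend into the only subtree that may hold key

    def insert(self, key: Any, value: Any) -> None:
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: split it, so the tree grows in height at the top.
            old_root = self.root
            self.root = BTreeNode(leaf=False)
            self.root.children.append(old_root)
            self._split_child(self.root, 0)
        self._insert_nonfull(self.root, key, value)

    def _split_child(self, parent: BTreeNode, i: int) -> None:
        t = self.t
        full = parent.children[i]
        right = BTreeNode(leaf=full.leaf)
        # The median key moves up into the parent; the rest splits in two.
        parent.keys.insert(i, full.keys[t - 1])
        parent.values.insert(i, full.values[t - 1])
        parent.children.insert(i + 1, right)
        right.keys, full.keys = full.keys[t:], full.keys[: t - 1]
        right.values, full.values = full.values[t:], full.values[: t - 1]
        if not full.leaf:
            right.children, full.children = full.children[t:], full.children[:t]

    def _insert_nonfull(self, node: BTreeNode, key: Any, value: Any) -> None:
        while True:
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                node.values[i] = value  # key already present: overwrite
                return
            if node.leaf:
                node.keys.insert(i, key)
                node.values.insert(i, value)
                return
            if len(node.children[i].keys) == 2 * self.t - 1:
                self._split_child(node, i)
                continue  # re-check this node, which just gained the median key
            node = node.children[i]


tree = BTree(t=2)
for k in [10, 20, 5, 6, 12, 30, 7, 17]:
    tree.insert(k, f"record-{k}")
print(tree.search(12))  # record-12
print(tree.search(99))  # None
```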

The T-tree data structure was developed by combining features of AVL trees and B-trees. AVL trees are self-balancing binary search trees, whereas in B-trees each node can have a different number of children.

Structurally, a T-tree closely resembles both an AVL tree and a B-tree. Each node stores more than one {key-value, pointer} tuple. Binary search is combined with these multi-tuple nodes to improve memory usage and performance.

A T-tree has three types of nodes:
- internal nodes, with both a left and a right child;
- leaf nodes, with no children;
- half-leaf nodes, with only one child.

T-trees are believed to have the best overall performance of these structures.
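The sketch below shows how a T-tree lookup combines the two ideas described above. It walks down the tree like a binary search tree, comparing the key against each node's smallest and largest keys. Once the key falls inside a node's range, it finishes with a binary search inside that node. The `TNode` class and `t_tree_search` function are hypothetical names. Insertion and AVL-style rebalancing are omitted, so the example tree is built by hand.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class TNode:
    keys: List[Any]    # sorted keys stored in this node
    values: List[Any]  # values[i] belongs to keys[i]
    left: Optional["TNode"] = None   # subtree with keys < keys[0]
    right: Optional["TNode"] = None  # subtree with keys > keys[-1]

    def kind(self) -> str:
        """Classify the node as one of the three T-tree node types."""
        has_left, has_right = self.left is not None, self.right is not None
        if has_left and has_right:
            return "internal"
        if has_left or has_right:
            return "half-leaf"
        return "leaf"


def t_tree_search(node: Optional[TNode], key: Any) -> Optional[Any]:
    while node is not None:
        if key < node.keys[0]:
            node = node.left
        elif key > node.keys[-1]:
            node = node.right
        else:
            # The key is within this node's range: binary search its keys.
            i = bisect_left(node.keys, key)
            return node.values[i] if node.keys[i] == key else None
    return None


leaf_low = TNode([1, 2, 3], ["a", "b", "c"])
leaf_high = TNode([20, 25], ["t", "y"])
root = TNode([8, 10, 12], ["h", "j", "l"], left=leaf_low, right=leaf_high)
print(t_tree_search(root, 25))       # y
print(t_tree_search(root, 9))        # None (inside root's range, but absent)
print(root.kind(), leaf_low.kind())  # internal leaf
```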

Common NoSQL Usage Mistakes

There are three common mistakes that organizations make when it comes to NoSQL:

  1. Equating NoSQL with scalability. NoSQL is about more than scalability, so you cannot equate it with web scale. The predecessors of modern non-relational databases came from companies such as Google and Amazon, which focused on solving scalability problems on the web.
  2. Not building new skills. Developers need to evolve. In one high-profile web project, a poorly chosen integration team created a huge problem that took a long time and millions of dollars to fix.
  3. Underestimating distribution. Distribution is complicated, and nothing can replace knowledge and experience in either implementation or administration. A query that runs quickly on a local development machine may not scale horizontally to hundreds of machines. A modern application has a distributed architecture and many simultaneous users, all of whom expect quick responses.

Benefits of NoSQL

NoSQL and SQL databases compete with each other, but many experts believe NoSQL has several advantages over traditional relational databases:

  1. They have a simple and flexible structure.
  2. They have no schemas.
  3. They are based on key-value pairs.
  4. They offer several storage types, including column, document, key-value, graph, object, and XML stores, as well as other data models.
  5. Usually each value in the database has a key. Some stores allow developers to store serialized objects, not just simple string values.
  6. Open-source NoSQL databases do not require expensive license fees and can run on low-cost hardware, which makes them cost-effective to deploy.
  7. Scaling NoSQL databases, whether open source or proprietary, is easier and cheaper than scaling relational databases. NoSQL databases scale horizontally, adding nodes and balancing load across all of them. Relational database systems usually scale vertically instead, replacing the main host with a more powerful one.

Disadvantages of NoSQL Systems

NoSQL databases work in different ways, depending on the documents stored in them. Even so, they are an important tool in modern companies because they store essential user and operational data.

NoSQL databases are not perfect, so they are not always the right choice for programmers. For one thing, most of them do not support the reliability features that relational database systems provide out of the box. These features are summarized as atomicity, consistency, isolation, and durability (ACID). NoSQL databases that lack them trade consistency for performance and scalability.

To achieve reliability and consistency, developers must write their own custom code, which increases system complexity. This limits the number of applications that can rely on NoSQL for secure and reliable transactions, such as banking systems.
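One common example of such custom code is optimistic concurrency control. Every value carries a version number, and a write succeeds only if the writer saw the latest version (compare-and-set). The sketch below is hypothetical: `VersionedStore`, `VersionConflict`, and `add_to_balance` are invented names, and the in-process lock stands in for whatever atomic conditional write a real database offers. The sketch protects only a single key. Multi-key operations, such as a bank transfer between two accounts, would need considerably more machinery, which illustrates the added complexity described above.

```python
import threading
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Hypothetical store where every value carries a version number."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}
        self._lock = threading.Lock()  # makes check-and-write atomic in this process

    def read(self, key: str) -> Tuple[int, Any]:
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_set(self, key: str, expected_version: int, value: Any) -> int:
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(
                    f"{key}: expected v{expected_version}, found v{current_version}"
                )
            self._data[key] = (current_version + 1, value)
            return current_version + 1


def add_to_balance(store: VersionedStore, account: str, amount: int, retries: int = 5) -> int:
    """Read-modify-write that re-reads and retries when another writer got there first."""
    for _ in range(retries):
        version, balance = store.read(account)
        new_balance = (balance or 0) + amount
        try:
            store.compare_and_set(account, version, new_balance)
            return new_balance
        except VersionConflict:
            continue  # stale read; retry with fresh data
    raise VersionConflict(f"{account}: too many concurrent updates")


store = VersionedStore()
v, _ = store.read("acc-1")                  # v == 0
store.compare_and_set("acc-1", v, 100)      # first writer succeeds (version 1)
try:
    store.compare_and_set("acc-1", v, 999)  # stale version 0 is rejected
except VersionConflict as err:
    print("rejected:", err)                 # rejected: acc-1: expected v0, found v1
print(add_to_balance(store, "acc-1", 50))   # 150
print(store.read("acc-1"))                  # (2, 150)
```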

Using NoSQL Databases


Academics, engineers, software architects, application designers, and programmers need a deeper knowledge of data structures than relational databases used to require. The market leaders are Hadoop and MongoDB, followed by Cassandra, Redis, CouchDB, and Riak. Recent research shows that two products dominate among system engineers, software architects, and developers, out of a dozen similar NoSQL technologies: MongoDB and Hadoop.

The market shows that large companies such as Oracle and IBM are adopting new NoSQL database development methodologies and integrating them into their products. According to Edlich, the database market is gradually settling on PaaS as its standard, with Redis and MongoDB among the products involved. Products such as Neo4j, MongoDB, and CouchDB have attracted venture capital support and investment.

Source: https://habr.com/ru/post/A13573/

