Many people have heard about CDNs, but what is a CDN? The abbreviation stands for content delivery network: a geographically distributed network that delivers content to users.
What basic advantages does it bring? The indisputable ones are offloading the content origin and keeping a cache in the location geographically closest to the consumer. Why is this necessary? So that the user receives content with the shortest possible delay. Such a network is used both for plain HTTP traffic and for streaming. In the second case, the user gets a stable picture: the media stream stalls less often and plays in higher quality more often (with multi-bitrate).
If the CDN is third-party, the best option is to send it only the peak load. This keeps your own channel at a minimally comfortable width (channels are expensive) and saves on edge equipment and its maintenance.
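The idea of sending only peak load to a third-party CDN can be illustrated with a minimal sketch. The function name and the bandwidth figures are hypothetical; a real setup would make this decision in the balancer or DNS layer rather than in application code.

```python
def split_load(demand_mbps: float, own_capacity_mbps: float) -> tuple[float, float]:
    """Return (served_locally, sent_to_cdn) for the current demand.

    Everything up to the capacity of our own channel is served locally;
    only the excess (the peak) is handed to the third-party CDN.
    """
    if demand_mbps < 0 or own_capacity_mbps < 0:
        raise ValueError("bandwidth values must be non-negative")
    local = min(demand_mbps, own_capacity_mbps)
    return local, demand_mbps - local


if __name__ == "__main__":
    # Hypothetical numbers: a 1000 Mbit/s own channel and three demand levels.
    for demand in (300, 800, 1500):
        local, cdn = split_load(demand, own_capacity_mbps=1000)
        print(f"demand={demand} local={local} cdn={cdn}")
```

With this split, the third-party CDN is billed only for traffic above the capacity of the own channel.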
CDN: what is it and how does it work?
The most pressing question is how a CDN is built. A one-line answer is impossible, because there are several different options.
You can start with the more familiar option (maximum savings). Here the network is assembled from large providers that have their own data centers (for example, Megafon, the Central Telegraph and the like, including regional companies). There is no backbone as such; everything goes through the same channels as subscriber and client traffic.
The relationship with providers in this case is extremely weak. As a rule, you cannot do without your own equipment, because everything depends on the disk subsystem, and it virtualizes extremely poorly (despite the statements of many adherents of specialized hardware vendors). You will often hear that valuable IOPS are lost to virtualization. SSDs are practically not used either, since they are very expensive.
In such CDN services (jQuery and others), the servers themselves are, as a rule, "universal": the same machines act as streaming servers, web caches, and servers for flv and mp4 files. Well-known DNS servers run on the same machines. Balancing is carried out only with DNS views, by region, provider and so on. Image CDNs, which ease the transfer of large image files, are also widely known.
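Balancing with DNS views amounts to answering the same name with a different address depending on where the request comes from. The sketch below models that lookup in Python; the view table, subnets and node addresses are hypothetical (taken from the RFC 5737 documentation ranges), and a real deployment would configure this in the DNS server itself.

```python
import ipaddress

# Hypothetical view table: client subnet -> address of the node to answer with.
VIEWS = [
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.10"),  # e.g. provider A
    (ipaddress.ip_network("203.0.113.0/24"), "192.0.2.20"),   # e.g. provider B
]
DEFAULT_NODE = "192.0.2.1"  # answer for clients that match no view


def resolve(client_ip: str) -> str:
    """Pick the node address for a client, the way a DNS view would."""
    addr = ipaddress.ip_address(client_ip)
    for network, node in VIEWS:
        if addr in network:
            return node
    return DEFAULT_NODE


if __name__ == "__main__":
    for ip in ("198.51.100.7", "203.0.113.200", "192.0.2.77"):
        print(ip, "->", resolve(ip))
```

The weakness described in the text is visible here: the choice depends only on a static table, not on the actual load or health of the nodes.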
As a result, the quality of service is mediocre. Such a CDN cannot always be used to distribute (cache) mp4 and flv data or other large files. Transmission delays vary greatly and can become very long. It follows that this kind of network is not suitable for streaming or for fast web traffic, and it cannot significantly speed up a site.
Higher level
More powerful CDNs (mostly non-Russian ones: Akamai, L3, CDNetworks) usually do not skimp on their own infrastructure, because they understand the prospects of such investments. They are arranged differently. They have their own backbone network, which carries both internal and service traffic. They also have their own autonomous systems (AS), so routing is in their own hands. Peering relationships with Internet providers are well established too.
Balancing here is built on the principle of anycast + DNS + LVS. The network architecture and the control over routing described above make it possible to balance consumer requests in more advanced ways: not only with DNS views, but also with anycast. Behind each anycast IP address sits a balancer, which distributes requests across various servers.
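What the balancer behind an address does can be sketched with a least-connection scheduler, one of the scheduling policies LVS offers. This is only an illustrative model with hypothetical server names; the source does not say which policy these CDNs actually use.

```python
class LeastConnectionBalancer:
    """Send each new request to the server with the fewest active connections."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # Active connection count per server, in the order given.
        self.active = {server: 0 for server in servers}

    def acquire(self) -> str:
        """Choose a server for a new connection and count it as active."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Mark one connection on the given server as finished."""
        if self.active[server] > 0:
            self.active[server] -= 1


if __name__ == "__main__":
    balancer = LeastConnectionBalancer(["edge-1", "edge-2", "edge-3"])
    first_three = [balancer.acquire() for _ in range(3)]
    balancer.release("edge-2")
    print(first_three, balancer.acquire())
```

Ties are broken by the order in which the servers were listed, which is a simplification made for this sketch.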
Of course, there is no question of "universal" nodes here, nor of virtualizing absolutely all services. There are servers that upload content and separate servers that distribute instant content. There are also intermediate storage tiers for large amounts of data, which need their own streaming and distribution components.
In addition, there are multiplexer servers (source, intermediate and terminal) to which the client publishes the stream. If the output requires HLS, HDS or Silverlight streaming, the terminal servers are usually web caches for very high-quality and quickly loaded content.
Such an architecture allows the service to withstand huge loads without the risk of delays for clients. In the case of a private CDN, it is more rational to run the equipment at maximum load while still ensuring an adequate level of service (spread of delays, stalls, etc.).
Which servers are used?
From a technical point of view, such services use nginx as the web cache, because it has everything necessary for proxying requests and caching. You can write your own modules for it, for example to preload content into the cache, to purge selected objects from it, or to collect statistics (and, for example, send them to a MongoDB database). Vendor support is usually available as well. L3, for example, built its own nginx-based CDN web server.
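Purging a single object from an nginx cache requires knowing where it is stored. According to the nginx documentation, the cache file name is the MD5 of the cache key, and the `levels` parameter of `proxy_cache_path` takes directory names from the end of that hash. The sketch below computes such a path; the cache root and the key string are assumptions, since the real key is whatever `proxy_cache_key` evaluates to in a given configuration.

```python
import hashlib
from pathlib import PurePosixPath


def cache_file_path(cache_root: str, cache_key: str, levels=(1, 2)) -> PurePosixPath:
    """Return the expected on-disk path of a cached object.

    The file name is the MD5 hex digest of the cache key. Each level takes
    its directory name from the end of the digest, moving leftwards.
    """
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    directories = []
    end = len(digest)
    for width in levels:
        directories.append(digest[end - width:end])
        end -= width
    return PurePosixPath(cache_root, *directories, digest)


if __name__ == "__main__":
    # Hypothetical key built as scheme + upstream host + request URI.
    print(cache_file_path("/data/nginx/cache", "httpbackend.example/video/clip.mp4"))
```

A purge tool built on this would delete the file at the returned path; a custom nginx module, as mentioned above, could do the same from inside the server.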
Streaming servers are often in-house developments (usually based on off-the-shelf products such as red5) or Wowza Media Server. The servers to which the customer publishes streams are usually Adobe FMS. Game CDNs, as a rule, belong here as well.
Storage servers can be object stores such as MogileFS or Hadoop, or very large file systems such as Lustre or Gluster, which are now gaining popularity. OpenStack Swift storage (file CDNs) is also common, although it is not yet fully mature and has not won widespread approval because of a certain "rawness".
Transcoders are classically ffmpeg wrapped in a large amount of in-house tooling (monitoring software, a job queue manager, etc.).
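A very small part of such tooling can be sketched as a job queue that prepares one ffmpeg command per quality level. The bitrate ladder, the output naming and the choice of codecs below are assumptions made for illustration; the sketch only builds the commands and does not run ffmpeg.

```python
from collections import deque
from pathlib import Path

# Hypothetical multi-bitrate ladder: (frame height, video bitrate).
LADDER = [(360, "800k"), (720, "2500k"), (1080, "5000k")]


def ffmpeg_command(source: str, height: int, video_bitrate: str) -> list[str]:
    """Build an ffmpeg argument list for one rendition of the source file."""
    output = f"{Path(source).stem}_{height}p.mp4"
    return [
        "ffmpeg", "-i", source,
        "-vf", f"scale=-2:{height}",  # keep aspect ratio, force an even width
        "-c:v", "libx264", "-b:v", video_bitrate,
        "-c:a", "aac", "-b:a", "128k",
        output,
    ]


def build_jobs(source: str) -> deque:
    """Queue one transcoding job per ladder step, lowest quality first."""
    return deque(ffmpeg_command(source, h, rate) for h, rate in LADDER)


if __name__ == "__main__":
    jobs = build_jobs("movie.mp4")
    while jobs:
        print(" ".join(jobs.popleft()))
```

A worker process would pop commands from the queue and pass each one to `subprocess.run`, with monitoring wrapped around that call.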
Statistical data
Much depends on the tariff model and billing scheme, but some points cannot be avoided. Accounting based on netflow is practically impossible: the traffic volume is large, and it is irrational to budget a separate cost item for the equipment needed to compute and parallelize the processing. Statistics are therefore built from logs. The process starts at the end nodes, where repeated requests are collapsed (one URL requested from one IP or subnet). The aggregated logs are then processed on dedicated servers, which produce statistics for technical needs and billing.
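The two stages of log processing can be sketched as follows: each end node collapses repeated requests per (URL, client IP), and a central server merges the collapsed logs into per-URL totals. The record layout (URL, client IP, bytes sent) is an assumption made for this example.

```python
from collections import defaultdict


def collapse(records):
    """Collapse raw log records on an end node.

    records: iterable of (url, client_ip, bytes_sent).
    Returns {(url, client_ip): (request_count, total_bytes)}.
    """
    collapsed = defaultdict(lambda: [0, 0])
    for url, client_ip, bytes_sent in records:
        entry = collapsed[(url, client_ip)]
        entry[0] += 1
        entry[1] += bytes_sent
    return {key: tuple(value) for key, value in collapsed.items()}


def merge(per_node_logs):
    """Merge collapsed logs from several nodes into per-URL totals."""
    totals = defaultdict(lambda: [0, 0])
    for node_log in per_node_logs:
        for (url, _client_ip), (count, total_bytes) in node_log.items():
            totals[url][0] += count
            totals[url][1] += total_bytes
    return {url: tuple(value) for url, value in totals.items()}


if __name__ == "__main__":
    node_a = collapse([
        ("/a.mp4", "10.0.0.1", 100),
        ("/a.mp4", "10.0.0.1", 150),
        ("/b.jpg", "10.0.0.2", 10),
    ])
    node_b = collapse([("/a.mp4", "10.0.0.3", 50)])
    print(merge([node_a, node_b]))
```

Collapsing on the node keeps the volume shipped to the central servers far smaller than the raw logs, which is the reason for doing it there first.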
Statistics in more detail
How do statistics work in a CDN? In more detail, they include the following components:
- timelines of the number of requests per unit of time, the number of clients (used in streaming), and the number of errors per unit of time (for example, the number of stream breaks, or the number of 404, 500 and 502 errors for HTTP servers);
- graphs broken down by geo-statistics;
- the caching or multiplexing factor (in streaming) at a given point in time;
- for internal use, response-time statistics for responses that are not rate-limited, collected for front-end servers and intermediate servers, plus timing statistics for sources.
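Two of the metrics listed above, errors per unit of time and the caching factor, can be computed from request events as in the sketch below. The event layout (timestamp in seconds, HTTP status, cache-hit flag) and the one-minute bucket are assumptions made for illustration.

```python
from collections import Counter


def errors_per_minute(events, error_codes=(404, 500, 502)):
    """Count error responses per one-minute bucket.

    events: iterable of (timestamp_seconds, http_status, cache_hit).
    Returns {bucket_start_seconds: error_count}.
    """
    counts = Counter()
    for timestamp, status, _cache_hit in events:
        if status in error_codes:
            counts[int(timestamp // 60) * 60] += 1
    return dict(counts)


def cache_hit_ratio(events):
    """Share of requests answered from the cache (the caching factor)."""
    events = list(events)
    if not events:
        return 0.0
    hits = sum(1 for _timestamp, _status, cache_hit in events if cache_hit)
    return hits / len(events)


if __name__ == "__main__":
    sample = [
        (0, 200, True),
        (10, 404, False),
        (59, 502, False),
        (61, 200, True),
        (125, 500, False),
    ]
    print(errors_per_minute(sample))
    print(cache_hit_ratio(sample))
```

The same bucketing approach extends to request counts and client counts by changing what is counted per bucket.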
A native API for interacting with the CDN is a necessary mechanism; the service cannot exist without it. It can often be used to purge the entire cache or selected objects, to change configuration, or to initiate downloading a file from the source so that it is cached on CDN nodes in advance. An example is the Steam Community CDN, which serves a worldwide gaming network.
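The operations such an API typically exposes (purge everything, purge one object, prefetch from the source) can be modeled in memory as below. The class and method names are hypothetical and do not correspond to any particular provider's API; a real API would be called over the network.

```python
class EdgeCacheModel:
    """In-memory model of a CDN node cache with purge and prefetch calls."""

    def __init__(self, origin: dict[str, bytes]):
        self.origin = origin              # stands in for the content source
        self.cache: dict[str, bytes] = {}

    def prefetch(self, path: str) -> None:
        """Copy an object from the source into the cache before any request."""
        self.cache[path] = self.origin[path]

    def purge(self, path: str) -> bool:
        """Remove one object; return True if it was cached."""
        return self.cache.pop(path, None) is not None

    def purge_all(self) -> int:
        """Empty the cache; return how many objects were removed."""
        removed = len(self.cache)
        self.cache.clear()
        return removed


if __name__ == "__main__":
    node = EdgeCacheModel({"/a.mp4": b"video", "/b.jpg": b"image"})
    node.prefetch("/a.mp4")
    node.prefetch("/b.jpg")
    print(node.purge("/a.mp4"), node.purge("/a.mp4"), node.purge_all())
```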
Brief overview of the most popular CDN providers
It is useful for every advanced user to know about several of the most popular content delivery network services (jQuery CDN and the like). Some of them are widely used, while others are still growing and developing.
CloudFlare Network
Today it is the best-known and most widespread CDN service. You can buy a paid plan on the CloudFlare network or use the free plan. The company has been on the market for more than a decade and a half and has earned a solid reputation in that time. One of the key advantages of the service is that CloudFlare does not cap bandwidth, unlike competing companies.
MaxCDN Network
Also one of the most popular CDN services, owned by NetDNA (Distributed Delivery Leader). The key advantage of MaxCDN is that the service integrates easily with the most common content management systems (WordPress, Joomla, Drupal, Magento, etc.). A trial version is free for a week; there is no free plan yet. However, the cost of use is quite affordable.
TinyCDN Network
According to user reviews, this is one of the best services. It is built on Amazon Web Services (one of the best-known companies in this field), and is therefore one of the most reliable. The price is not much higher than that of competing companies. TinyCDN has a free trial that lets you use the service for 30 days.
Google Page Speed
The Google Page Speed service is less well known, because its target audience is webmasters and developers. Like other Google products, it is developing by leaps and bounds. If you like to experiment in your work, be sure to try this service. It can be used successfully in a wide variety of networks, and reviews of it are mostly positive.