Multidimensional scaling: definition, goals, objectives and example

Multidimensional scaling (MDS) is a means of visualizing the level of similarity between individual cases in a dataset. It refers to a set of related ordination methods used in information visualization, in particular to display the information contained in a distance matrix. It is a form of nonlinear dimensionality reduction. An MDS algorithm aims to place each object in N-dimensional space so that the distances between the objects are preserved as well as possible. Each object is then assigned coordinates in each of the N dimensions.

The number of dimensions of an MDS plot can exceed 2 and is specified a priori. Choosing N = 2 optimizes the arrangement of objects for a two-dimensional scatter plot. You can see examples of multidimensional scaling in the pictures in the article. The examples with labels in Russian are particularly illustrative.

Essence

Multidimensional scaling (MDS) in its metric form is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and to input matrices of known distances with weights, and so on. In this context, a useful loss function is called stress, which is often minimized by a procedure called stress majorization.
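
To make the idea of minimizing stress concrete, here is a minimal sketch of unweighted stress majorization (the Guttman transform) in plain Python. It assumes raw stress, defined as the sum of squared differences between target and embedded distances, and a random starting layout. The function names (`raw_stress`, `smacof_step`, `mds`) and the unit-square test data are illustrative choices, not part of the original article or of any particular package.

```python
import math
import random

def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]

def raw_stress(target, points):
    """Sum of squared differences between target and embedded distances."""
    d = pairwise_distances(points)
    n = len(points)
    return sum((target[i][j] - d[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def smacof_step(target, points):
    """One Guttman transform: the unweighted stress-majorization update."""
    n, dim = len(points), len(points[0])
    d = pairwise_distances(points)
    b = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Coincident points contribute 0 to avoid division by zero.
            if i != j and d[i][j] > 0:
                b[i][j] = -target[i][j] / d[i][j]
        b[i][i] = -sum(b[i])  # diagonal is still 0 here, so this sums j != i
    return [tuple(sum(b[i][k] * points[k][a] for k in range(n)) / n
                  for a in range(dim))
            for i in range(n)]

def mds(target, dim=2, iterations=200, seed=0):
    """Iterate the update from a random start; return layout and stress history."""
    rng = random.Random(seed)
    points = [tuple(rng.uniform(-1, 1) for _ in range(dim)) for _ in target]
    history = [raw_stress(target, points)]
    for _ in range(iterations):
        points = smacof_step(target, points)
        history.append(raw_stress(target, points))
    return points, history

# Distances between the corners of a unit square (assumed example data).
s = math.sqrt(2)
square = [[0, 1, s, 1],
          [1, 0, 1, s],
          [s, 1, 0, 1],
          [1, s, 1, 0]]
layout, history = mds(square)
```

Each update cannot increase the stress, so `history` is a non-increasing sequence. The procedure can still stop in a local minimum, which is one reason MDS is often run from several random starts.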

Guide

There are several variants of multidimensional scaling. MDS programs automatically minimize stress to obtain a solution. The core of a non-metric MDS algorithm is a twofold optimization process. First, the optimal monotonic transformation of the proximities must be found. Second, the points of the configuration must be arranged so that their distances match the scaled proximities as closely as possible.
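
The first of these two steps, finding the best monotonic transformation, is commonly done with isotonic regression. The sketch below uses the pool-adjacent-violators approach in plain Python; it assumes the proximities are dissimilarities (larger means further apart), and the names `isotonic_fit` and `disparities` are illustrative rather than taken from any MDS program.

```python
def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while neighbouring block means violate monotonicity.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def disparities(proximities, distances):
    """Monotone transform of the proximities that best matches the distances."""
    # Sort the pairs by proximity, then fit a non-decreasing sequence
    # to the current configuration distances in that order.
    order = sorted(range(len(proximities)), key=lambda k: proximities[k])
    fitted = isotonic_fit([distances[k] for k in order])
    result = [0.0] * len(proximities)
    for rank, k in enumerate(order):
        result[k] = fitted[rank]
    return result
```

In a full non-metric procedure the second step would then move the points so that their distances approach these disparities, and the two steps would alternate until the stress stops improving.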

Expansion

Generalized multidimensional scaling is an extension of metric multidimensional scaling in which the target space is an arbitrary smooth non-Euclidean space. When the dissimilarities are distances on a surface and the target space is another surface, dedicated programs can find an embedding of one surface into the other with minimal distortion.

Stages

There are several steps to conducting a study using multidimensional scaling:

  1. Formulating the problem. What variables do you want to compare? How many variables do you want to compare? What purpose will the study serve?
  2. Obtaining input data. Respondents are asked a series of questions. For each pair of products, they are asked to rate the similarity (usually on a 7-point Likert scale from very similar to very dissimilar). The first question may concern, for example, Coca-Cola / Pepsi, the next a pair involving beer, the next a pair involving Dr. Pepper, and so on. The number of questions depends on the number of brands: N brands give N(N-1)/2 pairs.
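
The input-data step can be sketched in a few lines of Python. This assumes a scale on which 1 means very similar and 7 means very dissimilar, so a rating can be used directly as a dissimilarity. The ratings below are made up purely for illustration, and the function names are hypothetical.

```python
from itertools import combinations

def similarity_questions(brands):
    """Every unordered pair of brands: one survey question per pair."""
    return list(combinations(brands, 2))

def dissimilarity_matrix(brands, ratings):
    """Symmetric matrix from pair ratings (1 = very similar, 7 = very dissimilar)."""
    index = {brand: i for i, brand in enumerate(brands)}
    n = len(brands)
    matrix = [[0] * n for _ in range(n)]  # a brand is identical to itself
    for (a, b), score in ratings.items():
        matrix[index[a]][index[b]] = score
        matrix[index[b]][index[a]] = score
    return matrix

brands = ["Coca-Cola", "Pepsi", "Dr. Pepper"]
questions = similarity_questions(brands)

# Made-up ratings, for illustration only.
ratings = {("Coca-Cola", "Pepsi"): 2,
           ("Coca-Cola", "Dr. Pepper"): 5,
           ("Pepsi", "Dr. Pepper"): 4}
matrix = dissimilarity_matrix(brands, ratings)
```

With several respondents, the ratings for each pair would typically be averaged before the matrix is passed to the MDS program.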

Alternative approaches

There are two other approaches to obtaining input data. One is a technique called the “perceptual data: derived approach,” in which products are decomposed into attributes and rated on a semantic differential scale. The other is the “preference data approach,” in which respondents are asked about their preferences rather than about similarities.

The study then continues with the following steps:

  1. Running the MDS statistical program. Software for performing the procedure is available in many statistical packages. There is often a choice between metric MDS (which deals with interval or ratio level data) and non-metric MDS (which deals with ordinal data).
  2. Determining the number of dimensions. The researcher must decide how many dimensions the program should create. The more dimensions, the better the statistical fit, but the harder the results are to interpret.
  3. Mapping the results and defining the dimensions. The statistical program (or a related module) maps the results. Each product is plotted on the map (usually in two-dimensional space). The proximity of products to each other indicates either how similar they are or how preferred they are, depending on which approach was used. However, how the dimensions of the embedding correspond to dimensions of system behavior is not always obvious. Here a subjective judgment about the correspondence can be made.
  4. Testing the results for reliability and validity. Compute the R-squared to determine what proportion of the variance of the scaled data can be accounted for by the MDS procedure. An R-squared of 0.6 is considered the minimum acceptable level. An R-squared of 0.8 is considered good for metric scaling, and 0.9 is considered good for non-metric scaling.
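
The R-squared check in the last step can be illustrated with a short Python sketch. It assumes that R-squared is the squared Pearson correlation between the scaled input values and the distances in the MDS configuration; statistical packages may define it differently. The function names are hypothetical, and the default threshold of 0.6 is the minimum level mentioned above.

```python
def r_squared(scaled_data, map_distances):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(scaled_data)
    mean_x = sum(scaled_data) / n
    mean_y = sum(map_distances) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(scaled_data, map_distances))
    sxx = sum((x - mean_x) ** 2 for x in scaled_data)
    syy = sum((y - mean_y) ** 2 for y in map_distances)
    return sxy * sxy / (sxx * syy)

def is_acceptable(r2, threshold=0.6):
    """Compare against the minimum acceptable level quoted in the text."""
    return r2 >= threshold
```

Both sequences hold one value per pair of objects, in the same pair order. The function raises `ZeroDivisionError` if either sequence is constant, since the correlation is undefined in that case.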

Various tests

Other possible tests include Kruskal's stress, split-data tests, data stability tests, and test-retest reliability. Report the results in detail. Along with the mapping, at least the distance measure (e.g. Sørensen index, Jaccard index) and a measure of reliability (e.g. the stress value) should be reported.
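
As a rough illustration of the quantities named above, the following Python sketch computes Jaccard and Sørensen dissimilarities between two sets, and Kruskal's stress-1 from matched lists of configuration distances and disparities. It assumes the usual textbook definitions, and the function names are illustrative.

```python
import math

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union

def sorensen_distance(a, b):
    """1 - 2|A ∩ B| / (|A| + |B|); two empty sets are treated as identical."""
    total = len(a) + len(b)
    return 0.0 if total == 0 else 1.0 - 2.0 * len(a & b) / total

def kruskal_stress1(distances, disparities):
    """sqrt(sum((d - d_hat)^2) / sum(d^2)) over all pairs of objects."""
    numerator = sum((d - h) ** 2 for d, h in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)
```

A stress-1 of 0 means the configuration distances reproduce the disparities exactly; larger values indicate a poorer fit.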

It is also highly advisable to report the algorithm (for example, Kruskal, Mather), which is often determined by the program used (sometimes the program is reported in place of the algorithm); whether a starting configuration was supplied or chosen at random; the number of runs; the assessment of dimensionality; the results of the Monte Carlo method; the number of iterations; the stability assessment; and the proportional variance of each axis (r-square).

Visual information and data analysis using multidimensional scaling

Information visualization is the study of interactive (visual) representations of abstract data to enhance human cognition. Abstract data includes both numerical and non-numerical data, such as textual and geographic information. However, information visualization differs from scientific visualization: “it is infovis (information visualization) when the spatial representation is chosen, and scivis (scientific visualization) when the spatial representation is given.”

The field of information visualization appeared as a result of research in the field of human-computer interaction, applied use of computer science, graphics, visual design, psychology and business methods. It is increasingly being used as a critical component in research, digital libraries, data mining, financial data, market research, product control, and so on.

Methods and Principles

Information visualization assumes that methods of visual presentation and interaction take advantage of the broad capabilities of human perception, allowing users to see, explore, and understand large volumes of information at once. Information visualization aims to create approaches for conveying abstract data and information in intuitive ways.

Data analysis is an integral part of all applied research and problem solving in industry. The most fundamental approaches to data analysis are visualization (histograms, scatter plots, surface plots, tree maps, parallel coordinate plots, etc.), statistics (hypothesis testing, regression, PCA, etc.), data mining (association mining, etc.), and machine learning methods (clustering, classification, decision trees, etc.).

Among these approaches, information visualization, or visual data analysis, depends most on the cognitive skills of the analysts and allows them to discover unstructured, actionable insights that are limited only by human imagination and creativity. An analyst does not need to learn any complex methods in order to interpret data visualizations. Information visualization is also a way of generating hypotheses, which can be, and usually is, followed by more analytical or formal analysis such as statistical hypothesis testing.

The study

The modern study of visualization began with computer graphics, which “was used from the very beginning to study scientific problems. However, in the early years, the lack of graphics power often limited its usefulness. The emphasis on visualization began in 1987 with the special issue of Computer Graphics on Visualization in Scientific Computing. Since then, several conferences and workshops have been held, jointly sponsored by the IEEE Computer Society and ACM SIGGRAPH.”

They focused on the general topics of data visualization, information visualization, and scientific visualization, as well as more specific areas such as volume visualization.

Generalization

Generalized multidimensional scaling (GMDS) is an extension of metric multidimensional scaling in which the target space is non-Euclidean. When the dissimilarities are distances on a surface and the target space is another surface, GMDS makes it possible to find an embedding of one surface into the other with minimal distortion.

GMDS is a new line of research. Currently, the main applications are recognition of deformable objects (for example, for three-dimensional face recognition) and texture mapping.

The goal of multidimensional scaling is to represent high-dimensional data. High-dimensional data, that is, data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lies on a nonlinear manifold embedded in the high-dimensional space. If the manifold has a sufficiently low dimension, the data can be visualized in a low-dimensional space.

Many nonlinear dimensionality reduction methods are related to linear methods. Nonlinear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to a low-dimensional embedding, or vice versa), and those that simply provide a visualization. In the context of machine learning, mapping methods can be regarded as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Methods that simply provide a visualization are usually based on proximity data, that is, distance measurements. Multidimensional scaling is also widely used in psychology and other humanities.

If the number of attributes is large, the space of unique possible rows is exponentially large. Thus, the higher the dimension, the harder it becomes to sample the space. This causes many problems. Algorithms that work with high-dimensional data tend to have very high time complexity. Reducing the data to fewer dimensions often makes analysis algorithms more efficient and can help machine learning algorithms make more accurate predictions. This is why multidimensional scaling is so popular.
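
The exponential growth is easy to check with a small Python sketch. It assumes that every attribute takes the same number of distinct values, which is a simplification, and the function names are illustrative.

```python
from itertools import product

def count_possible_rows(values_per_attribute, n_attributes):
    """Number of distinct rows when each attribute has the same number of values."""
    return values_per_attribute ** n_attributes

def enumerate_rows(values_per_attribute, n_attributes):
    """Explicitly list every possible row (only feasible for small inputs)."""
    return list(product(range(values_per_attribute), repeat=n_attributes))

# Growth of the row space for binary attributes.
growth = [count_possible_rows(2, d) for d in (1, 2, 10, 20)]
```

With binary attributes, 10 attributes already allow 1,024 distinct rows and 20 attributes allow 1,048,576, so a dataset of fixed size covers a rapidly shrinking fraction of the space.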

Source: https://habr.com/ru/post/F22165/