It is difficult for a modern person to imagine life without the Internet and almost instant access to information sources. Users rarely think about how the content they want is actually found on the network, yet the process is interesting in its own right.
An information retrieval system (IPS) is a hardware and software complex that selects information in response to a user's request. The information is stored on servers in digital form, much as books were once stored on library shelves. The system consists of many subsystems, each of which performs its own task in processing the user's request and delivering information in text or audio form. The architecture of a modern IPS is complex because of the number of tasks it has to solve. To the user it is a kind of "black box": the text of the request goes in, what happens inside is unknown, and the requested information comes out.
Input streams
Requests that a person types on the screen of a device make up only a small part of the requests processed by a search system. The bulk of search queries is generated by robots, which accept human requests, carry out multi-step searches, and return feedback to the user. Well-known information retrieval systems include Google, Yandex, and others, which process millions of queries daily.
Source Search Objects
The source objects of interest for a search are documents, records, videos, images, and more. They are created outside the IPS. A general information storage and retrieval system should have a built-in bibliographic system, a kind of catalog that makes it possible to search for objects of any kind.
Objects or their digital transformations become an “input resource” in the IPS. It is among them that the information necessary for the user is selected.
External sources
External sources of knowledge are used when information is represented and selected. This is information about what the user is looking for: the name of a film, a quote from a book, and so on. For a computer search, this information must be translated into a query in a formal language. In an IPS, this is done by the blocks for representation, indexing, and query development.
Ideally, these three processes — representation, indexing, and query development — should rely on identical sources of knowledge, but in practice this is unattainable.
The sources of knowledge have to be reviewed and updated constantly, and the updates would have to be identical and synchronized across all three processes. Moreover, an external source of knowledge always predates its use in a search, sometimes by several years.
Representation
Representations of the source objects are compiled from the input data, either by combining it in some way or by converting it according to the rules and algorithms of the particular information retrieval system.
Representations are more or less transformed copies of the original search object. In a collection of unedited full texts, each text is its own representation. In a collection of museum exhibits and artifacts, a representation may be a transformed description of an object together with its image. In some cases, a representation is derived partly from the original object and partly from a description of it: in bibliographic search systems, for example, the title and the author's name are combined with an annotation of the work.
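The bibliographic case can be illustrated with a minimal sketch. The class names, field names, and the make_representation helper below are hypothetical and chosen only for illustration; the sketch assumes that the annotation either comes from an external description or, if none is given, is cut from the beginning of the object's own text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceObject:
    """A hypothetical source object created outside the IPS."""
    object_id: str
    title: str
    author: str
    full_text: str


@dataclass(frozen=True)
class Representation:
    """A transformed copy of the source object held inside the IPS."""
    object_id: str
    title: str
    author: str
    annotation: str


def make_representation(obj: SourceObject,
                        annotation: Optional[str] = None,
                        max_chars: int = 80) -> Representation:
    """Combine fields taken from the object with a description of it."""
    if annotation is None:
        # No external description: derive one from the object itself.
        annotation = obj.full_text[:max_chars].strip()
    return Representation(obj.object_id, obj.title.strip(),
                          obj.author.strip(), annotation)
```

Here the title and author are taken from the object, while the annotation may come from a separate description, which matches the "partly from the object, partly from the description" case.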
Searchable Index
Since an information retrieval system stores information in the form of representations, it is logical to assume that the search runs over the whole representation, which is then returned to the user once selected. In practice, this is not so. For example, current online library catalogs typically restrict searching to a few fields (author, title, and subject headings) within a representation that also contains other fields that are not searched. This is reason enough to distinguish between a representation and a searchable index, which is the searchable part of a representation. The index defines everything that can be searched. A searchable index, like a representation or a source object, can be divided into separate subindexes to provide a more accurate, targeted search.
Search engines usually also maintain a synthetic internal structure that is used to match queries against valid search results. This structure is the second component of the searchable index.
Procedurally, indexing can be implemented in different ways. A searchable index can be obtained:
- by literally copying the searchable part of the representation;
- by copying details of the representation. In this case the index may cover part or all of the representations, which exist physically only as fragments, distributed according to the index-building rules and assembled when needed.
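The distinction between a representation and its searchable index can be sketched as follows. This is a minimal illustration, not the structure of any particular system: the SEARCHABLE_FIELDS tuple, the whitespace tokenizer, and the dictionary layout are all assumptions. Each searchable field gets its own subindex, and fields outside the list remain in the representation but cannot be searched.

```python
from collections import defaultdict

# Assumption: only these fields of a representation are searchable.
SEARCHABLE_FIELDS = ("author", "title")


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(representations):
    """Build one subindex per searchable field.

    representations: dict mapping an id to a dict of fields.
    Returns {field: {term: set of ids}}.
    """
    index = {field: defaultdict(set) for field in SEARCHABLE_FIELDS}
    for rep_id, rep in representations.items():
        for field in SEARCHABLE_FIELDS:
            for term in tokenize(rep.get(field, "")):
                index[field][term].add(rep_id)
    return index
```

With this layout, a term that occurs only in an unsearched field such as an annotation never reaches the index, while a targeted search can consult the author or title subindex alone.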
Request Development Rules and Formal Requests
Query development is a function that mediates between a user's request and a formal request. It converts the user's query by matching it against the vocabulary of retrieval commands, the index specification, and the index itself before retrieval. In the early days of IPS, this role was traditionally assigned to qualified IT specialists.
A component that automatically matches the terms of a query against the vocabulary of the system's searchable index is usually called a dictionary input module. Automating this function is promising and opens the way to expert and probabilistic search methods.
A user request becomes a formal request once it has been converted. Examples of such formal transformations include truncation, substitution, normalization, vectorization, and other transformations of the "external" representation into the "internal" representations of a computer IPS.
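The four transformations named above can be chained in a short sketch. Everything here is an assumption made for illustration: the SUBSTITUTIONS table, the truncation length of four characters, and the choice of a term-count vector as the internal form.

```python
import re
from collections import Counter

# Hypothetical substitution table mapping user terms to index terms.
SUBSTITUTIONS = {"movie": "film"}


def normalize(text):
    """Lowercase the request and keep only alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def substitute(terms):
    """Replace terms that have a preferred equivalent."""
    return [SUBSTITUTIONS.get(term, term) for term in terms]


def truncate(terms, length=4):
    """Cut each term to a fixed-length prefix (a crude form of stemming)."""
    return [term[:length] for term in terms]


def vectorize(terms):
    """Turn the list of terms into a term-count vector."""
    return Counter(terms)


def to_formal_query(user_request):
    """External representation of the request -> internal representation."""
    return vectorize(truncate(substitute(normalize(user_request))))
```

For the request "The Movie: FILMS", normalization gives the, movie, films; substitution turns movie into film; truncation reduces films to film; and the resulting vector counts film twice and the once.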
Extracted Document Link Sets
The retrieved set of information sources is logically a subset of the representations, selected by applying the matching rules to the formal query and the searchable index.
Usually, but not necessarily, there is a separate process for sorting the retrieved set. Online library catalogs usually reorder the retrieved sets alphabetically by author before displaying them. In information retrieval systems that perform strict ranking, the rank order is established before any reordering.
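Matching and ordering can be sketched as two separate steps. The matching rule below (Boolean AND over query terms) and the ranking rule (number of matching query terms) are assumptions picked for brevity; real systems use many other rules. The index is assumed to map each term to a set of representation ids.

```python
def match(query_terms, index):
    """Boolean AND: ids of representations that contain every query term."""
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()


def order_by_author(result_ids, representations):
    """Catalog-style reordering: alphabetical by author."""
    return sorted(result_ids,
                  key=lambda rep_id: representations[rep_id]["author"].lower())


def rank_by_overlap(query_terms, index):
    """Ranked retrieval: order ids by how many query terms they match."""
    scores = {}
    for term in query_terms:
        for rep_id in index.get(term, set()):
            scores[rep_id] = scores.get(rep_id, 0) + 1
    # Highest score first; ties are broken by id to keep the order stable.
    return sorted(scores, key=lambda rep_id: (-scores[rep_id], rep_id))
```

In the first style the retrieved set is unordered and sorting is a later, independent process; in the second the order falls out of the matching itself.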
Output streams
The main search cycle ends with the output of the search results. Traditionally they are shown on a display, but often they form a stream of objects that will be used elsewhere or for some other purpose. Such streams can be directed to visualization devices, to storage for subsequent processing, or to other selection services as input streams.
Information retrieval systems allow feedback from the output of any selection process. The output of any process can serve as feedback to other processes, and feedback can provide the basis for expert judgment at any stage.
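One possible form of such feedback is to route part of the output back into query development. The sketch below is only an assumption about how this might look: the expand_query function is hypothetical, and it simply adds the most frequent new terms from results that were judged relevant to the original query.

```python
from collections import Counter


def expand_query(query_terms, relevant_texts, extra_terms=2):
    """Add the most frequent new terms from relevant results to the query."""
    seen = set(query_terms)
    counts = Counter(
        term
        for text in relevant_texts
        for term in text.lower().split()
        if term not in seen
    )
    # Most frequent terms first; ties are broken alphabetically.
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return list(query_terms) + ranked[:extra_terms]
```

The expanded query can then go through the same conversion, matching, and ordering steps as the original one, which closes the loop between output and input.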