Sphinx Search Engine: Key Features, Applications

Nowadays, hardly anyone wants to shop in an online store where you have to dig through categories or scroll through long product lists.

Many tools are available that can make internal site search fast, intuitive, and tailored to customers' needs.

The Sphinx search engine promises exactly that: a full-text search engine that is both flexible and fast.

Sphinx runs as a standalone server and does not store the original text itself. It builds an index from an SQL query that retrieves documents from the database, saves that index, and later returns the identifiers of the rows that match a search query.

What is Sphinx

Sphinx is a free, fast, and scalable full-text search engine designed for performance and relevance. Its capabilities go well beyond the full-text search built into traditional databases.

Many well-known, high-traffic sites rely on it for advanced search and scalability.

Key features of Sphinx

The following features have made Sphinx popular with thousands of e-commerce developers and sellers:

High search speed (up to 150-250 queries per second per core against 1,000,000 documents).
Support for distributed search and real-time indexes.
High indexing speed (up to 10-15 MB/s per core).
High scalability (the largest known cluster indexes more than 3,000,000,000 documents, and the busiest one handles more than 50 million queries per day).
Support for multiple full-text fields per document (up to 32 by default).
Support for additional attributes for each document (for example, groups and timestamps).
Stop word support.
APIs for many programming languages (for example, PHP, Python, Java, Perl, Ruby, .NET, and C++).
Support for both single-byte encodings and UTF-8.
Morphological search (stemming).
Integration with popular database management systems (for example, MySQL and PostgreSQL).

In total, Sphinx has more than 50 different features, and the number keeps growing.

How Sphinx Works

The whole operation of the search engine comes down to two key points:

Using the source table, Sphinx builds its own index;
then, when the user submits a query through the API, Sphinx returns an array of identifiers that correspond to rows in the source table.

Why use Sphinx

The main reason to use Sphinx is search speed. Ordinary text searches in MySQL take significantly longer than searches in Sphinx, and the difference becomes noticeable once the database holds millions of records. If the database is small (for example, a forum with 100 users), Sphinx is probably more than you need, although you can still try it. There are also useful extras such as word morphology: a search for “cats” also matches “cat”, and a search for “running” also matches “run” and “runs”.
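To make the morphology idea concrete, here is a deliberately naive sketch in Python. It is not Sphinx's stemmer: the function name naive_stem and its two suffix rules are invented for illustration, and a real stemmer handles far more cases. It only shows why “cats” and “cat”, or “running” and “runs”, can end up under the same index entry.

```python
def naive_stem(word):
    """Reduce a word to a crude stem (toy rules, not Sphinx's algorithm)."""
    word = word.lower()
    # Strip "-ing" from longer words: "running" -> "runn"
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Collapse a doubled final letter: "runn" -> "run"
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        return stem
    # Strip a plural "s" but leave words like "class" alone
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


# A query and a document word match when their stems are equal.
for query, doc_word in [("cats", "cat"), ("running", "runs")]:
    print(query, doc_word, naive_stem(query) == naive_stem(doc_word))
```

Both pairs compare equal after stemming, which is the behaviour the paragraph above describes.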

Another reason is the control it gives you over full-text search. Have you ever wanted a two-word search on Google to match only when both words appear in the same sentence or the same paragraph, rather than anywhere on the page? Sphinx lets you do that kind of thing.

Then there is scalability. If you have large databases spread across many servers, Sphinx can search across all of them while the application behaves as if it were talking to a single server. Sphinx can also take much of the processing and retrieval load off your PHP servers.

Sphinx works a little differently from the MySQL queries you are used to, so do not expect to master everything instantly.
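The difference between matching "anywhere on the page" and matching "within one sentence" can be sketched in a few lines of Python. This is a toy illustration, not Sphinx query syntax: the helper names and the simple punctuation-based sentence splitting are assumptions made for the example.

```python
import re


def split_sentences(text):
    """Split on ., ! and ? (a rough approximation of sentence boundaries)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def words(text):
    """Return the set of lowercase words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def same_page(text, first, second):
    """True if both words occur anywhere in the text."""
    return {first.lower(), second.lower()} <= words(text)


def same_sentence(text, first, second):
    """True only if both words occur inside one sentence."""
    wanted = {first.lower(), second.lower()}
    return any(wanted <= words(s) for s in split_sentences(text))


page = "The cat sat on the mat. The dog barked."
print(same_page(page, "cat", "dog"))      # True: both words are on the page
print(same_sentence(page, "cat", "dog"))  # False: they are in different sentences
print(same_sentence(page, "cat", "mat"))  # True: same sentence
```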

What is indexing?

Sphinx retrieves data from a table in a MySQL database and runs a process on it called indexing. Indexing produces a file that Sphinx can search efficiently.

Compare this with finding a word in a Microsoft Word document: Word scans the text word by word, which can be very slow in very large documents. Sphinx instead does the indexing before any search takes place, so a query consults the index rather than scanning every document word by word.

A good analogy is the index of an encyclopedia. To find information about cats, you could do what Microsoft Word does and read every page looking for the word “cat”. Or you could turn to the alphabetical index at the back of the book, which tells you that cats are covered on pages 104, 195 and 653. That is much easier.
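The encyclopedia analogy maps onto a structure usually called an inverted index. The Python sketch below is a minimal illustration under stated assumptions: the sample documents and function names are made up, tokenizing is a plain whitespace split, and Sphinx's real index format is far more elaborate.

```python
from collections import defaultdict

# Hypothetical sample documents keyed by id (stand-ins for database rows).
documents = {
    104: "Cats are small domesticated animals",
    195: "Many cats sleep most of the day",
    300: "Dogs enjoy long walks",
    653: "A cat can see well in low light",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def linear_scan(docs, word):
    """Read every document word by word, like paging through the whole book."""
    return sorted(doc_id for doc_id, text in docs.items() if word in tokenize(text))


def build_index(docs):
    """Build the 'index at the back of the book': word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, word):
    """Look the word up directly instead of rereading every document."""
    return sorted(index.get(word.lower(), set()))


index = build_index(documents)  # done once, before any search
print(search(index, "cats"))    # [104, 195]
print(search(index, "cat"))     # [653]
```

Both approaches return the same identifiers; the index simply pays the reading cost once, up front, instead of on every query.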

You can only search for what is indexed.

Remember that Sphinx can only search what is in the index. To see the latest data in search results, the index must be updated first.
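A small Python sketch shows the consequence. The rows and the build_index helper are hypothetical; the point is only that a row added after indexing stays invisible until the index is rebuilt.

```python
def build_index(rows):
    """Map each lowercase word to the set of row ids that contain it."""
    index = {}
    for row_id, text in rows.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(row_id)
    return index


rows = {1: "red bicycle", 2: "blue bicycle"}
index = build_index(rows)

# A new row arrives in the source table after the index was built.
rows[3] = "green bicycle"

stale_hits = sorted(index.get("bicycle", set()))
print(stale_hits)  # [1, 2]: row 3 is not in the index yet

# Rebuilding (reindexing) makes the new row searchable.
index = build_index(rows)
fresh_hits = sorted(index.get("bicycle", set()))
print(fresh_hits)  # [1, 2, 3]
```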

Data access

If you have already worked with PHP and MySQL, this part will be much easier. If not, it is worth learning the basics of both first.

The Sphinx search engine usually returns MySQL identifiers, not data.

The main thing to remember is that Sphinx does not retrieve the data itself; it returns document identifiers. Sphinx does the heavy part, finding the matching entries, and you then do the easy part in MySQL, fetching those documents. For example, if Sphinx returns document identifiers 1, 5 and 7 from the index, you run a MySQL query to fetch the records with identifiers 1, 5 and 7. This may look primitive, but looking up a row by identifier costs MySQL very little compared with searching for a word.

Example. Say Sphinx returns the documents with identifiers 1, 5 and 7. The follow-up query is: SELECT * FROM documents WHERE id IN (1,5,7).

This tells MySQL to select all columns from the documents table (or whichever table the results came from) where the identifier column (whatever it is called) is 1, 5 or 7. In PHP you can then loop over the rows with mysqli_fetch_array (the older mysql_fetch_array was removed in PHP 7) and do whatever you like with the data.
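The same two-step pattern can be sketched in Python. To keep the example runnable without a server, the standard-library sqlite3 module stands in for MySQL, and the list matched_ids stands in for whatever Sphinx returned; the table contents and the fetch_documents helper are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL source table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents (id, title) VALUES (?, ?)",
    [(1, "Cat care"), (2, "Dog training"), (5, "Cat toys"), (7, "Cat food")],
)


def fetch_documents(conn, ids):
    """Fetch the rows whose id is in ids, using bound parameters."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    sql = f"SELECT id, title FROM documents WHERE id IN ({placeholders})"
    return conn.execute(sql, list(ids)).fetchall()


matched_ids = [1, 5, 7]  # pretend these identifiers came back from the search engine
rows = fetch_documents(conn, matched_ids)
for row_id, title in rows:
    print(row_id, title)
```

Binding the identifiers as parameters rather than pasting them into the SQL string is a design choice of this sketch; it keeps the query safe even if the list ever comes from a less trusted source.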

Once Sphinx has sorted the results the way you want, you can preserve that order as follows:

Store the result identifiers in an array, in the order returned (just save the id of each match).
Join the array with $result = implode(",", $array), where $array is the array of result identifiers; $result then holds the identifiers as a comma-separated string. Then run: SELECT * FROM documents WHERE id IN ($result) ORDER BY FIELD(id, $result).

This tells MySQL to order the results by the id field in the sequence given in $result.

It may seem complicated, but you get used to it quickly, and soon you will be writing functions that handle all of this for you.
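Here is a hedged Python sketch of the same idea. The build_ordered_query helper is hypothetical; it assumes a MySQL driver that uses %s placeholders (as the common Python MySQL drivers do) and only builds the statement without executing it. The second helper, reorder_rows, shows an alternative: restoring the order in application code, which also works on databases that have no FIELD function.

```python
def build_ordered_query(ids):
    """Return (sql, params) that fetch rows in exactly the order given by ids."""
    if not ids:
        raise ValueError("ids must not be empty")
    placeholders = ", ".join(["%s"] * len(ids))
    sql = (
        f"SELECT * FROM documents WHERE id IN ({placeholders}) "
        f"ORDER BY FIELD(id, {placeholders})"
    )
    # The ids are bound twice: once for IN (...), once for FIELD(...).
    return sql, list(ids) * 2


def reorder_rows(rows, ids):
    """Reorder already fetched rows (dicts with an 'id' key) to match ids."""
    by_id = {row["id"]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]


ranked_ids = [7, 1, 5]  # order as returned by the search engine
sql, params = build_ordered_query(ranked_ids)
print(sql)
print(params)  # [7, 1, 5, 7, 1, 5]

fetched = [{"id": 1, "title": "a"}, {"id": 5, "title": "b"}, {"id": 7, "title": "c"}]
print([row["id"] for row in reorder_rows(fetched, ranked_ids)])  # [7, 1, 5]
```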

Final conclusions

Using Sphinx instead of MySQL for search can bring significant speed benefits. Sphinx is ideal for searching static tables. Plain index files, however, are not well suited to frequently updated rows; for those you must either use delta indexes or switch to real-time indexes. Either choice carries some extra performance cost. In short, working efficiently with Sphinx takes planning, because all the necessary sources and indexes have to be set up in advance.

Replacing MySQL search with Sphinx is not trivial, but it is not so hard that the idea should be abandoned. If you need high search speed, moving from MySQL to Sphinx is worth considering, even when you do not need full-text search.
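The delta approach mentioned above can be pictured with a toy Python model. This is a sketch of the general idea only, not Sphinx configuration: the names main_index, delta_index and max_indexed_id are invented, and the "indexes" are plain dictionaries. A large main index is rebuilt rarely, a small delta index covers only rows added since then, and a search consults both.

```python
def build_index(rows):
    """Map each lowercase word to the set of row ids that contain it."""
    index = {}
    for row_id, text in rows.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(row_id)
    return index


def search(word, *indexes):
    """Merge the hits for one word across any number of indexes."""
    hits = set()
    for index in indexes:
        hits |= index.get(word.lower(), set())
    return sorted(hits)


# Main index: built over the whole table, rebuilt only occasionally.
table = {1: "static page about cats", 2: "static page about dogs"}
main_index = build_index(table)
max_indexed_id = max(table)  # remember how far the main index reaches

# New rows arrive later.
table[3] = "new page about cats"

# Delta index: cheap to rebuild because it covers only the new rows.
delta_rows = {i: t for i, t in table.items() if i > max_indexed_id}
delta_index = build_index(delta_rows)

print(search("cats", main_index))               # [1]
print(search("cats", main_index, delta_index))  # [1, 3]
```

The extra cost the text mentions shows up here as the second lookup and the merge on every search.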
