In short: I want to put together a presentation and give it at a couple of online shopping conferences. I have written the text of the presentation and need feedback. I hope you can help me make the text competent and accessible.
The current text of the presentation is under the cut.
Full-Text Search Overview
Full-text search is essentially the technology behind Google and Yandex: the user enters a search query and gets an answer in the form of a specific set of pages. On an online store's website, this will most likely be a list of the products that most closely match the query. The key concept here is relevance. Also important are synonyms, typos, morphology, and the everyday logic of an ordinary buyer.
In an online store, for example, the words “buy”, “best”, and “quality” can simply be ignored, since they carry no meaning. By contrast, the words “fashionable”, “expensive”, “Apple iPhone”, or “bosh battery for Kia Sid” uniquely determine what a person needs.
That, in brief, is why you need full-text search.
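The idea of ignoring meaningless words can be sketched in a few lines of Python. This is only an illustration of the principle, not how Sphinx implements it (Sphinx takes a stop-word file in the index configuration); the `STOP_WORDS` set and the `strip_stop_words` helper are hypothetical names made up for this example.

```python
# Hypothetical stop-word list; a real store would tune it from its own query logs.
STOP_WORDS = {"buy", "best", "quality"}


def strip_stop_words(query: str) -> str:
    """Drop words that carry no product meaning, keeping the original word order."""
    kept = [word for word in query.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(strip_stop_words("buy best Apple iPhone"))  # prints: Apple iPhone
```

Only the meaningful part of the query is then passed on to the search engine.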
Why a site needs search: advantages and disadvantages
The main reason and goal is converting visitors into buyers, that is, raising the percentage of visitors to the site who make a purchase. The easier it is for a person to find a product, the more likely they are to buy it. Customers grow lazier every day and get used to the convenient search in other stores, so the absence of such a search on your website can hurt sales badly.
In addition, some product ranges cannot be browsed by simply walking through the catalog, and full-text search is exactly what they need: catalogs with more than 1000 items, or products where it is hard to pick the exact one you need. In these cases the computer simplifies the choice and helps a person make an informed purchase instead of relying on trial and error.
Disadvantages: $100-$200 for the development of a search module for the site.
Sphinx search implementations (SphinxSearch): OS, installation options, programming languages
Sphinx is an open source product. It is a search engine developed from scratch for high-load projects, with customizable relevance (that is, search quality) and easy integration into any project. It is written in C++ and runs on Linux (RedHat, Ubuntu, etc.), Windows, MacOS, Solaris, FreeBSD, and some other less popular systems.
Sphinx lets you store prepared indexes of text data and quickly and easily search an SQL database, a NoSQL store, or just files on the server. It can index data on the fly and add new data to existing indexes, working online without overloading the server.
Its various text processing capabilities let the programmer fine-tune Sphinx to the requirements of the application, and a number of functions let you customize the search quality exactly as you need it. There are two connection options:
• SphinxAPI - a conventional API
• SphinxQL - an analogue of standard SQL
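To give a feel for the SphinxQL option, here is a minimal Python sketch that only builds the text of a query. The index name `products` and the `build_sphinxql` helper are assumptions made for this example; actually sending the query requires a MySQL-protocol client connected to the Sphinx daemon, which is not shown here.

```python
def build_sphinxql(index: str, text: str, limit: int = 10) -> str:
    """Build a SphinxQL full-text query string for a hypothetical index."""
    if not index.isidentifier():
        raise ValueError(f"unexpected index name: {index!r}")
    # Escape backslashes and single quotes so the text stays inside the string literal.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT id FROM {index} WHERE MATCH('{escaped}') LIMIT {int(limit)}"


if __name__ == "__main__":
    print(build_sphinxql("products", "castrol 5w40"))
```

The query reads like ordinary SQL, with `MATCH(...)` carrying the full-text part.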
A Sphinx cluster can scale up to billions of documents and tens of millions of search queries per day. Sites such as Craigslist, DailyMotion, and Netlog sustain this kind of load.
A real example of search queries in an online auto parts store
Our company has been working with auto parts, or more precisely selling online stores for parts, for a long time, fruitfully and quite successfully. But for a number of reasons, relevance-based search became necessary only now.
The main reason is most likely that full-text search is not suitable for most parts. But for 5-10% of the goods it is badly needed. Our standard search, which relies on direct cross-references and an exact car make and model from the parts catalog, does not work for this product group. Examples of such "wrong" products: oils, tires, batteries, car lamps, wipers, and other similar products that sell often.
The average price list of a small, ordinary spare parts company contains 2-10 million items, so roughly 10% of that base is the data we need. Therefore, we decided to integrate the sphinxsearch.com engine into our product. More information about this implementation can be read on Habr:
habrahabr.ru/blogs/sphinx/132118
In-depth study of the search query language
The core search functionality is custom weights for the fields, together with the choice of match mode. For morphology to work, so that the query “oil Castrol 5W40” also returns documents containing “Oil” and “15W40”, you need to use the "*" wildcard and search on the word "oil" at the same time. This requires a query builder that works in “SPH_MATCH_EXTENDED2” mode.
By default, results can be sorted by price, by relevance, or by a combination of such parameters. For each product group you can set your own relevance method and result ordering, which matters most when a query returns more than a hundred documents.
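A query builder of this kind can be sketched as follows. This is a simplified illustration and not our production code: the `build_extended_query` helper is a hypothetical name, and the sketch assumes the index was built with wildcard (infix) matching enabled, so that "*5W40*" can also match "15W40".

```python
import re


def build_extended_query(user_query: str) -> str:
    """Wrap every word in "*" so it can match inside longer tokens."""
    # Keep only word characters; this drops punctuation and any operator
    # characters the user may have typed.
    words = re.findall(r"\w+", user_query)
    return " ".join(f"*{word}*" for word in words)


if __name__ == "__main__":
    print(build_extended_query("oil Castrol 5W40"))  # prints: *oil* *Castrol* *5W40*
```

The resulting string is what gets passed to the engine in the extended match mode.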
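Combined sorting can be illustrated with plain Python on an already fetched result list. The `Hit` type, its fields, and the sample data are made up for this sketch; in Sphinx itself the same ordering would normally be requested from the engine, for example with an extended sort expression along the lines of "@weight DESC, price ASC" (assuming `price` is stored as an attribute).

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    title: str
    weight: int    # relevance score, higher is better
    price: float


def sort_hits(hits: List[Hit]) -> List[Hit]:
    """Order by relevance first (descending), then by price (ascending)."""
    return sorted(hits, key=lambda hit: (-hit.weight, hit.price))


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        Hit("Oil filter", 5, 3.0),
        Hit("Castrol 5W40 4L", 20, 30.0),
        Hit("Castrol 5W40 1L", 20, 9.0),
    ]
    for hit in sort_hits(sample):
        print(hit.title, hit.weight, hit.price)
```

Equally relevant products are shown cheapest first, and less relevant ones go to the end of the list.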
Incremental indices allow you to add new products and documents on the fly without stopping your online store.
And most importantly: synonyms. In our example, the query "Castrol oil 5W40" should find the same results as "Castrol 5W40 oil". In a case like “C #”, you need to register such non-standard word forms as exceptions, so that they bypass the standard indexing scheme and behave exactly as you configure them by hand. Only you know the exact meaning of a phrase in your project: for musicians, for example, “C #” means the note C-sharp.
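The effect of synonyms and exceptions can be sketched in Python as a normalization step applied to a query before searching. The mappings and the `normalize` helper below are hypothetical examples; in Sphinx such rules would normally live in the index configuration (word form and exception files) rather than in application code.

```python
import re

# Hypothetical rules for illustration; every project defines its own.
# Exceptions are special tokens that must survive tokenization as one unit.
EXCEPTIONS = [(re.compile(r"(?<!\w)c\s*#"), "csharp")]
# Synonyms map one spelling of a word to the canonical one.
SYNONYMS = {"bosh": "bosch"}


def normalize(query: str) -> str:
    """Apply exceptions first, then replace each word by its canonical synonym."""
    text = query.lower()
    for pattern, replacement in EXCEPTIONS:
        text = pattern.sub(replacement, text)
    return " ".join(SYNONYMS.get(word, word) for word in text.split())


if __name__ == "__main__":
    print(normalize("C # book"))
    print(normalize("Bosh battery"))
```

The same normalization has to be applied to the indexed text and to the query, otherwise the two sides will not match.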
Who can build a Sphinx-based search like this?
Any junior programmer who knows English and can read the documentation at
sphinxsearch.com/docs
There is also some Russian documentation and a couple of articles in Russian.
There are virtually no restrictions, except the fear of a new approach to search. It seems that all this is difficult and very expensive. In fact, it is simple, fast, and cheap. On Habr there is a dedicated blog where responsive programmers, including the Russian-speaking developer of the engine, answer any questions that arise.
Thank you, I will be glad to answer your questions.
* Sponsored link: tecdoc + sphinxsearch online store development
* Link to the finished presentation for the conference:
http://www.mstarproject.com/temp/4/presentation_sphinx_revision3.ppt