This is a reprint of an article by Ivan Nikitin, which was published on our website Nomagic.ru in September. This article contains only a statement of the problem and a discussion of possible solutions. Links to articles describing the solution of the problem with the help of the LiveSearch API for ASP and PHP can be found at the end of the article.
Any modern site with more than 5 to 10 pages of content should have a search engine. No matter how well we plan the navigation pane or the catalog of products and site sections, any attempt to systematize the content intuitively will ultimately be incomprehensible to the 101st site user.
Want to check? Here are a few simple tasks; try spending a few minutes on them. (All the examples were picked at random from websites I know and have visited. They are in no way meant to disparage the quality of these resources.)
- Find on the site http://www.specialist.ru/ (without using the search!) 2 (two) courses on Microsoft SharePoint 2007. Write down how much time you spent on it.
- Find on the site http://www.sipnet.ru confirmation that the D-Link DVG-2001S VoIP gateway works with the Sipnet service, as well as a brief description of the gateway. Write down how much time you spent on it.
- On the website www.megafon.ru, find the annual report on the results of work for 2006 in the Microsoft Word format (without using a search). Did you succeed?
Shall I continue? I think you already agree with me. Site developers reason the same way when they face the task of building a search engine. Unfortunately, most developers underestimate the complexity of this task and believe that search can be reduced to a SQL query:
SELECT * FROM products WHERE
title LIKE '%-%'
OR description LIKE '%-%'
That may well work, except that the value of such a search will be zero. You can, of course, make it more sophisticated by adding a search by individual words and their combinations (I am always touched by the phrase you sometimes find on websites: "You can use AND, OR, NOT". Sure! Go explain Boolean algebra to the user). But the problem with such a search is that the developer believes the user will enter product names or news headlines exactly as they appear on the site, whereas the user enters just what he needs right now, in a completely arbitrary form, and, as a rule, enters short queries of one or two words. That is, a user looking for courses on SharePoint 2007 will write "SharePoint 2007", not "Windows SharePoint Services v3". As a result we get a completely non-working search engine: such a search will either throw out hundreds of links, among which finding anything is impossible, or return nothing at all. Want to make sure? Take two major resources with large development budgets and try testing their search:
- On the website www.mts.ru, use the search to find out about the credit form of payment for calls, that is, how to sign up for it and how to pay for the calls. What query will you enter? "Credit form of payment". The result will be something like this:
- On the website www.alfabank.ru, find information about mortgage lending. What query will you enter? "Mortgage". Here is the result:
It is easy to see that both times you got a negative result. In the first case you received nothing at all; in the second, completely useless information (how did you like the link to the mortgage banner?). Note that both times the unsuccessful search can make a customer leave forever: I will not switch to MTS, since it has no credit form of payment for calls (in fact, it does!), and I will not turn to Alfa Bank, because I could not find its mortgage terms (once again, these are just examples! Nothing personal!).
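The mismatch between how visitors phrase queries and what a LIKE query matches is easy to reproduce. Below is a minimal sketch in Python using an in-memory SQLite database; the table and column names follow the query shown earlier, while the rows and the `naive_search` helper are made up for illustration.

```python
import sqlite3

# Hypothetical catalog: the rows are invented purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (title TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Windows SharePoint Services v3", "Deploying and administering WSS"),
        ("Microsoft Office SharePoint Server 2007", "Portal and search features"),
        ("Microsoft Exchange Server 2007", "Mail server administration"),
    ],
)


def naive_search(query: str) -> list[str]:
    """Return titles whose title or description contains the query verbatim."""
    pattern = f"%{query}%"
    rows = conn.execute(
        "SELECT title FROM products "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return [title for (title,) in rows]


exact = naive_search("SharePoint Server 2007")  # phrase exactly as stored
typical = naive_search("SharePoint 2007")       # phrase a visitor would type
broad = naive_search("2007")                    # a single short word
```

Here `exact` finds the one matching course, `typical` comes back empty even though two SharePoint courses exist, and `broad` also returns the unrelated Exchange course. Those are the two failure modes described above: either nothing, or too much.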
How do we solve this problem?
Implementing an effective search
First, you need to realize that a good search is far from a trivial task. One can go further: implementing a good search is far more complex than implementing the functionality of the entire rest of the site. So think a hundred times before setting yourself such a task. Are you ready to write a morphological analysis system, a document relevance estimator, and an algorithm for ranking the results? And most importantly, how many man-hours and thousands of lines of code are you ready to spend on it?
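To give a feel for what even the most primitive version of these pieces involves, here is a toy sketch in Python. It is not how any real search engine works: the "morphology" is a crude plural-stripping rule, the relevance score is a plain TF-IDF sum, and the `pages` data and all function names are invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase, split into words, and strip a trailing plural 's'.

    The suffix rule is a crude stand-in for real morphological analysis.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def rank(query: str, documents: dict[str, str]) -> list[tuple[str, float]]:
    """Score documents by the summed TF-IDF of the query terms, best first."""
    term_counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    query_terms = set(tokenize(query))
    total = len(documents)
    scores: dict[str, float] = {}
    for name, counts in term_counts.items():
        score = 0.0
        for term in query_terms:
            containing = sum(1 for c in term_counts.values() if term in c)
            if counts[term] and containing:
                idf = math.log(1 + total / containing)  # rarer terms weigh more
                score += counts[term] / sum(counts.values()) * idf
        if score > 0:
            scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up pages for illustration.
pages = {
    "courses": "SharePoint courses: SharePoint 2007 administration course schedule",
    "news": "Company news and a note about new courses",
    "contacts": "Office address and phone numbers",
}
results = rank("sharepoint course", pages)
```

With this data the "courses" page ranks first and "news" second, while "contacts" is dropped. Everything a real engine needs (proper stemming for Russian, phrase handling, link weighting, typo tolerance) is still missing, which is the point of the questions above.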
Nevertheless, we can solve this problem! There are at least three ways to do it:
- Using Search Engine Forms
- Using Available Web Services
- Using third-party solutions
These methods differ from one another in effort, cost, and the result obtained, but all three give an order-of-magnitude better result than the examples above.
Using Search Engine Forms
This is the cheapest and most easily implemented method. Instead of writing your own raw, low-quality search code, you simply embed a form on the pages of your site that passes the query to a search engine. We will use Google as that search engine here, although you can use any other; Yandex, for example, offers forms at http://company.yandex.ru/forms/. I prefer Google because, in my opinion, its search quality is much higher than that of other search engines.
So, we put together something like this:
<form method="get" action="http://www.google.com/search">
<input type="hidden" name="ie"
value="windows-1251" />
<input type="hidden" name="domains"
value="www.specialist.ru" />
<input type="hidden" name="sitesearch"
value="www.specialist.ru" />
<input id="searchBarInput" type="text" name="q"
value="" />
<input id="searchBarSubmit" type="submit"
value="!" />
<div>
<a id="extendedSearchLink" href="http://google.com/">
<span>Powered by <span style="color:blue">G</span>
<span style="color:red">o</span>
<span style="color:olive">o</span>
<span style="color:blue">g</span>
<span style="color:green">l</span>
<span style="color:red">e</span></span></a>
</div>
</form>
Please note that indicating that the search is provided by Google is mandatory! That's all! Thanks to the hidden fields, we ask Google to search only the specified site. And the quality of the search will clearly be higher than in the examples above. Let's make sure by repeating the two queries.
Example with MTS: the first link leads to the page on signing up for the credit form of payment on the MTS website.
Example with Alfa Bank: the first result is all the information about Alfa Bank mortgages!
Of course, for all the simplicity of this method, its disadvantage immediately catches the eye: the user moves from your website to the search engine. In itself this is not so bad, because all the links in the results lead back to you and only to you, but then there is the contextual advertising. I do not think Alfa-Bank would agree to use such a scheme. :-)
Nevertheless, this method can be strongly recommended to low-budget or non-commercial sites, since the quality of the search far outweighs the negative aspects in the form of contextual advertising.
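Since the form uses `method="get"`, submitting it amounts to requesting a URL with the form fields in the query string. The sketch below, in Python, builds that URL from the same field names and values as the form above; the function name `site_search_url` is made up, and nothing here is an official Google API, just the request the browser would send.

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str, encoding: str = "windows-1251") -> str:
    """Build the URL the search form would request for the given query."""
    params = {
        "ie": encoding,      # encoding of the submitted query text
        "domains": site,     # hidden fields restricting the search to one site
        "sitesearch": site,
        "q": query,          # what the visitor typed
    }
    # Encode the parameters in the same charset the form declares.
    return "http://www.google.com/search?" + urlencode(params, encoding=encoding)


url = site_search_url("SharePoint 2007", "www.specialist.ru")
```

A link built this way can be handy for testing the restriction to your own domain before the form is placed on the site.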
Using Available Web Services
With this method we try to avoid showing other people's ads in the search results. Many search engines provide services for programmatic search: Yandex.XML (http://xml.yandex.ru/), Google's services, and others. The general idea is that we provide our own search form, which sends the user's query to our server, which in turn passes it to the search engine. After receiving the results, our server displays them on our website in any design and any form we like. The user does not even realize that the search was carried out by an external system, since he sees the results in the design of our site. True, Yandex.XML has a rather baffling licensing scheme (it requires displaying Yandex.Direct ads alongside the results), and Google quietly shut down a similar service about a year ago and now provides such a search only together with AdSense ads, so again with contextual advertising.
But there is a way out. Microsoft has an API for working with Live.com search (http://dev.live.com/livesearch/) that allows you to implement such a system. True, this API limits the number of requests to about 1000 to 3000 per day, but for an average site this is enough.
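One way to stay within such a daily limit is to cache results per query, so that repeated queries do not spend the budget. The following Python sketch illustrates the idea only: the class, its names, and the default limit of 1000 are assumptions, and `backend` stands for whatever function actually calls the search service.

```python
from typing import Callable


class QuotaCachedSearch:
    """Wrap a search function with a per-query cache and a daily request budget."""

    def __init__(self, backend: Callable[[str], list], daily_limit: int = 1000):
        self.backend = backend          # hypothetical function calling the service
        self.daily_limit = daily_limit  # assumed quota; check the real terms of use
        self.used_today = 0
        self.cache: dict[str, list] = {}

    def search(self, query: str) -> list:
        key = " ".join(query.lower().split())  # normalise case and whitespace
        if key in self.cache:
            return self.cache[key]             # served without touching the quota
        if self.used_today >= self.daily_limit:
            raise RuntimeError("daily search quota exhausted")
        self.used_today += 1
        self.cache[key] = self.backend(key)
        return self.cache[key]

    def start_new_day(self) -> None:
        """Reset the counter and drop cached results (call once a day)."""
        self.used_today = 0
        self.cache.clear()
```

Because visitors mostly type the same short queries, even a cache this simple can absorb a large share of the traffic; the price is that results may be up to a day old.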
Implementing such a search is quite easy, especially since the Live Search API is exposed as an XML web service called via SOAP, which means that these calls can be made from any platform and any web development tool: PHP, ASP.NET, and so on.
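The server-side half of this scheme, turning the service's XML answer into a fragment in the site's own design, can be sketched as follows in Python. The XML layout in `SAMPLE_RESPONSE` is invented for illustration and is not the Live Search, Yandex.XML, or any other real response format; the URL and function names are likewise placeholders.

```python
import xml.etree.ElementTree as ET
from html import escape

# Made-up response format; a real service defines its own schema.
SAMPLE_RESPONSE = """\
<results>
  <result>
    <title>SharePoint 2007 courses</title>
    <url>http://example.com/courses/1</url>
    <snippet>Schedule &amp; prices</snippet>
  </result>
</results>
"""


def parse_results(xml_text: str) -> list[dict[str, str]]:
    """Extract title, url and snippet from every <result> element."""
    root = ET.fromstring(xml_text)
    return [
        {field: (item.findtext(field) or "") for field in ("title", "url", "snippet")}
        for item in root.iter("result")
    ]


def render_results(results: list[dict[str, str]]) -> str:
    """Render the results as an HTML fragment styled by the site's own CSS."""
    if not results:
        return "<p>Nothing found.</p>"
    items = [
        '<li><a href="{url}">{title}</a><p>{snippet}</p></li>'.format(
            url=escape(r["url"], quote=True),
            title=escape(r["title"]),
            snippet=escape(r["snippet"]),
        )
        for r in results
    ]
    return '<ul class="search-results">' + "".join(items) + "</ul>"


page_fragment = render_results(parse_results(SAMPLE_RESPONSE))
```

Escaping every value before it goes into the page matters here, since titles and snippets come from an external system and cannot be trusted to be valid HTML.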
Some time ago we implemented such a search when we needed to build a search for the Specialist.ru site. You can see it in action at:
http://search.specialist.ru
If this topic seems interesting to you, please leave your feedback and suggestions in the comments to the publication, and in my next article I will give a detailed example of the code for the implementation of a search engine based on the Live Search API. Believe me, everything is much easier there than it seems at first glance. :-)
Using third-party solutions
However, the method that uses available web services, such as the Live Search API, has two noticeable flaws:
- The inability to control how quickly the site is reindexed
- The impossibility of indexing (and therefore searching) the closed sections of the site
The first drawback arises because search engine robots set their own schedule for refreshing your site in the index, and if, for example, your site does not return a correct HTTP Last-Modified response header (a disease that afflicts 90% of sites on the Internet!), the delay can be significant. That is, after new materials appear on your site, days or even weeks may pass before they show up in the search results.
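For reference, returning a correct Last-Modified header and honoring the matching If-Modified-Since request header takes very little code. The Python sketch below shows the decision logic only, independent of any web framework; the function name and its return shape are assumptions made for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_response(last_modified: datetime, if_modified_since: Optional[str]):
    """Return (status, headers) for a page last changed at `last_modified` (UTC)."""
    # HTTP dates have one-second resolution, so drop the microseconds.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # not modified: the robot keeps its copy
    return 200, headers


changed = datetime(2007, 9, 1, 10, 0, tzinfo=timezone.utc)
status, headers = conditional_response(changed, None)
```

The hard part in practice is not this comparison but knowing the true modification time of a dynamically generated page, for example the latest change date among the database records it shows.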
The second drawback is outright fatal. The search engine robot cannot gain access to the closed sections of your site (for example, a closed forum that requires authorization), so information from closed sections will never appear in the search results. You can, of course, work around this by publishing information from closed sections in anonymized form (for example, displaying closed forum messages without information about the users), but this is not always possible. For example, what do you do about searching your corporate email?
Here third-party search engines can help us, for example Yandex.Server (http://company.yandex.ru/technology/products/yandex-server.xml) or the corporate Microsoft Office SharePoint Server (http://office.microsoft.com/ru-ru/sharepointserver/FX100492001049.aspx). I know the second one much better than the server from Yandex, and it has a fairly powerful search engine that can be used, among other things, to search your site.
Perhaps in one of the following articles, we will also look at integrating Microsoft Office SharePoint Server 2007 with your site to build an effective search engine.
Related Links
- Article about the implementation of site search using the LiveSearch API on ASP.NET
- Article about the implementation of site search using the LiveSearch API for PHP5