Since I love and respect Habr, I decided to announce our new niche search engine here and, at the same time, ask the community for help.
So, TechObzor is a new search engine that makes it easy to find tests, professional reviews, and user reviews of modern household and consumer equipment.
Below are a few words about it, followed by a detailed recipe for building your own search engine.
No one doubts that it is easier to read tests and feedback from real users of a piece of hardware before buying it than to kick yourself afterwards over a rash decision. The problem is that the numerous sellers of this very equipment, respectable and otherwise, also know about this habit of ours. As a result, the further it goes, the more they speculate on queries that begin with the words "review ...", "test ...", "reviews ...".
The goal of TechObzor is to give those who seek "pure knowledge" search results that actually contain links to tests, reviews, and user feedback, rather than "super-profitable unique" offers disguised as such.
There is no point in talking about our search engine at length and making a mountain out of a molehill. To get an idea of what TechObzor can and cannot do, just run a couple of queries, for example about the equipment on your desk or something you are planning to buy. I am sure you will learn a lot of new things :)
Now on to the promised recipe for creating your own search engine.
In fact, building a niche search engine these days is quite simple: Google CSE offers fairly ample capabilities even with a minimum of development tools. Add a little programming skill and a bit of design, and you can end up with something quite attractive.
But for a search engine to win its audience, merely existing is not enough. It must significantly surpass its big brothers, such as Yandex and Google, in the cleanliness of its results, and that is quite difficult to achieve. Success here lies in one direction only: a very careful selection of the resources to which the search index is limited.
So, if you are going to build your own niche search engine, you need to go around the Internet and carefully fill two bags of URLs:
Bag number 1: a list of sites that publish high-quality information on the search topic. In the case of TechObzor, these are:
- sites of online and offline media that publish tests and reviews;
- thematic forums where new equipment and technical issues are discussed;
- sections of commercial sites (primarily online stores) that publish good tests and reviews of equipment written by independent journalists.
At the same time, it is important that the selected sites publish original reviews rather than simply copying articles from other sources without any analysis of their own. This can be checked either by the links after the article (most sites do now indicate the source an article is taken from) or with special checking services such as Copyscape.
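For a quick manual check of two suspiciously similar articles, a rough comparison can also be done locally. The sketch below is only an illustration and is not how Copyscape works: it assumes you already have the two article texts as plain strings, and the function names and the shingle size of three words are my own choices.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Split text into lowercase words and return the set of k-word sequences."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(text_a: str, text_b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: 0.0 (nothing shared) to 1.0 (identical)."""
    a, b = shingles(text_a, k), shingles(text_b, k)
    if not a or not b:
        # Texts shorter than k words cannot be compared this way.
        return 0.0
    return len(a & b) / len(a | b)
```

A high score between an article and a page on another site is only a hint that one was copied from the other; which of the two is the original still has to be decided by the editor.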
It is also important, whenever possible, to single out the exact section of the source site where the needed information is published. In our case this was relatively easy: on most sites, articles live at URLs of the form domain.ru/article/stat1.html. The common part of the URL, domain.ru/article/, is then entered in the list, and the Google CSE option that includes all pages whose address contains this URL is selected (it is, by the way, selected by default). Now the search will run only in this section, and the rest of the site will be ignored.
Why is this useful? Because:
- even fairly serious RuNet sites often have a link-farm page containing a ton of links to all sorts of "glorious" resources with accompanying text, which will not adorn the results of your future search engine at all;
- sites often have abandoned guest books or poorly moderated forums with treats like "all prostitutes of Moscow", which will also come as an unexpected delight to your search engine's visitors;
- the same site may contain sections on different topics, including ones that do not fit your purpose, and they will also clutter the results. In our case, we often ran into sections such as "Games", "Internet News", etc., which have nothing to do with equipment reviews.
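When an editor has collected a handful of article URLs from one site, the common section can be worked out mechanically. Here is a minimal Python sketch of that step, under my own assumptions: the function names are hypothetical, the "section" is taken to be the host plus the first directory of the path, and the prefix check is only a local approximation, not Google's actual matching.

```python
from collections import Counter
from urllib.parse import urlsplit


def _split(url: str):
    # Accept URLs with or without a scheme ("domain.ru/..." or "http://domain.ru/...").
    return urlsplit(url if "//" in url else "//" + url)


def section_prefix(url: str) -> str:
    """Return 'host/first-directory/', or 'host/' when the page sits in the site root."""
    parts = _split(url)
    segments = [s for s in parts.path.split("/") if s]
    # Simplification: the last path segment is assumed to be the page itself.
    if len(segments) >= 2:
        return f"{parts.netloc}/{segments[0]}/"
    return f"{parts.netloc}/"


def most_common_prefix(urls: list) -> str:
    """Pick the section that most of the collected article URLs belong to."""
    counts = Counter(section_prefix(u) for u in urls)
    return counts.most_common(1)[0][0]


def in_section(url: str, prefix: str) -> bool:
    """Check whether a URL falls under the chosen section prefix."""
    parts = _split(url)
    return (parts.netloc + parts.path).startswith(prefix)
```

For the example above, `section_prefix("domain.ru/article/stat1.html")` gives `domain.ru/article/`, which is the string to enter in the list of sites.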
Bag number 2: the stop list. It is very important to build and maintain a list of excluded pages. The reasons are the same as described above. It is simply not always possible to isolate the needed section cleanly, and then you have to work from the opposite direction: include the whole site and then add to the stop list the pages and sections that should not appear in the search results. Google provides a small but fairly functional toolkit for specifying patterns of pages that you would like to exclude from your results. It is described quite well on the Google CSE site.
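Before entering the two bags into Google CSE, it can help to check locally how they interact. The sketch below is only an approximation under my own assumptions: it uses Python's `fnmatch` wildcards (`*`) as a stand-in for Google's pattern syntax, which may differ in detail, and the pattern lists and function names are hypothetical examples.

```python
from fnmatch import fnmatchcase

# Hypothetical example lists; the real ones come from the editor's work.
INCLUDE = ["domain.ru/*"]
EXCLUDE = ["domain.ru/links*", "domain.ru/guestbook/*", "domain.ru/games/*"]


def normalize(url: str) -> str:
    """Strip the scheme and a leading 'www.' so URLs compare against bare patterns."""
    url = url.strip()
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url


def allowed(url: str, include: list, exclude: list) -> bool:
    """A URL is searchable if it matches an include pattern and no stop-list pattern."""
    u = normalize(url)
    if any(fnmatchcase(u, pattern) for pattern in exclude):
        return False  # the stop list always wins
    return any(fnmatchcase(u, pattern) for pattern in include)
```

With these lists, `allowed("http://domain.ru/article/stat1.html", INCLUDE, EXCLUDE)` is true, while the guest book and the link page are rejected. Letting the stop list take priority mirrors the approach described above: include the site as a whole, then carve out what should not appear.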
Once you have worked through these two lists (for us it took a week of intensive work by one editor), you can enter them through the Google CSE interface (it really is well thought out in this respect) and start testing the search engine. You need to test long and thoroughly, using not only correct and tidy queries but also ones that provoke garbage pages to surface. All pages and sections identified this way go into the stop list. And so on ad infinitum, because there is no limit to perfection :)
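The "find garbage, add to the stop list" loop can be partly mechanized once the editor has a list of garbage URLs noticed during testing. This is a sketch under my own assumptions, not part of Google CSE: the function name and the threshold of three pages are hypothetical, and a section is again taken to be the host plus the first directory of the path.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def suggest_stop_patterns(garbage_urls: list, threshold: int = 3) -> list:
    """Turn garbage URLs into stop-list candidates.

    If at least `threshold` garbage pages come from the same section,
    suggest excluding the whole section; otherwise list the pages one by one.
    """
    by_section = defaultdict(set)
    for url in garbage_urls:
        parts = urlsplit(url if "//" in url else "//" + url)
        segments = [s for s in parts.path.split("/") if s]
        if len(segments) >= 2:
            section = f"{parts.netloc}/{segments[0]}/"
        else:
            section = f"{parts.netloc}/"
        by_section[section].add(parts.netloc + parts.path)

    patterns = []
    for section, pages in sorted(by_section.items()):
        # Never suggest dropping a whole site automatically (section == 'host/').
        if len(pages) >= threshold and section.count("/") > 1:
            patterns.append(section + "*")
        else:
            patterns.extend(sorted(pages))
    return patterns
```

The suggestions are only candidates: an editor should still look at each proposed section before excluding it, since a forum with three garbage threads may also contain exactly the discussions you want.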
Following this algorithm, we have already done quite a lot of work on the set of sites in which TechObzor searches for tests and reviews. But in the course of this work our eyes glazed over, our hands grew tired, and our brains started running in circles. So I want to ask you for help in completing it.
The help is, in principle, simple: replenish TechObzor's two bags with links that are in your bookmarks or that you can find. I understand that most Habr users are busy people, so for my part I promise that your efforts will not go unrewarded:
- for each link to a site that publishes tests and reviews, sent to me by Habr mail or posted in the comments, I guarantee a plus to the sender's karma;
- for each garbage page or section found in TechObzor's results, a plus for the comment.
Links to sites with reviews must not already be present in TechObzor and must follow the simple rules described on the site (click the link "For those who want to find"). If different people post the same link, the credit goes to whoever posted it first.
I understand that these pluses will not make much difference to anyone, but for my part this is the little I can do to express my gratitude for helping us in our work.
Thanks to everyone who responds!