
metabus - a platform for building thematic search engines

It all started when I realized at some point that it is quite difficult to find goods, services and real-world places on the Internet. Yes, almost everything can be found through the popular search engines, but once you start looking for something tied to a physical address, it gets harder. And when you also need to refine the query by a set of characteristics, by time, by location or by price, it gets harder still. In the end the search boils down to clicking through many pages by hand and losing a lot of time. Each topic has its own resources. Looking for a film showing? You go to Afisha. Electronics? You choose on Yandex.Market. After that, some people order online and others go to the nearest store to buy what they picked. Looking for an ATM? That is often an app on your phone. The result is a multitude of services that all come down to one thing: searching for goods, services and places in real life. At that moment I wanted a single, convenient service that would do all of this.

After almost two years, I have finally finished developing the project. As planned, the result is a platform that lets you search for any goods, services and places. All data is structured, which makes complex queries across many characteristics possible. All data is also geolocated, which makes location-based queries possible. A query can be entered as a single phrase, for example "find a hotel with a swimming pool near Kievskaya", or built with the help of special filters.
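To illustrate the idea of a single-phrase query, here is a deliberately naive Java sketch that splits a phrase into "what" and "where" on the word "near". It is only an assumption about how such a split might look: the PhraseQuerySketch class and its fields are invented for this example, and the real platform has to handle free-form addresses and far more than one keyword.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseQuerySketch {

    /** Hypothetical result of the split: what to look for and, optionally, where. */
    static final class ParsedQuery {
        final String subject;
        final String place; // null when the phrase names no place

        ParsedQuery(String subject, String place) {
            this.subject = subject;
            this.place = place;
        }
    }

    // Greedy first group, so the split happens at the LAST occurrence of "near".
    private static final Pattern NEAR =
            Pattern.compile("(.+)\\s+near\\s+(.+)", Pattern.CASE_INSENSITIVE);

    /** Splits "X near Y" into subject X and place Y; otherwise the whole phrase is the subject. */
    static ParsedQuery parse(String phrase) {
        String text = phrase.trim();
        Matcher m = NEAR.matcher(text);
        if (m.matches()) {
            return new ParsedQuery(m.group(1).trim(), m.group(2).trim());
        }
        return new ParsedQuery(text, null);
    }

    public static void main(String[] args) {
        ParsedQuery q = parse("hotel with a swimming pool near Kievskaya");
        System.out.println("subject: " + q.subject);
        System.out.println("place:   " + q.place);
    }
}
```

When no place is found, a caller would fall back to the user's detected location, and the subject part would then be matched against the structured characteristics.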

metafind - an example of such a search


A search starts from a single search box. You can specify an address in the query in free form. If no address is given, the service tries to determine your location automatically. The results show products grouped by place of sale, together with prices, phone numbers, addresses, opening hours and characteristics specific to the subject you are looking for. The results also include a map and a filter for refining the query. For example, here is what a search for medicines looks like:


When you search for food or other topics, you automatically get different filters and additional features specific to that subject. A mobile version is also available:



Examples


At the moment the coverage area includes only Moscow, for the following topics: medicines and pharmacies, gas stations with gasoline prices, car washes and their services, ATMs, payment terminals, hotels, cafes, restaurants, food (from menus), swimming pools, ticket offices and tickets.

As I said, you can make complex queries across many characteristics: search by price range or opening hours, set a search radius, and set characteristics specific to a particular subject. Here are some examples:
  1. black caviar in a restaurant with karaoke near the Kremlin
  2. hotel with a swimming pool near Kievskaya
  3. complex washing in the exhaust
  4. Preobrazhenskaya Ploshchad pool 50 meters
  5. ATM with dollars near Pushkin
  6. sleeping beauty at kursk station
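
The characteristics mentioned above (price range, opening hours, search radius) can be pictured as plain filters over geotagged records. The following is a minimal Java sketch, not the platform's actual code: the Offer class, its fields and the coordinates are invented for illustration, and distance is computed with the standard haversine formula.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GeoFilterSketch {

    /** Hypothetical record: one item offered at one place (not the platform's schema). */
    static final class Offer {
        final String name;
        final double price;
        final double lat;
        final double lon;
        final int opensAt;  // opening hour, 0-23
        final int closesAt; // closing hour, 1-24, assumed later than opensAt

        Offer(String name, double price, double lat, double lon, int opensAt, int closesAt) {
            this.name = name;
            this.price = price;
            this.lat = lat;
            this.lon = lon;
            this.opensAt = opensAt;
            this.closesAt = closesAt;
        }
    }

    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance between two points in kilometres (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** Keeps offers within radiusKm of the point, inside the price range, and open at the given hour. */
    static List<Offer> filter(List<Offer> offers, double lat, double lon, double radiusKm,
                              double minPrice, double maxPrice, int hour) {
        List<Offer> result = new ArrayList<>();
        for (Offer o : offers) {
            boolean near = haversineKm(lat, lon, o.lat, o.lon) <= radiusKm;
            boolean priced = o.price >= minPrice && o.price <= maxPrice;
            boolean open = hour >= o.opensAt && hour < o.closesAt;
            if (near && priced && open) {
                result.add(o);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Offer> offers = Arrays.asList(
                new Offer("cafe A", 1500, 55.7558, 37.6176, 10, 23),
                new Offer("cafe B", 1500, 55.8000, 37.7500, 10, 23));
        // Made-up search point in central Moscow, 2 km radius, price 1000-2000, 8 pm.
        for (Offer o : filter(offers, 55.7520, 37.6175, 2.0, 1000, 2000, 20)) {
            System.out.println(o.name);
        }
    }
}
```

A linear scan like this is only meant to show the semantics of the filters; at any real data volume the same conditions would be pushed down into a search index.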

Data coverage is not yet complete. If you are interested, you can submit a request to have data added.

Why platform?


Because it is now a whole set of sites and services.

The data is indexed from the sites I need (my task is not to index the entire Internet, only sites on the given topics). These are either primary sources (for example, the official sites of banks or cafes) or aggregator sites (for example, sites listing swimming pools or car washes). In every case I display a link to the source of the information and do not claim authorship of the data. For most sources, data is updated once a day. Over the next several years, I expect interested organizations to start uploading their own data through an API (or as price lists), which will be added to the search index in real time. This feature is already implemented but remains closed to the public for lack of demand.

The API can already be used to build various search services. You can embed search into existing sites, or build thematic mobile apps or social network apps. For example, to test the search for medicines, I made a form that searches only for medicines and pharmacies:



Development


The project has been in development for almost two years. There is no team; I work alone. I originally used .NET and MS SQL Server. After a month and a half I realized these technologies were not well suited to my problem and switched to Java. Then I mistakenly got carried away with technology: the first version was built on Hadoop, HBase and Lucene, with Thrift as the protocol for exchanging data between modules. The technological basis is now a combination of MongoDB and Apache Solr. All the sites run on Google App Engine. The platform itself runs on one dedicated server (EQ4 from Hetzner) and the spider on another (also an EQ4). Since I work alone, I decided to automate as much as possible: there is a separate CI server (also at Hetzner) and a large number of unit, functional and integration tests. Deployment to the development and production environments is fully automatic. Node.js is also used heavily for internal purposes. The servers run Ubuntu Server 10.10.
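
With Apache Solr in the stack, a geo-oriented query such as "within 2 km, priced between 1000 and 2000" could be expressed with Solr's standard filter queries. The Java sketch below only builds the request parameters. The field names location and price, the example query text, and the very idea that the platform uses geofilt are assumptions, since this article does not describe the index schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class SolrQuerySketch {

    /** Solr spatial filter: documents whose field lies within km kilometres of the point. */
    static String geoFilter(String field, double lat, double lon, double km) {
        return "{!geofilt sfield=" + field + " pt=" + lat + "," + lon + " d=" + km + "}";
    }

    /** Solr inclusive range filter on a numeric field. */
    static String rangeFilter(String field, int min, int max) {
        return field + ":[" + min + " TO " + max + "]";
    }

    /** Joins the main query and the filter queries into a URL-encoded query string. */
    static String toQueryString(String q, List<String> filters) {
        try {
            StringBuilder sb = new StringBuilder("q=").append(URLEncoder.encode(q, "UTF-8"));
            for (String fq : filters) {
                sb.append("&fq=").append(URLEncoder.encode(fq, "UTF-8"));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical field names and a made-up search point in central Moscow.
        List<String> filters = Arrays.asList(
                geoFilter("location", 55.752, 37.6175, 2.0),
                rangeFilter("price", 1000, 2000));
        System.out.println(toQueryString("hotel swimming pool", filters));
    }
}
```

Keeping the geographic and price constraints in fq parameters rather than in q lets Solr cache each filter independently of the text query.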

Performance


I am almost sure the server will go down under the Habr effect. I tried to load-test it: all the sites are hosted on Google App Engine, so there is no point in testing those. That leaves load-testing the API (used from JavaScript), which is harder, because you need to send a large number of unique but realistic requests. In theory the architecture has no performance bottlenecks, and every node can be scaled horizontally. Right now, however, everything is hosted on one server: the database, the search indexes and the application's web server. Ideally all of this should be spread across several servers, but doing that just for the duration of the Habr effect makes no sense.

Harsh reality


In reality, many users do not even know that the browser has an address bar. For them there is only Yandex / Google, and the Internet starts there. Even if searching for goods and services through metabus is far more convenient, it is extremely difficult to get users into the habit of doing so. People are used to looking for everything through Yandex / Google, and nothing can be done about that, short of spending millions on promotion or finding other approaches.

As I have already said, it is very difficult to promote the project as a "single search for any goods and services". I plan to move forward by building thematic search engines, thematic mobile apps and social network apps. I also plan to develop an affiliate network. Meanwhile the single search engine (currently metafind) will keep running and gradually accumulate more and more data.

My goal is to build a convenient, unified, geo-oriented search engine for goods, services and the places where they can be bought. It may sound absurd, but the database should contain absolutely all the data, from the items at a building-materials market in the Vladimir region to the menu of a Moscow restaurant. And so it will be. The catch is that it will not happen soon, and most likely, sad as that is for me, it will be done by the search giants. In the future you will not need to make phone calls, hunt around obscure corners of the Internet, let alone visit physical locations, in order to find the product or service you want. There will be some kind of single data bus to which organizations upload their data, and many applications, services and sites will use that data for searching, analysis and processing. And it will be everywhere.

Proposal


If you liked the project, I am open to all kinds of cooperation:

I would be glad to cooperate, on a commercial or non-commercial basis, with people who can promote such services and with Java or JavaScript programmers, designers and testers. Any other help is welcome too.

To repeat, because I can see from the queries that people are searching for everything: for now you can only search for medicines and pharmacies, gas stations with gasoline prices, car washes and their services, ATMs, payment terminals, hotels, cafes, restaurants, food (from menus), swimming pools, ticket offices and tickets. And only in Moscow.

Source: https://habr.com/ru/post/123448/

