I am sometimes asked this question (see the title), since I work on an alternative local search service. Google describes only vaguely where its data comes from. The main sources for this article are our own observations and a Google patent application (see Sources).
The main misconception is that "Google Maps finds information about companies on the Internet." This is not quite true. Information about your company may appear on hundreds of indexed web pages and still never show up in Google Maps results.
Unlike web search, which queries an index of cached web pages, Google Maps contains a structured directory of businesses. Each company record contains key-value fields with machine-readable data. In principle this should let you find "a restaurant with a vegetarian menu and pre-ordering within 10 km of Kiev station", but in practice the directory usually holds exact values only for the address and telephone number.
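To make the idea of a key-value directory concrete, here is a minimal sketch. The field names (`vegetarian_menu`, `pre_order`), the sample records, and the station coordinates are all hypothetical; nothing here reflects Google's actual schema. It only illustrates why a query over structured fields fails when most records carry nothing beyond the basics.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Hypothetical directory: each record is a flat set of key-value fields.
DIRECTORY = [
    {"name": "Green Table", "category": "restaurant", "lat": 55.750, "lon": 37.570,
     "vegetarian_menu": True, "pre_order": True},
    # A typical sparse record: only name and location are known.
    {"name": "Steak Corner", "category": "restaurant", "lat": 55.740, "lon": 37.560},
    {"name": "Far Veggie", "category": "restaurant", "lat": 56.500, "lon": 38.500,
     "vegetarian_menu": True, "pre_order": True},
]

def find(directory, center, radius_km, **required):
    """Return records within radius_km of center whose fields equal the required values."""
    lat0, lon0 = center
    return [
        rec for rec in directory
        if all(rec.get(key) == value for key, value in required.items())
        and haversine_km(lat0, lon0, rec["lat"], rec["lon"]) <= radius_km
    ]

STATION = (55.743, 37.566)  # assumed coordinates, for illustration only
results = find(DIRECTORY, STATION, 10,
               category="restaurant", vegetarian_menu=True, pre_order=True)
print([rec["name"] for rec in results])
```

The sparse record is skipped not because the restaurant lacks a vegetarian menu, but because the directory simply has no value for that field.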
Therefore, what matters is not how Google searches its own directory, but where the information in it comes from.
Location data in the Google Maps directory
According to Google, the directory "combines information from different sources to give the best result." Sources are divided into two groups:
Structured and semi-structured sources are those whose data can easily be converted into a key-value form that a program can understand. Usually these are:
- commercial business databases that are purchased
- websites containing large company directories; data from these sites is collected by a dedicated crawler, which uses regular expressions to extract information from the directory pages
- Google Local Business Center, where business owners fill in the information themselves
- KML (and similar) files that are used to display points using the Google Maps API
- custom (user-created) maps
Unstructured sources are indexed websites that may contain information about the company, but whose data cannot be structured.
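As an illustration of how a crawler might turn a semi-structured directory page into key-value records, here is a small sketch using regular expressions. The HTML layout, class names, and sample listings are invented for the example; a real crawler would need a separate pattern for every directory site it visits.

```python
import re

# Hypothetical fragment of a directory page; the markup is an assumption.
PAGE = """
<div class="listing"><h3>Pharmacy No. 1</h3><span class="addr">12 Main St</span><span class="tel">+380 44 123-45-67</span></div>
<div class="listing"><h3>Cafe Aroma</h3><span class="addr">5 Park Ave</span><span class="tel">+380 44 765-43-21</span></div>
"""

# One pattern per site layout: named groups become the keys of the record.
LISTING_RE = re.compile(
    r'<div class="listing"><h3>(?P<name>.*?)</h3>'
    r'<span class="addr">(?P<address>.*?)</span>'
    r'<span class="tel">(?P<phone>.*?)</span></div>'
)

def extract_records(html):
    """Return one dict of name/address/phone per listing found in the page."""
    return [match.groupdict() for match in LISTING_RE.finditer(html)]

records = extract_records(PAGE)
for record in records:
    print(record)
```

This works only because every listing on the page follows the same template, which is exactly what separates a semi-structured source from an arbitrary web page.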
How information is structured
This process can be described in three basic steps:
- Key-value data comes from several structured sources.
- The business data is clustered: values from different sources are compared, and an accuracy and a weight are determined for each.
- Structured data is complemented by unstructured data.
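The first two steps can be sketched roughly as follows. This is not the method from the patent application: the source names, the weights, the choice of the phone number as the clustering key, and the "highest total weight wins" rule are all assumptions made to keep the example short.

```python
import re
from collections import defaultdict

# Hypothetical trust weights per source type.
SOURCE_WEIGHT = {"purchased_db": 0.6, "web_directory": 0.5, "owner_submitted": 0.9}

def normalize_phone(phone):
    """Keep digits only, so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", phone)

def cluster(records):
    """Group records that appear to describe the same business (same phone number)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[normalize_phone(rec["phone"])].append(rec)
    return clusters

def merge(cluster_records):
    """For each field, pick the value backed by the highest total source weight."""
    votes = defaultdict(lambda: defaultdict(float))
    for rec in cluster_records:
        weight = SOURCE_WEIGHT[rec["source"]]
        for field, value in rec.items():
            if field == "source":
                continue
            if field == "phone":
                value = normalize_phone(value)
            votes[field][value] += weight
    return {field: max(candidates, key=candidates.get)
            for field, candidates in votes.items()}

RECORDS = [
    {"source": "purchased_db", "name": "Apteka 1", "address": "12 Main St",
     "phone": "+380 (44) 123-45-67"},
    {"source": "web_directory", "name": "Apteka 1", "address": "12 Main Street",
     "phone": "380441234567"},
    {"source": "owner_submitted", "name": "Pharmacy No. 1", "address": "12 Main St",
     "phone": "+380 44 123 45 67"},
]

clusters = cluster(RECORDS)
merged = [merge(group) for group in clusters.values()]
print(merged)
```

Note that under these weights two weaker sources that agree outvote the owner's own entry for the name field, which hints at why the final listing can differ from what the owner submitted.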
Structured data is usually accurate but sparse. This makes two things difficult:
- search: how do you find a "private kindergarten" if the business directory has no field for the form of ownership?
- ranking: how do you decide which "pharmacy" to show first if all the data comes from a single directory?
Therefore, once the main fields for the company (name, address, telephone number) are determined, a web search is performed with the query:
_+_
and the pages found (and most importantly the keywords from the pages found) are associated with the company data.
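The association step can be sketched like this. The web search itself is left out: the function takes the pages already found as input. The keyword extraction (simple word counts with a tiny stopword list), the record fields, and the sample page are all assumptions for illustration.

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "in", "a", "for", "to"}

def keywords(text, top_n=5):
    """Return the most frequent non-trivial words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]

def associate(record, pages):
    """Attach the found pages and their keywords to a copy of the structured record."""
    enriched = dict(record)
    enriched["urls"] = [page["url"] for page in pages]
    enriched["keywords"] = keywords(" ".join(page["text"] for page in pages))
    return enriched

# Hypothetical structured record and one page returned by the web search.
record = {"name": "Sunny Kids", "address": "3 Garden St", "phone": "380445550101"}
pages = [{"url": "http://example.com/sunny-kids",
          "text": "Private kindergarten Sunny Kids. Private kindergarten for children."}]

enriched = associate(record, pages)
print(enriched["keywords"])
```

After this step a search for "private kindergarten" can match the record even though the structured data has no ownership field. It also shows the weak point: the keywords come from whatever pages the search returned, whether or not they belong to the business.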
How it doesn't work
There are a number of examples where the algorithm leads to erroneous results.
The reason: hostel association sites routinely publish lists of embassies and consulates. The consular department entered the directory from one of the structured sources but was associated with the site hihostels.com.ua.
The reason: real estate rental sites publish lists of utility offices. The ZHEK (housing maintenance office) entered the Google directory from one of the business databases but was associated with the site toprealty.org.ua.
How to get a company into Google Maps results
Obviously, no matter how much information about the company there is on the web, the most important thing is for that information to reach one (or better, several) structured sources. The problem is that Google does not publish a list of the databases and directories it draws from. The only known entry point is Google LBC.
Summary
Google Maps is not as transparent as Google Web Search:
- Most users are not aware of how search on Google Maps works.
- Often it is impossible to determine the source of information
- Sometimes the result is not consistent with the principle of "least surprise"
I think Google could do better.
I would be grateful for corrections, additions and comments.
Sources
Generating structured information (patent application US 2006/0200478 A1)
Google's Local Search Patent Application (at SEO by the Sea)
Local listings: Where do they come from?