
Analysis of large semantic cores, or "Robot Recognizer"

They say meta tags are dead for SEO and there is no longer any reason to fill in the cherished meta keywords string. Indeed, in modern multi-factor ranking algorithms this tag has lost its weight. But that does not mean you should give up working with the site's semantic core: it is still invaluable for structuring the site, shaping topics (which search engines do care about) and even for contextual advertising. Assembling a core is not an easy task; assembling it sensibly, without reducing it to "throw me some synonyms", is even harder. This article deals with macros and MS Excel formulas that simplify the processing of large semantic cores. We present a small Excel robot from Dmitry Tumaykin, our resourceful and generous specialist from the RealWeb contextual advertising department. Over to him.

“Hi, Habr! Let me warn readers right away: in terms of logic and principle of operation this tool is nothing new on the market, but it does have several advantages over other well-known tools, described below.


The file is one of a series of files for contextual advertising specialists (more interesting and very useful “robots” are planned). The tool was built from scratch; only after the fact did I learn that similar macros for spreadsheet editors already exist, for example the Devaka publication (a macro for OpenOffice) and the MFC-team development, which is an adaptation of that macro for MS Excel. However, neither file could satisfy my needs as a contextual advertising specialist. Our development has its own advantages, described below.

First, though: why do you need it at all?


Suppose we have a large semantic core. How it was obtained does not matter here. In essence, it is an unstructured set of queries that we would like to structure. How would we do that manually?

  1. Skimming through the queries, we try to grasp the overall semantic focus and identify which categories of queries are present in the core.

  2. We define marker words that let us assign a query to a particular category. The most primitive split is “target / non-target”, but categories can also reflect the intent of the words. For companies working in e-commerce, for example, typical categories are wholesale (wholesale, bulk, warehouse, base ...), informational (how, where, difference, comparison, reviews ...), purchase (store, buy, price, cost ...), rental (rent, hire, lease ...), geo-markers, and so on.

  3. Using filters, we find the queries in the source list that contain each marker word and mark them. How exactly is a matter of imagination: with only two categories it is enough to highlight the queries in different colors; with many categories, to avoid confusion, it is better to write into the adjacent column the name of the category whose marker word occurs in the query. To make things even clearer, you can put the marker word itself into one more column. This is the scheme I eventually settled on.

  4. Once everything is sorted “onto its shelves” this way, we use the resulting lists for our own purposes. Contextual advertising specialists set bid adjustments depending on the category of the query (for example, the “warmer” the query, the higher the bid) or add completely untargeted queries as negatives. A rough VBA sketch of this manual workflow is shown below.
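To make this manual workflow concrete, here is a minimal VBA sketch of steps 2-4 (the sheet layout, marker lists and procedure name are my own illustrative assumptions, not taken from the author's file). It walks the queries in column A and writes the matched category and marker word into columns B and C, using the same naive substring check a person applies with filters:

    ' Hypothetical illustration of steps 2-4; not the author's macro.
    ' Column A holds the queries; columns B and C receive the category and the marker word.
    Sub CategorizeQueriesNaive()
        Dim markers As Variant, categories As Variant
        Dim lastRow As Long, r As Long, m As Long
        Dim query As String

        ' Marker words and their categories, kept as parallel arrays for simplicity
        markers = Array("wholesale", "warehouse", "buy", "price", "rent", "reviews")
        categories = Array("wholesale", "wholesale", "purchase", "purchase", "rental", "info")

        With ActiveSheet
            lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
            For r = 2 To lastRow                          ' row 1 is a header
                query = LCase(.Cells(r, "A").Value)
                For m = LBound(markers) To UBound(markers)
                    ' Naive mask search: the marker may also match inside a longer word
                    If InStr(query, markers(m)) > 0 Then
                        .Cells(r, "B").Value = categories(m)
                        .Cells(r, "C").Value = markers(m)
                        Exit For                          ' first match wins
                    End If
                Next m
            Next r
        End With
    End Sub

The substring check in this sketch is deliberately the naive one; its weakness is exactly what the rest of the article is about.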

Sincerely believing that laziness is the engine of progress, and having written more than one macro for exactly that reason, I went looking for tools that would automate this whole process as much as possible. Among the options I found, none fully satisfied my requirements. Here is why.

KeyCollector and SlovoEB are multifunctional programs, one of whose tasks is collecting the semantic core. The first is paid, the second is a stripped-down free version. Both have a “stop words” module that marks the phrases in the table which contain the given stop words.
Cons of these programs: errors with word forms when matching stop words, and the inability to analyze several categories at the same time.
http://py7.ru/tools/group/ - this tool appeared in the public domain quite recently but has gained serious popularity. Its mechanics differ slightly from KeyCollector, yet the problems are the same: errors with word forms and the inability to analyze several categories at the same time.

The aforementioned Devaka and MFC macros. The main problem here is that both use a mask (substring) search: if a marker word is part of any word in a query, the whole query falls into that category. As a result, the shorter the marker word, the more mistakes; with one- or two-letter markers it is impossible to work at all, yet even an upper threshold of six characters does not save you from having to recheck everything afterwards. For example, the marker word for “bath” occurs inside hundreds of Russian participles such as “lacquered” or “corrugated”. Clearly, in many such cases there is no relevance at all.
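To see the mask problem in isolation, here is a tiny hypothetical VBA demo (the words are my own example, not from either macro): a substring check fires even when the marker is merely buried inside a longer word, while a whole-word check does not.

    ' Hypothetical demo of why mask (substring) search misfires.
    Sub MaskSearchPitfall()
        Dim query As String, marker As String
        marker = "bath"
        query = "sabbath school schedule"      ' no bathroom intent at all

        ' Mask search: fires because "bath" is hidden inside "sabbath"
        Debug.Print InStr(LCase(query), marker) > 0                            ' True

        ' Whole-word check: fires only when the marker stands as a separate word
        Debug.Print InStr(" " & LCase(query) & " ", " " & marker & " ") > 0    ' False
    End Sub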

And that is why ...


... I created my own file, with blackjack, formulas and macros. Without any tricks, it simply checks whether a marker word occurs in your query as an exact match, so such errors are impossible by design. Some formulas (Substring, Multicat) are not written from scratch but imported from the PLEX add-in (special thanks to its creator).
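The core idea, exact whole-word matching instead of a mask, can be expressed as a small user-defined function. The sketch below is only an illustration of the principle under my own naming assumptions; it is not the author's formula, which relies on PLEX helpers such as Substring and Multicat.

    ' Hypothetical UDF illustrating exact-word matching; not the author's PLEX-based formula.
    ' Returns True only if the marker equals one of the whole words of the query.
    Public Function HasExactWord(ByVal query As String, ByVal marker As String) As Boolean
        Dim words() As String
        Dim i As Long

        words = Split(LCase(Trim(query)), " ")
        For i = LBound(words) To UBound(words)
            If words(i) = LCase(marker) Then
                HasExactWord = True
                Exit Function
            End If
        Next i
        HasExactWord = False
    End Function

With such a function on the sheet, a cell formula like =HasExactWord(A2, "bath") returns TRUE only for queries containing the marker as a separate word; the flip side of exact matching is that word forms ("baths", "bathtub") each need their own marker.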

Now for more detail about the development, with comments on the screenshot:



File Link: Robot Recognizer
The file contains two macros, both operating on the same simple principle:


An example formula that does all of this in a single cell is stored in the top row, from where you can simply copy it manually. There is a macro that will do this for you, placing the new data in the range directly opposite your list of keys (the list must not contain empty cells!). However, this option is more resource-intensive and will keep you waiting for quite a while if the number of keywords and the negative-word bank run to, say, more than 5,000 entries. That is why a second macro was created: it does the same thing step by step and discards unnecessary calculations along the way.
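As a rough illustration of that step-by-step approach (the sheet names and layout below are my own assumptions, not the author's file): reading the key list and the marker bank into arrays once, matching in memory with an early exit, and writing the results back in a single operation is what keeps such a macro fast on 5,000+ rows, compared with recalculating a heavy array formula in every cell.

    ' Hypothetical sketch of a step-by-step macro. Assumed layout:
    '   sheet "Keys"    - queries in column A starting at row 2
    '   sheet "Markers" - marker words in column A, category names in column B, starting at row 2
    ' The category and marker are written opposite each query, into columns B and C of "Keys".
    Sub CategorizeStepByStep()
        Dim keys As Variant, bank As Variant, outArr As Variant
        Dim i As Long, j As Long
        Dim q As String

        With Worksheets("Keys")
            keys = .Range("A2:A" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
        End With
        With Worksheets("Markers")
            bank = .Range("A2:B" & .Cells(.Rows.Count, "A").End(xlUp).Row).Value
        End With

        ReDim outArr(1 To UBound(keys, 1), 1 To 2)

        For i = 1 To UBound(keys, 1)
            q = " " & LCase(Trim(CStr(keys(i, 1)))) & " "
            For j = 1 To UBound(bank, 1)
                ' Exact-word test: the marker is padded with spaces so it cannot hide inside a longer word
                If InStr(q, " " & LCase(CStr(bank(j, 1))) & " ") > 0 Then
                    outArr(i, 1) = bank(j, 2)     ' category
                    outArr(i, 2) = bank(j, 1)     ' marker word
                    Exit For                      ' first match wins; the rest of the bank is skipped
                End If
            Next j
        Next i

        ' Single write-back directly opposite the key list
        Worksheets("Keys").Range("B2").Resize(UBound(outArr, 1), 2).Value = outArr
    End Sub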

Recommendations:


That is the whole story. Take the file, build your semantic cores, use them in contextual advertising and SEO, make your site better, and share your experience with others.

PS: Wishes and bug reports are welcome.
So far the file uses a linear lookup. Later I will publish a version with binary search; the speed-up of calculations on large volumes will be enormous. The list of negative words included in the file is not a recommendation and may harm you and/or your client, so I urge you to check any public lists (including this one) against your client's subject matter and to supplement them with new words. The formulas and macros work only in the original file and will not work elsewhere. Using macros in Excel does not harm your health."
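For readers curious about the linear-versus-binary remark in the postscript, a minimal binary search over a sorted word list looks roughly like this in VBA (an illustration of the idea only, not the promised update; the classic worksheet counterpart is VLOOKUP over a sorted range with approximate match):

    ' Illustration only: binary search over a SORTED array of words.
    ' Assumes ascending order under the default (binary, case-sensitive) text comparison.
    Public Function WordInSortedList(ByVal word As String, ByVal sortedWords As Variant) As Boolean
        Dim lo As Long, hi As Long, midIdx As Long

        lo = LBound(sortedWords)
        hi = UBound(sortedWords)
        Do While lo <= hi
            midIdx = (lo + hi) \ 2
            If sortedWords(midIdx) = word Then
                WordInSortedList = True
                Exit Function
            ElseIf sortedWords(midIdx) < word Then
                lo = midIdx + 1
            Else
                hi = midIdx - 1
            End If
        Loop
        WordInSortedList = False
    End Function

Each probe halves the remaining range, so a bank of tens of thousands of negative words is checked in a couple of dozen comparisons instead of a full scan.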

That concludes Dmitry's story about creating a useful macro. We will just add a reminder that even the most difficult tasks sometimes have simple solutions. MS Excel has always been, and remains, the chief assistant of analysts, contextual advertising specialists and SEO optimizers. Its functions, formulas and macros can become interesting tools that make specialists' work easier. And of course it is always nice when such developments are released publicly and bring real practical benefit.

Source: https://habr.com/ru/post/264591/

