
Using search engines to check whether a phrase is correct

At work I often have to correspond in English and, out of misplaced perfectionism, I rely on offline and online translation and explanatory dictionaries for this. On the whole they do their job, until it comes to checking the correctness of word combinations or whole phrases. You want to slip in something that shows advanced command of the language, but you are not sure you remember it correctly (a big hello to prepositions and phrasal verbs).

There are a couple of resources for looking up phrases, but they are mostly geared toward commonly used expressions, proverbs and idioms within a single language. Besides, you cannot tell whether people actually use the phrase you need, or whether using it would only confuse a native speaker.
To solve the problem I got into the habit of using Google. The method is embarrassingly simple: search for the whole phrase as an exact match (for those who do not know, the phrase has to be enclosed in double quotes). As always, you get a set of links, a heap of advertising and, as a big bonus, the number of pages found. That number is what interests us. If the number of "hits" is suspiciously small, rephrase and/or correct the errors and search again. Usually 2-3 iterations are enough to get a normal result.
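
The loop described above can be sketched in a few lines of Python 3. This is only an illustration, not the program discussed later in the post: `exact_phrase_url` and `best_candidate` are hypothetical names, and the hit counts are assumed to be typed in by hand after reading them off the results page, since no search API is assumed here.

```python
from urllib.parse import quote_plus


def exact_phrase_url(phrase, base="https://www.google.com/search?q="):
    """Build a search URL for an exact-phrase query (phrase in double quotes)."""
    # Drop surrounding whitespace and any quotes the user already typed,
    # then add exactly one pair of double quotes.
    quoted = '"{}"'.format(phrase.strip().strip('"'))
    return base + quote_plus(quoted)


def best_candidate(hit_counts):
    """Return the phrase with the highest hit count from a {phrase: hits} dict."""
    return max(hit_counts, key=hit_counts.get)


if __name__ == "__main__":
    candidates = ["depend on the language", "depend from the language"]
    for phrase in candidates:
        print(exact_phrase_url(phrase))

    # Made-up placeholder counts, NOT real search results.
    observed = {"depend on the language": 250000,
                "depend from the language": 900}
    print(best_candidate(observed))
```

Opening each printed URL in a browser and comparing the reported page counts is the manual equivalent of one iteration of the method.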

A couple more advantages of the method:
+ it does not depend on the language. So when I get "stuck" in Russian (the price of Internet freedom: sometimes you simply forget how it is said in the "great and mighty" Russian language), I use it there too;
+ the sample is representative, drawn from a veeeery large number of indexed pages, so the language is "live" and current.
Several factors affect what counts as a normal result:
- the length of the phrase;
- how widespread the phrase is in the language;
- the total number of resources on the Internet in that language (obviously there are an order of magnitude fewer sites in German than in Spanish, etc.).
On the whole it is all fairly obvious; you just need a little practice. For example, for common English phrases the count should be at least in the hundreds of thousands (why not in the millions, I explain a little further down).
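
As a rough illustration of that rule of thumb, the check can be written as a simple threshold. The function name `verdict`, the three labels and the "within one order of magnitude" band are assumptions made for this sketch; only the figure of a hundred thousand for common English phrases comes from the text.

```python
def verdict(hits, min_hits=100_000):
    """Classify a hit count against a minimum threshold for a 'normal' phrase.

    min_hits=100_000 is the rule of thumb for common English phrases;
    lower it for long or rare phrases, or for languages with fewer pages.
    """
    if hits >= min_hits:
        return "looks fine"
    if hits * 10 >= min_hits:
        # Within one order of magnitude of the threshold (assumed band).
        return "borderline"
    return "suspicious"


if __name__ == "__main__":
    # Placeholder numbers, not real search results.
    for count in (250000, 40000, 900):
        print(count, verdict(count))
```

The threshold is deliberately a parameter: all three factors listed above shift what "normal" means, so a single fixed number would not fit every phrase or language.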

Disadvantages of the method (when using a browser):
- search suggestions and browsing history sometimes make you grind your teeth. One moment you accidentally pick a different phrase from the suggestions, the next you are fighting with the quotes. A trifle, but sometimes annoying;
- you need to keep the browser open, which is often bad for "getting things done", that is, it distracts from work. Or else you have to launch the browser (with all the 100+ tabs open since the year before last. A joke, of course, but not far from the truth. Even if Firefox does not try to load them all at launch, it still drags along a lot of plug-ins that were so nice to install and are such a pity to delete).

To deal with these drawbacks, I wrote a console program in Python (2.7) that searches for phrases using the Google and Bing search engines. Usage example:



A couple of comments:
- I got a little carried away and bolted on Bing search as well, although it is redundant. Strip it out yourself if the delay from the extra request bothers you. Also, if you want to use the program's source with Bing, you need to get a subscription to the Bing Search API (5000 queries per month for free) on the Windows Azure Marketplace and then create an Account Key (the name does not matter). The key issued by default does not work (apparently for security reasons; correct me if I am wrong). In the Windows distribution the key is, naturally, already filled in, but if it stops working, the requests for the current month have been used up;
- because of problems with the Google AJAX API that is used to run the query, the approximate number of "hits" is very approximate indeed (this is what I promised to explain above: sometimes it differs by an order of magnitude from the number shown when searching in a browser). This is a known issue: code.google.com/p/google-ajax-apis/issues/detail?id=32 . I suspect the Bing Search API is just as sly;
- also, given my difficulties with encoding conversions in Python (console, system, query), I was unable to add support for Cyrillic. If anyone wants to finish it, you are welcome.
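
For anyone picking up the Cyrillic item, the query-encoding part at least is straightforward on Python 3, where `str` is Unicode and `urllib.parse.quote_plus` encodes as UTF-8 by default. The sketch below assumes Python 3 rather than the 2.7 the program was written in, the helper name `encode_phrase` is hypothetical, and it does not address the console or system encodings also mentioned above.

```python
from urllib.parse import quote_plus, unquote_plus


def encode_phrase(phrase):
    """Wrap the phrase in double quotes and percent-encode it as UTF-8."""
    return quote_plus('"{}"'.format(phrase), encoding="utf-8")


if __name__ == "__main__":
    encoded = encode_phrase("великий и могучий")
    # The encoded form is plain ASCII and safe to put into a query string.
    print(encoded)
    # Decoding restores the original quoted phrase.
    print(unquote_plus(encoded))
```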

The source and an archive with the Windows distribution can be found here .
To use it, just unzip the distribution into a folder and add that folder to the system path.

I would be grateful for links to similar posts / resources / programs.

UPD: Habr user revol0ution pointed out a similar post .
UPD: User coffeecupwinner shared a link to an interesting Google resource for finding words and phrases in books .

Source: https://habr.com/ru/post/185860/

