
Skyeng is sharing with Habr readers a link to an internal tool that our methodologists use.
We at Skyeng are convinced that the sooner a student sees a tangible result from a lesson or a practice session, the higher their motivation and the more effective the learning itself. The traditional approach to learning languages promises a concrete result only after a long time, a year or two; that is, it requires investing considerable effort, time, and money with no immediate payoff. We believe a quick "return on investment" is realistic if we set ourselves small, specific goals and achieve them. Today we will talk about one of our internal tools, designed specifically for this purpose, and give readers the chance to try it out and compile their own word lists. The most interesting of these lists will be offered to all Aword users!
Suppose you need to cook Irish stew from an original English recipe. A traditional school will have you learn 200 names of kitchen utensils and 300 names of various foods. We suggest learning right away the words that relate directly to the task, that is, the words found in Irish stew recipes. Similarly, a design engineer who wants to read professional literature does not need to work through lessons on "London is the capital of Great Britain" and ecology. Basic vocabulary plus highly specialized vocabulary is enough for them.
To solve such specific problems, we are preparing thematic sets of words that users of our mobile application Aword can learn. And to prepare these sets, we use the Wordset Generator tool, which creates an ordered list of words to memorize from a text or a set of texts that a student wants to read.
The result of processing Douglas Adams's book "The Hitchhiker's Guide to the Galaxy"
Words found in five seasons of Game of Thrones, superimposed on a model curve of a student's knowledge. The coordinates of each point (word) are the word's rank number and its utility. On the right are the 25 words from the series that are most useful to such a student.

The Wordset Generator was made possible by our tools for ranking words and for determining a particular student's vocabulary. In one of our previous articles we explained why we built these tools instead of using ready-made corpora. For each word, its effective utility can be calculated: how much learning this word will increase the student's coefficient of understanding of the text. With the Wordset Generator, we can recommend that the student first study either the most common unknown words or the words most important for their profession.
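The following minimal Python sketch makes the idea of effective utility concrete. It rests on a simplifying assumption: the coefficient of understanding is taken to be the share of running words in the text that the student knows. Under that assumption, the utility of an unknown word is simply its number of occurrences divided by the text length. The real metric may weight words differently, and the function names are hypothetical.

```python
from collections import Counter

def understanding_coefficient(tokens, known):
    """Share of running words the student knows (simplified, assumed metric)."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)

def effective_utility(tokens, known):
    """For each unknown word: how much learning it would raise the coefficient.

    Under the assumed metric, learning a word adds all of its occurrences
    to the known share, so its utility is count / text length.
    """
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items() if w not in known}
```

In this toy example, learning any one of the unknown words raises the coefficient by exactly that word's utility. That property is what makes a utility-ordered list useful to the student.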
Algorithm
- A list of all words used in the text is compiled, with the number of occurrences of each.
- All words missing from our dictionary are cut off and moved to a separate list. As a rule, these are words invented by the author, personal names, and place names.
- The "thematicity" of each word in the list is determined by comparing the word's frequency in the analyzed text with its frequency in a corpus of the English language (its prevalence). The resulting number shows how many times more often the word occurs in the analyzed text.
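The three automatic steps above can be sketched in Python as follows. The dictionary and the corpus frequency table are hypothetical stand-ins for our internal data. The regex tokenizer is a deliberate simplification; for example, it does not lemmatize.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; inner apostrophes are kept ("don't").
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def analyze(text, dictionary, corpus_freq):
    """Return (thematicity, counts, missing) for a text.

    dictionary:  set of headwords (hypothetical stand-in for the real dictionary)
    corpus_freq: word -> relative frequency in a general English corpus (hypothetical)
    """
    counts = Counter(tokenize(text))                             # step 1: counts
    missing = sorted(w for w in counts if w not in dictionary)  # step 2: cut off
    total = sum(counts.values())
    thematicity = {}
    for w, c in counts.items():
        if w in dictionary and corpus_freq.get(w, 0) > 0:
            # step 3: how many times more frequent than in the corpus
            thematicity[w] = (c / total) / corpus_freq[w]
    return thematicity, counts, missing
```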
Next comes semi-automatic tuning of the list for specific needs, either by entering parameters or by moving sliders.
- The student's knowledge level ("complexity") is set. This cuts off words the student most likely already knows.
- Weights for thematicity and local frequency are chosen. Thematicity matters when we prepare a list of professional terms for use at work; when analyzing fiction, frequency matters more.
- Finally, the algorithm can estimate the probability that a particular word in this text is a proper name. In the web version, such words are highlighted in red of varying intensity. The "Proper Names" slider removes such words according to a chosen probability threshold. In most cases manual intervention is still required, especially with fiction.
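One way to picture the slider-driven tuning is a scoring function like the one below. This is an illustrative sketch, not the tool's actual formula. The following are all assumptions made for illustration:

- the log-weighted combination of thematicity and local frequency,
- the rank-based complexity cutoff,
- every parameter name.

```python
import math

def rank_words(counts, thematicity, word_rank, proper_prob,
               complexity, w_theme, w_freq, max_proper_prob):
    """Order candidate words by a weighted score (illustrative formula).

    counts:          word -> occurrences in the analyzed text
    thematicity:     word -> text frequency / corpus frequency (must be > 0)
    word_rank:       word -> position in a general frequency ranking (1 = most common)
    proper_prob:     word -> estimated probability that the word is a proper name
    complexity:      words with rank <= complexity are assumed already known
    w_theme, w_freq: slider weights for thematicity and local frequency
    max_proper_prob: words above this proper-name probability are dropped
    """
    scored = []
    for word, count in counts.items():
        if word not in thematicity or word not in word_rank:
            continue  # already cut off (e.g. not in the dictionary)
        if word_rank[word] <= complexity:
            continue  # the student most likely knows it already
        if proper_prob.get(word, 0.0) > max_proper_prob:
            continue  # probably a name; a human should still double-check
        # Logs put the two very different scales on comparable footing (assumption).
        score = w_theme * math.log(thematicity[word]) + w_freq * math.log(count)
        scored.append((score, word))
    # Highest score first; ties broken alphabetically.
    return [word for score, word in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

Raising `w_theme` relative to `w_freq` pushes specialized terms up the list, which is what a professional-vocabulary set needs. For fiction, the opposite balance favors words the reader will actually meet often.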
Not just the machine
The Wordset Generator greatly eased the work of our content department but, of course, did not take it over entirely. Methodologists still play an important role in compiling thematic word sets.
First, they need to prepare a corpus of texts from which the words will be extracted. For a particular book or film this is fairly simple. For thematic sets such as "At the airport", however, you have to dig through a large amount of material to get a good representative sample: classic textbook texts, guidebook articles, airline rules, blog reviews (usually complaints), and so on.
It is important that these texts be modern and lively, because we want to teach students the language that Americans and Britons speak and write today.

Second, the complexity, thematicity, and other parameters must be set correctly. This is done only by manually dragging the sliders, since the right values depend heavily on the purpose of the set, the student's level, the specifics of the topic, and so on.
Third, the resulting word set requires serious work. The exact meaning of each word in this context has to be established. In addition, the needed term often consists of several words rather than one; these too must be found and the list put in order. For example, in the airport vocabulary the word metal turned up among the most frequent ones, but in fact it was part of metal detector. Such phrases often consist of simple words that the tool discards, so they have to be found and put back.
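Finding candidate phrases such as metal detector can be partly automated by counting the word pairs (bigrams) that contain a frequent word. The sketch below is a hypothetical helper, not a feature of the Wordset Generator. It only proposes candidates; a methodologist still has to decide which ones are real terms.

```python
from collections import Counter

def phrase_candidates(tokens, head, min_count=2):
    """Bigrams containing `head` that occur at least `min_count` times."""
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    return [(" ".join(pair), c) for pair, c in bigrams.most_common()
            if head in pair and c >= min_count]
```

On airport-style text, the helper returns both "metal detector" and noise such as "the metal". This is why the final selection stays manual.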
Finally, a picture must be chosen for every word so that it matches the intended meaning. A dedicated person handles this too.
Application
For our students, the most obvious use of the Wordset Generator is creating word lists for specific books or films. Analyze a book's text, make a list of several hundred words, and learn them in the mobile app. Reading will then be much easier, and you won't have to reach for the dictionary every five minutes.
Thanks to the tool, we can quickly prepare word sets for a specific event: the launch of the next iPhone, a football championship, a high-profile premiere, or some media scandal. Our students can send us such requests. We also track potentially relevant "perishable" topics ourselves, so that we can promptly offer a word set to users of the mobile app.

Analyzing fiction helps methodologists prepare recommended reading lists for each student level. The fewer "difficult" words the program returns, the more accessible the text is for students in the middle of their language-learning path. For advanced students, such texts pose no difficulty and offer no learning benefit; they need lexically richer works. For example, a randomly chosen Agatha Christie detective novel (After the Funeral) has fewer than 300 "difficult" words, while the list for James Joyce's Ulysses runs to over 2,000.
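The Christie versus Joyce comparison comes down to counting distinct words that are rarer than a given level. Below is a minimal sketch that makes two assumptions:

- `word_rank` (hypothetical) is a general frequency ranking, where a larger number means a rarer word.
- The student's level is expressed as a rank threshold.

```python
def difficult_word_count(tokens, word_rank, level_rank):
    """Number of distinct words rarer than the student's level.

    word_rank: word -> position in a general frequency ranking (1 = most common).
    Words absent from the ranking are ignored, mirroring the dictionary cutoff.
    """
    return len({t for t in tokens if word_rank.get(t, 0) > level_rank})


def rank_texts(texts, word_rank, level_rank):
    """Sort (name, difficult-word count) pairs from most to least accessible."""
    counts = {name: difficult_word_count(toks, word_rank, level_rank)
              for name, toks in texts.items()}
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```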
The Wordset Generator is also very useful in our work with corporate clients, who often need to learn specialized professional vocabulary. For example, for one corporate client in the aerospace industry we prepared word lists based on an analysis of dozens of articles in professional journals. Vocabulary in high-tech fields is constantly being updated. Combining our tool with a selection of the freshest materials lets us build lists of the most current terms.
To the point!
We decided to give Habr readers the opportunity to play around with the Wordset Generator - here it is:
http://tools.skyeng.ru/sandbox/wordset-generator/
It is more or less intuitive. Keep in mind, though, that this is an internal tool not intended for the general public, so its interface is very spartan and unpolished.
The open version limits the text to 80,000 characters, including spaces and line breaks. In practice, this is the optimal size for using the tool "in everyday life". Take whatever you plan to read in the near future: a couple of chapters, ten pages, or several articles. You will get a compact set that you can practice in the mobile app during the day. In the evening, you can reinforce what you learned in context while enjoying the book. For example:
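A book longer than the 80,000-character limit can be fed to the sandbox in parts. Below is a simple Python sketch that splits a text at paragraph breaks into chunks that fit the limit. The sketch assumes paragraphs are separated by blank lines, and the function name is hypothetical. Python's `len()` counts spaces and line breaks, which matches the limit as stated.

```python
def split_for_limit(text, limit=80_000):
    """Split text at blank-line paragraph breaks into chunks of <= limit chars."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= limit:
            current = candidate  # paragraph still fits in the current chunk
            continue
        if current:
            chunks.append(current)  # close the full chunk
        # A single paragraph longer than the limit is cut into hard pieces.
        while len(para) > limit:
            chunks.append(para[:limit])
            para = para[limit:]
        current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting at paragraph boundaries keeps sentences intact, so each chunk can be analyzed on its own, for example chapter by chapter.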

Here is the result of parsing the first chapter of Douglas Adams's "The Hitchhiker's Guide to the Galaxy". Compare it with the screenshot at the beginning of the article, which shows the analysis of the entire book with the same parameters. These words appear there too, but somewhere in the third or fourth hundred; here they are served up on a silver platter.
You can add the resulting words to the app manually using the built-in dictionary. Habr readers can also create their own word list, export it to CSV, and share a link to the file in the comments to this post. In a week we will pick the most interesting sets submitted by Habr readers and add them to the app in a special category, "Sets from Habr users".
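If you prepare or post-process a list outside the sandbox, Python's standard `csv` module can write it to CSV in a few lines. The column layout here (word, count, score) is a hypothetical choice for illustration. It is not necessarily the format the Wordset Generator exports.

```python
import csv

def write_wordset(path, rows):
    """Write (word, count, score) rows to a UTF-8 CSV file with a header.

    The column names are hypothetical, chosen for illustration only.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count", "score"])
        writer.writerows(rows)
```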
The Aword app itself is available on the App Store. It will soon be on Google Play, and in November it will also be available as a web version!
Happy word learning!
And, as is our tradition, we remind you that we are always happy to welcome valuable specialists to our team!