A Forvo.com client built from readily available tools

It is no secret that foreign words are easier to remember when you know how they are pronounced. Fortunately, there is an excellent online service for exactly this: Forvo, a database of word pronunciations. It offers a web interface for searching the database and listening to words, as well as an API with some limitations (more on those later). Opening a browser every time you want to hear a word is not very convenient, though, so I started looking for a simple Forvo client. My requirements were: ease of use, no GUI, easy portability, and no need to store any settings. To my surprise, every attempt to find such a client for Linux failed, even though implementing one is not particularly difficult. So I realized I would have to write the utility myself.

Problem statement

Make the simplest possible Forvo client that meets the requirements listed above;
It must have a simple command-line interface:
$ say hello world # pronounces "hello world"
$ say -lng=ru # switches the pronunciation language (en, ru, tt, etc.)
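
The interface above could be parsed along these lines. This is only a sketch: parse_args, LANG_CODE, and PHRASE are names made up for illustration, and mapping a language code to Forvo's numeric id_lang is left out because the basic client in this article does not implement it.

```shell
#!/bin/bash
# Hypothetical helper (not part of the article's client): separates an
# optional -lng=<code> switch from the words to pronounce.
LANG_CODE=en   # assumed default language code
PHRASE=

parse_args() {
    local arg
    PHRASE=
    for arg in "$@"; do
        case $arg in
            -lng=*) LANG_CODE=${arg#-lng=} ;;            # strip the "-lng=" prefix
            *)      PHRASE=${PHRASE:+$PHRASE }$arg ;;    # append the word, space-separated
        esac
    done
}

parse_args -lng=ru hello world
echo "language: $LANG_CODE, phrase: $PHRASE"
```

Keeping the switch and the phrase in separate variables means the rest of the script never has to look at the raw arguments again.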

Tool selection

Of course, there are many ways to solve this problem. I decided that the best choice would be the combination bash + awk + curl + mpg123 (or some other player). So first we install the required packages, for example on Debian-based systems:
$ sudo apt-get install gawk curl mpg123
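
On other systems it may be handy to check that the tools are present before running the client. A minimal sketch, assuming the function name missing_tools (invented here) and adding base64 to the list because the parser shown later calls it:

```shell
#!/bin/bash
# Prints the name of every listed command that cannot be found in PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Report anything the client needs that is not installed.
missing=$(missing_tools gawk curl mpg123 base64)
if [ -n "$missing" ]; then
    echo "missing tools:" $missing >&2
fi
```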

Solution

A spoiler: I did not use the Forvo API; I explain why at the end of the article.

If you examine the Forvo search page, you will see that the form is submitted as the following POST request:

     params:
         id_lang = $LANGUAGE_ID, where LANGUAGE_ID is the identifier of the pronunciation language
         word_search = $WORD, where WORD is the word to look up
     post-url:
         http://www.forvo.com/search/ - the address of the request

Thus, we can implement this request by calling:

    #!/bin/bash
    LANGUAGE_ID=39  # id of English (the id_lang parameter)
    WORD="hello world"
    curl -d "id_lang=$LANGUAGE_ID&word_search=$WORD" -L 'http://www.forvo.com/search/'
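
One caveat with this call: -d sends the string as is, so a space or a non-ASCII word goes out unencoded. A hedged variant lets curl do the encoding with --data-urlencode. The function name forvo_search and the CURL override (useful for a dry run) are assumptions of this sketch; the URL and parameter names are the ones from the request described above.

```shell
#!/bin/bash
# Sketch: send the search request and let curl URL-encode both values.
# CURL is a hypothetical override so the command can be inspected offline.
forvo_search() {
    local lang_id=$1 word=$2
    "${CURL:-curl}" -s -L \
        --data-urlencode "id_lang=$lang_id" \
        --data-urlencode "word_search=$word" \
        'http://www.forvo.com/search/'
}

# Dry run: print the arguments instead of contacting the site.
CURL=echo forvo_search 39 "hello world"
```

Without the CURL override the function performs the real request and writes the HTML response to standard output.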

The response to this request is HTML whose body contains links (we need the very first one) that hold the URL of the audio stream with the pronunciation of the requested word. So we need a parser that extracts that URL. An awk implementation:

    #
    # parser.awk
    #
    /var (_SERVER_HOST|_AUDIO_HTTP_HOST)/{
        if(match($0, /var[ \t]+(_SERVER_HOST|_AUDIO_HTTP_HOST)[ \t]*=[ \t]*'?([^']+)'?/, arr)){
            if(arr[1] == "_SERVER_HOST"){
                srv_host = arr[2];
            } else if(arr[1] == "_AUDIO_HTTP_HOST") {
                audio_http_host = arr[2];
            }
        }
    }

    /<a href.+onclick="Play\(/{
        if(match($0, /onclick="Play\([^,]+,'([^,]+)'.+\)/, arr)){
            mp3Path = arr[1];
            if (srv_host == audio_http_host){
                mp3Path = ("http://" srv_host "/player-mp3Handler.php?path=" mp3Path);
            } else {
                mp3Path = ("http://" audio_http_host "/mp3/" base64_decode(mp3Path));
            }
        }
        exit;
    }

    function base64_decode(val){
        command = ("echo '" val "' | base64 -d");
        command | getline ret;
        close(command);
        return ret;
    }

    END{
        if(mp3Path)
            print mp3Path;
    }
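
To make the URL-assembly rule in the parser easier to follow, here is the same branch logic restated in plain shell. The function name build_mp3_url and the host names in the usage lines are invented for illustration; the two URL shapes are copied from parser.awk.

```shell
#!/bin/bash
# Restates the branch in parser.awk: how the final mp3 URL is assembled from
# the two host variables and the path taken from the Play(...) call.
build_mp3_url() {
    local srv_host=$1 audio_host=$2 path=$3
    if [ "$srv_host" = "$audio_host" ]; then
        # Same host: the path is handed unchanged to the handler script.
        printf 'http://%s/player-mp3Handler.php?path=%s\n' "$srv_host" "$path"
    else
        # Separate audio host: the path is base64-encoded and is decoded first.
        printf 'http://%s/mp3/%s\n' "$audio_host" "$(printf '%s' "$path" | base64 -d)"
    fi
}

# Hypothetical values, only to exercise both branches.
encoded=$(printf '%s' 'some/file.mp3' | base64)
build_mp3_url www.example.com www.example.com "$encoded"
build_mp3_url www.example.com audio.example.com "$encoded"
```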

Having obtained the URL of the audio stream, we play it with mpg123. A reasonable question is: why mpg123 rather than some other player? When choosing, I was looking for the most minimalist player that can play streaming audio.
The main script then looks like this:

    #
    # say
    #
    LANGUAGE_ID=39
    WORD="$*"
    if [[ -n $WORD ]]; then
        URL=$(curl -d "id_lang=$LANGUAGE_ID&word_search=$WORD" -L 'http://www.forvo.com/search/' 2> /dev/null | awk -f "${0%/*}/parser.awk")
        if [[ -n $URL ]]; then
            mpg123 -q "$URL"
        else
            echo not found
        fi
    fi

But here comes the first problem: we ended up with two files (say and parser.awk), which is not great for such a small utility. I would like it to live in a single file. Hence the question: how do we combine two separate programs, one written in shell (bash) and one in awk?

Option 1

Use the standard awk feature: enclose the program in quotes and pass it as a command-line argument:

    #
    # example1.sh
    #
    echo "from shell script"
    AWK_PRG="BEGIN{ print \"from awk program\" }"
    awk "$AWK_PRG"

This approach works well for one-line awk programs. Once the program grows beyond a one-liner, escaping quotes (both single and double) becomes a problem. The escaping, in turn, clutters the program and makes it harder to maintain and extend. So this approach does not fit our case.
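
To see what the quoting looks like in practice, consider an inline awk program that has to print both a single and a double quote. The function names are invented for this illustration.

```shell
#!/bin/bash
# Double-quoted program: every double quote and backslash meant for awk
# must first be escaped for the shell.
inline_double() {
    awk "BEGIN { print \"it's \\\"quoted\\\"\" }"
}

# Single-quoted program: double quotes pass through, but each single quote
# has to be written as '\'' (close quote, escaped quote, reopen quote).
inline_single() {
    awk 'BEGIN { print "it'\''s \"quoted\"" }'
}

inline_double   # prints: it's "quoted"
inline_single   # prints: it's "quoted"
```

Both functions print the same line, yet neither program text can be read or edited as ordinary awk, which is the clutter this option suffers from.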

Option 2

A trick. First, recall that shell scripts are interpreted: the script is executed command by command (or line by line). This suggests an idea: put the awk program at the very end of the shell script and place an exit command before it, so that bash, having executed the whole shell part, never starts reading the awk program. That gets the awk program into the same file as the shell script. But how do we now read and execute the awk program sitting in the tail of the file? The answer suggests itself: use awk. We just mark the end of the shell script and the beginning of the awk program with some marker (for example, a comment) and hand the file to another awk program that reads everything from the marker onward:

    #
    # example2.sh
    #
    echo "this is shell script"
    AWK_PRG=$(awk '(/^### AWK PROGRAMM MARKER ###$/ || body){body=1; print $0}' "$0")
    awk "$AWK_PRG"
    exit

    ### AWK PROGRAMM MARKER ###
    BEGIN{
        print "from awk program"
    }

This approach lets you embed awk code (and not only awk) in shell scripts without changing it in any way. In the repository you can find the implementation of the getAwkProgram function, which lets you name awk programs embedded in a shell script and load them by name. I decided not to include that function here, since it would distract from the main topic.
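
The repository's getAwkProgram is not reproduced in this article, so the following is not that function but a hypothetical sketch of the same idea: several named awk programs in the tail of one file, each introduced by its own marker line. The marker format "### AWK <name> ###" and the function name get_awk_program are assumptions.

```shell
#!/bin/bash
# Hypothetical loader: prints the awk program stored under the marker line
# "### AWK <name> ###" in the given file, up to the next marker or end of file.
get_awk_program() {
    local file=$1 name=$2
    awk -v marker="### AWK $name ###" '
        /^### AWK .* ###$/ { inside = ($0 == marker); next }   # any marker switches sections
        inside             { print }                           # copy lines of the wanted section
    ' "$file"
}
```

Inside a script it would be used as awk "$(get_awk_program "$0" parser)", with exit still placed before the first marker so that bash never reaches the awk code.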

Option 3

Thanks to xaizek for reminding me of yet another method for integrating awk programs into shell scripts:

    #
    # example3.sh
    #
    echo "from shell script"
    AWK_PRG=$(cat << 'EOL'
    BEGIN{
        print "from awk program"
    }
    EOL
    )
    awk "$AWK_PRG"

This method is based on heredoc syntax. Although it is more natural from the bash point of view and is undoubtedly better than option 1 (inline programs), I still find it less readable than option 2.
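
A detail worth spelling out is why the heredoc delimiter is quoted: with an unquoted delimiter the shell expands $... inside the heredoc before awk ever sees the program. A small sketch (the shell variable NF is set here only to make the effect visible):

```shell
#!/bin/bash
NF="oops"   # a shell variable that happens to share its name with awk's NF

# Quoted delimiter: the body is taken literally.
quoted=$(cat << 'EOL'
{ print $NF }
EOL
)

# Unquoted delimiter: the shell substitutes its own $NF into the program text.
unquoted=$(cat << EOL
{ print $NF }
EOL
)

echo "$quoted"     # { print $NF }
echo "$unquoted"   # { print oops }
```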

Thus, using the second approach, our Forvo client easily fits in one file:

    #!/bin/bash

    LANGUAGE_ID=39 #english

    # Trick for mixing AWK and Shell programs in the same file
    PARSER_PRG=$(awk '(/^### AWK PROGRAMM MARKER ###$/ || body){body=1; print $0}' "$0")

    WORD="$*"
    if [[ -n $WORD ]]; then
        URL=$(curl -d "id_lang=$LANGUAGE_ID&word_search=$WORD" -L 'http://www.forvo.com/search/' 2> /dev/null | awk "$PARSER_PRG")
        if [[ -n $URL ]]; then
            mpg123 -q "$URL"
        else
            echo not found
        fi
    fi
    exit

    ### AWK PROGRAMM MARKER ###
    # parser
    /var (_SERVER_HOST|_AUDIO_HTTP_HOST)/{
        if(match($0, /var[ \t]+(_SERVER_HOST|_AUDIO_HTTP_HOST)[ \t]*=[ \t]*'?([^']+)'?/, arr)){
            if(arr[1] == "_SERVER_HOST"){
                srv_host = arr[2];
            } else if(arr[1] == "_AUDIO_HTTP_HOST") {
                audio_http_host = arr[2];
            }
        }
    }

    /<a href.+onclick="Play\(/{
        if(match($0, /onclick="Play\([^,]+,'([^,]+)'.+\)/, arr)){
            mp3Path = arr[1];
            if (srv_host == audio_http_host){
                mp3Path = ("http://" srv_host "/player-mp3Handler.php?path=" mp3Path);
            } else {
                mp3Path = ("http://" audio_http_host "/mp3/" base64_decode(mp3Path));
            }
        }
        exit;
    }

    function base64_decode(val){
        command = ("echo '" val "' | base64 -d");
        command | getline ret;
        close(command);
        return ret;
    }

    END{
        if(mp3Path)
            print mp3Path;
    }

Findings

Here are the pros and cons of the approach described here and of the approach that uses the Forvo API.

Current approach:
+ no need to have an account on forvo.com
+ no need to store and pass around Forvo API keys
- the client depends on the site's markup (i.e., if significant changes are made on Forvo, the parser will have to be fixed)

Forvo API approach:
+ ease of client implementation
+ theoretically less incoming traffic per request
- the need to have a forvo.com account (to obtain an API key)
- the need to carry an API key around

One more small thing worth noting: for some reason, mpg123 refused to play the link I received through a Forvo API request.

Conclusion

Since the purpose of the article was to show one possible way of solving this problem, I have given only a basic implementation of the client here (without persistent switching of the pronunciation language). A more complete version of the client is available on GitHub.

Afterword

Habr has more than once published useful posts that touch on foreign languages in one way or another. Fairly recently I came across a post from the sandbox whose idea of keeping a personal dictionary I liked, as well as a post that proposed binding a key combination to translating selected words or phrases. Combining the ideas of these articles, I can recommend the following scheme:

build up and use the personal dictionary with Anki
by analogy with the second post, bind a key combination to the Forvo client

Now, when you come across an unfamiliar word, you can learn its translation and hear its pronunciation with a single keystroke. After that, be sure to add it to your personal dictionary along with its transcription. I use a similar scheme myself, except that I look up translations with a browser plugin.

Source: https://habr.com/ru/post/147842/
