Linguistic aspects of what3words and technical analysis of dictionaries
I would like to start with a big thank-you for your attention and for the comments on our first welcome post on Habr! Your reactions helped us identify the most interesting questions, which we will address in upcoming publications.
As you rightly noted in the comments, although using words instead of numbers has a number of indisputable advantages, the approach also has nuances that must be taken into account. Robert Barr, a professor at the University of Manchester, carried out a technical analysis of what3words and our dictionaries. Below are the results of his independent evaluation:
Although the what3words vocabulary may look like a random collection of words, it was carefully designed to achieve specific goals.
The 40,000-word English dictionary used for w3w addresses is large enough for three-word combinations to index every 3 m × 3 m square in the world.
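As a rough check of this claim, the sketch below compares the number of 3 m × 3 m squares needed to tile the Earth's surface with the number of three-word combinations available from 40,000 words. The surface area of about 510 million km² is an approximate, commonly cited figure, the function names are illustrative, and the real w3w grid is not necessarily a uniform tiling of exactly this area.

```python
EARTH_SURFACE_KM2 = 510_000_000  # approximate surface area of the Earth (land + sea)
SQUARE_SIDE_M = 3                # side of one address square, in metres


def squares_needed(area_km2: int, side_m: int = SQUARE_SIDE_M) -> int:
    """Number of side_m x side_m squares needed to cover area_km2 (rounded up)."""
    area_m2 = area_km2 * 1_000_000   # 1 km² = 1,000,000 m²
    cell_m2 = side_m * side_m
    return -(-area_m2 // cell_m2)    # integer ceiling division


def address_capacity(vocabulary_size: int, positions: int = 3) -> int:
    """Ordered combinations when any word may appear in any position."""
    return vocabulary_size ** positions


needed = squares_needed(EARTH_SURFACE_KM2)
capacity = address_capacity(40_000)
print(f"{needed:,} squares needed")         # 56,666,666,666,667 squares needed
print(f"{capacity:,} addresses available")  # 64,000,000,000,000 addresses available
```

With roughly 5.7 × 10^13 squares against 6.4 × 10^13 combinations, 40,000 words leave a margin of about 13%.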
Each of the 40,000 words can be used in any of the three positions of a w3w address, which also allows a word to appear more than once in the same address on occasion.
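The effect of these two rules (position matters, repeats allowed) is plain counting. The snippet below compares ordered triples with repetition, ordered triples of distinct words, and unordered sets of three words for a 40,000-word vocabulary; it only illustrates the arithmetic and says nothing about how w3w actually assigns words.

```python
import math

n = 40_000  # size of the English w3w vocabulary

ordered_with_repeats = n ** 3         # any word in any position, repeats allowed
ordered_no_repeats = math.perm(n, 3)  # n * (n - 1) * (n - 2): three distinct words
unordered = math.comb(n, 3)           # if "a.b.c" and "c.b.a" were the same address

print(f"{ordered_with_repeats:,}")  # 64,000,000,000,000
print(f"{ordered_no_repeats:,}")    # 63,995,200,080,000
print(f"{unordered:,}")             # 10,665,866,680,000
```

Allowing repeats adds 3n² − 2n, about 4.8 billion addresses, which is a small gain. Letting word order matter is what really counts: unordered sets would give only about 1.07 × 10^13 addresses, well short of the roughly 5.7 × 10^13 squares of 3 m × 3 m on the Earth's surface.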
Languages other than English use 25,000 words, which is enough for their combinations to cover all of the land. English is the only language with 40,000 words, which makes it possible to cover both the oceans and the land. In practice, this means that if your settings are in Portuguese, you will receive combinations of three Portuguese words until you move the marker out to sea (probably several hundred meters from the coast), after which the address is displayed in English.
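The fallback behaviour described here can be sketched as a simple lookup. The coverage table, the language codes and the `display_language` helper are hypothetical and only model the rule stated above (non-English dictionaries cover land, English covers land and sea); they are not part of any w3w API.

```python
# Hypothetical coverage table: which surface types each dictionary can address.
COVERAGE = {
    "en": {"land", "sea"},  # 40,000 words: land and sea
    "pt": {"land"},         # 25,000 words: land only
    "fr": {"land"},         # 25,000 words: land only
}


def display_language(preferred: str, surface: str) -> str:
    """Language in which an address is shown for a square of the given surface type."""
    if surface in COVERAGE.get(preferred, set()):
        return preferred
    return "en"  # English is the only dictionary that also covers the sea


print(display_language("pt", "land"))  # pt
print(display_language("pt", "sea"))   # en
```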
The dictionaries are optimized so that the “best” words are used for addresses in the areas where speakers of a particular language are most likely to use them. The “best” words are the short words that are most common in the language. Combinations are spread across the world in a balanced way using two independent ranking systems:
The best words are given to the most densely populated (urban) areas, the next tier of words to rural areas, and the least suitable words to the seas.
In countries where a particular language is native or widely spoken, the best words from that language's dictionary are used for addresses. For example, the best words of the French version of w3w go primarily to France, Senegal and Cameroon, and only then spread to other countries.
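The two rankings can be illustrated with a small sketch. The scoring rule (shorter words first, then more frequent ones), the tier shares and all names below are assumptions made for illustration; the source does not describe the actual scoring or how the tiers are sized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    frequency: float  # hypothetical relative frequency in a corpus


def rank_words(words: list[Word]) -> list[Word]:
    # Assumed scoring: shorter words first, more frequent words first among equals.
    return sorted(words, key=lambda w: (len(w.text), -w.frequency))


def split_into_tiers(ranked: list[Word], urban_share: float = 0.3,
                     rural_share: float = 0.4) -> dict[str, list[Word]]:
    # Ranking 1: best words to urban areas, next tier to rural areas, rest to the sea.
    n = len(ranked)
    urban_end = int(n * urban_share)
    rural_end = urban_end + int(n * rural_share)
    return {
        "urban": ranked[:urban_end],
        "rural": ranked[urban_end:rural_end],
        "sea": ranked[rural_end:],
    }


def country_order(countries: list[str], home_countries: set[str]) -> list[str]:
    # Ranking 2: countries where the language is native or common are served first.
    # sorted() is stable, so the relative order within each group is preserved.
    return sorted(countries, key=lambda c: c not in home_countries)


print(country_order(["Brazil", "France", "Japan", "Senegal", "Cameroon"],
                    {"France", "Senegal", "Cameroon"}))
# ['France', 'Senegal', 'Cameroon', 'Brazil', 'Japan']
```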
Homophones, words that are spelled differently but sound the same, are avoided: either only one of the words is used, or the whole combination is left out (homophones usually share the same “Soundex” code, which is used to match words and so avoid errors). The ordering and selection of words for the dictionaries is a multi-step process, which also includes a procedure for eliminating offensive words.
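The source mentions Soundex but not exactly how it is applied. Below is a minimal sketch, assuming the classic American Soundex rules and a simple policy of keeping only the highest-ranked word for each code; `drop_homophones` is a hypothetical helper, not w3w's pipeline. Soundex is a crude filter: it also merges words that are not homophones (both "Robert" and "Rupert" map to R163) and misses homophones with different first letters ("knight" is K523, "night" is N230).

```python
# American Soundex digit groups; vowels, y, h and w carry no digit.
SOUNDEX_DIGITS = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_DIGITS.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_DIGITS.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        if ch not in "hw":  # h and w do not separate letters with the same digit
            prev = digit
    return code.ljust(4, "0")


def drop_homophones(ranked_words: list[str]) -> list[str]:
    """Keep the first (best-ranked) word for each Soundex code."""
    seen: set[str] = set()
    kept = []
    for word in ranked_words:
        key = soundex(word)
        if key not in seen:
            seen.add(key)
            kept.append(word)
    return kept


print(drop_homophones(["sea", "see", "mail", "male", "table"]))
# ['sea', 'mail', 'table']
```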
When similar combinations of words occur, they are distributed so that the locations with these addresses are unlikely to be in the same country.
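The source does not say how "similar" is measured. As a stand-in, the sketch below uses Levenshtein edit distance and flags pairs of near-identical addresses that ended up in the same country, which is the situation this rule tries to avoid. The threshold, the helper names, the addresses and the country codes are all hypothetical.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def same_country_conflicts(assignments: dict[str, str],
                           max_edits: int = 1) -> list[tuple[str, str]]:
    """Pairs of near-identical addresses that were placed in the same country."""
    return [
        (a1, a2)
        for (a1, c1), (a2, c2) in combinations(assignments.items(), 2)
        if c1 == c2 and levenshtein(a1, a2) <= max_edits
    ]


# Hypothetical address -> country assignments, for illustration only.
assignments = {
    "table.chair.lamp": "FR",
    "table.chairs.lamp": "FR",
    "cable.chair.lamp": "AU",
}
print(same_country_conflicts(assignments))
# [('table.chair.lamp', 'table.chairs.lamp')]
```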
Although a w3w address follows the style of an Internet address and in effect stands in for a location encoded as three integers, the linguistic aspects of using words instead of numbers have been the subject of careful analysis and optimization.
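To make the "three integers" view concrete, here is a toy sketch that writes a square index as three base-n digits and spells each digit as a word. This plain positional mapping is an assumption for illustration only: it ignores the word ranking and distribution rules described above and is not the w3w algorithm.

```python
def index_to_address(index: int, words: list[str]) -> str:
    """Write a square index as three base-n 'digits', each spelled as a word."""
    n = len(words)
    if not 0 <= index < n ** 3:
        raise ValueError("index out of range for this vocabulary")
    first, rest = divmod(index, n * n)
    second, third = divmod(rest, n)
    return ".".join((words[first], words[second], words[third]))


def address_to_index(address: str, words: list[str]) -> int:
    """Inverse of index_to_address: read the three words back as base-n digits."""
    position = {w: p for p, w in enumerate(words)}
    first, second, third = (position[w] for w in address.split("."))
    n = len(words)
    return (first * n + second) * n + third


vocabulary = ["apple", "river", "stone", "cloud"]         # toy 4-word vocabulary
print(index_to_address(27, vocabulary))                   # river.stone.cloud
print(address_to_index("river.stone.cloud", vocabulary))  # 27
```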
The w3w system has been optimized to make addresses as easy as possible to use and remember, while minimizing possible errors. The only error-correction mechanism built into the system checks how plausible the entered address is. When a w3w address is entered on a device whose current location is known, the distance to the entered address is checked. If that distance is too large, and larger than the distance to alternative addresses that sound or are spelled similarly, the user is offered an automatic correction.
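The plausibility check can be sketched as follows, assuming great-circle (haversine) distance and a fixed threshold. The 100 km threshold, the function names and the example addresses are hypothetical; the source describes only the rule, not the actual distances or how the similar alternatives are generated.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def suggest_correction(device, entered, alternatives, max_km=100.0):
    """Return a similar address to offer instead of `entered`, or None.

    device: (lat, lon) of the user's device.
    entered, alternatives: (address, (lat, lon)) pairs; alternatives are the
    similar-sounding or similarly spelled addresses.
    max_km: hypothetical plausibility threshold.
    """
    entered_km = haversine_km(device, entered[1])
    if entered_km <= max_km:
        return None  # plausible: no correction offered
    closest = min(alternatives, key=lambda alt: haversine_km(device, alt[1]), default=None)
    if closest is not None and haversine_km(device, closest[1]) < entered_km:
        return closest[0]
    return None


london = (51.5074, -0.1278)
entered = ("alpha.bravo.charlie", (-33.8688, 151.2093))  # hypothetical, lands in Sydney
alternatives = [("alpha.bravo.charlies", (51.5079, -0.1281))]
print(suggest_correction(london, entered, alternatives))  # alpha.bravo.charlies
```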
By minimizing errors with this correction mechanism, w3w has the potential to become a more reliable alternative to alphanumeric codes. Even with UK postcodes, which have been in use for more than 50 years, people make mistakes in more than 10% of cases when writing them down. Moreover, postcodes are usually checked only for existence, not against location.