
Yandex Linguistics API for .NET

After attending Yet another Conference 2013, I had the idea of writing a .NET API for all of the Yandex linguistics services. A brief search showed that, fortunately, no such library existed yet. Even though nobody may need it, I decided to implement it, if only to practice with RestSharp, testing, and various GitHub features (issues, releases, markdown, and so on). In addition, during the implementation I had to deal with an interesting string comparison algorithm, which I describe in this post.

Here are the links to the sources and binaries on GitHub right away: Code, Binary

Implemented APIs




RestSharp makes it very easy to write code for synchronous and asynchronous HTTP GET and POST requests, and to deserialize the XML or JSON responses into .NET objects (this project uses XML).

Extended Damerau-Levenshtein distance calculation function


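While implementing the speller, I wanted to show the user not only the corrected version of the text but also the errors in it. The Levenshtein distance immediately came to mind. But it has two drawbacks: it does not count a swap of two adjacent characters as a single error, and it yields only a number rather than the errors themselves.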

The first drawback was addressed by using the Damerau-Levenshtein distance, and the second by analyzing the matrix that the algorithm builds as it runs. The distance is the value in the last row and last column of that matrix, so in my case, with the default weights, the distance equals the total number of errors returned.

Thus, the algorithm finds the following kinds of errors between the erroneous word (word) and the correct word (correctedWord): substitution, insertion, deletion, and transposition of characters.

In addition, the weights of various errors can be adjusted (by default, all have the same weight, equal to one).
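To make the role of the matrix and the weights concrete, here is a minimal sketch that computes only the weighted distance (the value in the last row and last column), without collecting the list of errors. It is not the library's code: the class name OsaDistanceSketch and the method name Distance are hypothetical, and it assumes the restricted variant of the Damerau-Levenshtein distance, in which only adjacent characters can be transposed.

```csharp
using System;

public static class OsaDistanceSketch
{
    // Hypothetical helper: weighted Damerau-Levenshtein distance in its restricted
    // form, where a swap of two adjacent characters counts as one edit.
    public static int Distance(string word, string correctedWord,
        int substitutionCost = 1, int insertionCost = 1,
        int deletionCost = 1, int transpositionCost = 1)
    {
        int n = word.Length, m = correctedWord.Length;
        var d = new int[n + 1, m + 1];

        // Reducing a prefix of word to the empty string takes only deletions;
        // building a prefix of correctedWord from nothing takes only insertions.
        for (int i = 0; i <= n; i++) d[i, 0] = i * deletionCost;
        for (int j = 0; j <= m; j++) d[0, j] = j * insertionCost;

        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                bool equal = word[i - 1] == correctedWord[j - 1];
                int best = Math.Min(d[i - 1, j] + deletionCost,
                                    d[i, j - 1] + insertionCost);
                best = Math.Min(best, d[i - 1, j - 1] + (equal ? 0 : substitutionCost));

                // Adjacent transposition: the last two characters are swapped.
                if (i > 1 && j > 1
                    && word[i - 1] == correctedWord[j - 2]
                    && word[i - 2] == correctedWord[j - 1])
                {
                    best = Math.Min(best, d[i - 2, j - 2] + transpositionCost);
                }

                d[i, j] = best;
            }
        }

        // The distance is the bottom-right cell of the matrix.
        return d[n, m];
    }

    public static void Main()
    {
        Console.WriteLine(Distance("wrod", "word"));       // 1: a single transposition
        Console.WriteLine(Distance("kitten", "sitting"));  // 3
        // With an expensive transposition, two single-character edits are cheaper.
        Console.WriteLine(Distance("wrod", "word", transpositionCost: 3)); // 2
    }
}
```

With all weights equal to one, the result is simply the number of errors; raising one weight makes the algorithm prefer other combinations of edits, as the last call shows.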

Code of the extended Damerau-Levenshtein distance calculation function

public static List<Mistake> DamerauLevenshteinDistance(
    string word, string correctedWord,
    bool transposition = true,
    int substitutionCost = 1,
    int insertionCost = 1,
    int deletionCost = 1,
    int transpositionCost = 1)
{
    int w_length = word.Length;
    int cw_length = correctedWord.Length;
    // Each cell stores the minimal cost (Key) and the operation that produced it (Value).
    var d = new KeyValuePair<int, CharMistakeType>[w_length + 1, cw_length + 1];
    var result = new List<Mistake>(Math.Max(w_length, cw_length));

    // Empty source word: every character of the corrected word is an insertion.
    if (w_length == 0)
    {
        for (int i = 0; i < cw_length; i++)
            result.Add(new Mistake(i, CharMistakeType.Insertion));
        return result;
    }

    // Initialize the first column and the first row.
    for (int i = 0; i <= w_length; i++)
        d[i, 0] = new KeyValuePair<int, CharMistakeType>(i, CharMistakeType.None);

    for (int j = 0; j <= cw_length; j++)
        d[0, j] = new KeyValuePair<int, CharMistakeType>(j, CharMistakeType.None);

    // Fill the matrix, remembering which operation gave the minimum in each cell.
    for (int i = 1; i <= w_length; i++)
    {
        for (int j = 1; j <= cw_length; j++)
        {
            bool equal = correctedWord[j - 1] == word[i - 1];
            int delCost = d[i - 1, j].Key + deletionCost;
            int insCost = d[i, j - 1].Key + insertionCost;
            int subCost = d[i - 1, j - 1].Key;
            if (!equal)
                subCost += substitutionCost;

            int transCost = int.MaxValue;
            if (transposition && i > 1 && j > 1 &&
                word[i - 1] == correctedWord[j - 2] &&
                word[i - 2] == correctedWord[j - 1])
            {
                transCost = d[i - 2, j - 2].Key;
                if (!equal)
                    transCost += transpositionCost;
            }

            int min = delCost;
            CharMistakeType mistakeType = CharMistakeType.Deletion;
            if (insCost < min)
            {
                min = insCost;
                mistakeType = CharMistakeType.Insertion;
            }
            if (subCost < min)
            {
                min = subCost;
                mistakeType = equal ? CharMistakeType.None : CharMistakeType.Substitution;
            }
            if (transCost < min)
            {
                min = transCost;
                mistakeType = CharMistakeType.Transposition;
            }

            d[i, j] = new KeyValuePair<int, CharMistakeType>(min, mistakeType);
        }
    }

    // Walk back from the bottom-right cell, collecting the recorded mistakes.
    int w_ind = w_length;
    int cw_ind = cw_length;
    while (w_ind >= 0 && cw_ind >= 0)
    {
        switch (d[w_ind, cw_ind].Value)
        {
            case CharMistakeType.None:
                w_ind--;
                cw_ind--;
                break;
            case CharMistakeType.Substitution:
                result.Add(new Mistake(cw_ind - 1, CharMistakeType.Substitution));
                w_ind--;
                cw_ind--;
                break;
            case CharMistakeType.Deletion:
                result.Add(new Mistake(cw_ind, CharMistakeType.Deletion));
                w_ind--;
                break;
            case CharMistakeType.Insertion:
                result.Add(new Mistake(cw_ind - 1, CharMistakeType.Insertion));
                cw_ind--;
                break;
            case CharMistakeType.Transposition:
                result.Add(new Mistake(cw_ind - 2, CharMistakeType.Transposition));
                w_ind -= 2;
                cw_ind -= 2;
                break;
        }
    }

    // If the walk recorded fewer mistakes than the computed distance,
    // pad the difference with deletions at position 0.
    if (d[w_length, cw_length].Key > result.Count)
    {
        int delMistakesCount = d[w_length, cw_length].Key - result.Count;
        for (int i = 0; i < delMistakesCount; i++)
            result.Add(new Mistake(0, CharMistakeType.Deletion));
    }

    // Mistakes were collected from the end of the word, so reverse them.
    result.Reverse();
    return result;
}



Interface


The interface was implemented in WinForms in the hope that the application would also run on Mono. However, this was not tested.


This library can be used in any project, provided that attribution is given (Apache 2.0 license).

Source: https://habr.com/ru/post/204372/

