
"Auto-switching keyboard layouts" in PHP applications

Good day!

Recently I have been developing a chat room system, chat rooms 3.0 so to speak. The idea came to me after I discovered an interesting and convenient tool: the Realplexor comet server from dkLab. But that is not what this post is about...

The core engine is written, the chat works, users are talking, and everything seems fine, but there is one BUT! We are only human, and sometimes we forget to switch the keyboard layout from English to Russian. After typing a fair number of words we send the message and see that it was written in the wrong layout. Hardly anyone wants to retype such a text, and readers will not want to decipher the scribbles. So I decided to come up with a very simple way to correct such messages.

So there I was, watching one movie, then a second, then a third, when it occurred to me to collect a set of frequently used Russian words. I turned to Wikipedia and, to my surprise, found exactly the information I needed.

That was half the job. What remained was to devise the algorithm and write the function.

The algorithm is very simple:
  1. Convert the string to lower case.
  2. Remove unnecessary characters.
  3. Split it into words.
  4. Look for words that match the list taken from Wikipedia and count the matches.
  5. If the number of matches is greater than or equal to the limit we set, the sentence is assumed to be typed in the wrong layout, and its letters and symbols are replaced with the correct ones.
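Steps 1 to 4 can be sketched in isolation as follows. This is only a minimal illustration: the helper name `countLayoutMatches`, the three-word list, and the threshold value are assumptions made up for the example.

```php
<?php
// Hypothetical helper: counts how many words of $message appear in $knownWords.
function countLayoutMatches(string $message, array $knownWords): int
{
    // Steps 1-2: lower-case the string and strip characters that are not part of a word.
    $clean = strtr(strtolower(trim($message)), ['!' => '', '?' => '', '&' => '', '/' => '']);
    // Step 3: split into words on whitespace.
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);
    // Step 4: count the words that are present in the list (strict comparison).
    $matches = 0;
    foreach ($words as $word) {
        if (in_array($word, $knownWords, true)) {
            $matches++;
        }
    }
    return $matches;
}

// Tiny illustrative list: "привет", "как", "дела" as typed in the English layout.
$knownWords = ['ghbdtn', 'rfr', 'ltkf'];
$threshold = 1; // assumed limit for the example

$matches = countLayoutMatches('Ghbdtn! Rfr ltkf?', $knownWords);
$wrongLayout = $matches >= $threshold; // the condition tested in step 5
```

Note that a comma cannot be stripped in step 2, because in the Russian layout that key produces a letter.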


Simple, isn't it?

The function in PHP looks like this:

function orfFilter($string) {
    /* Number of matched words at which the message is treated as typed in the wrong layout */
    $countErrorWords = 1;
    /* Match counter */
    $countError = 0;
    /* Frequent Russian words as they look when typed in the English layout */
    $errorWords = array('b', 'd', 'yt', 'jy', 'yf', 'z', 'xnj', 'c', 'cj', 'njn', ',snm', 'f', 'dtcm', "'nj", 'rfr', 'jyf', 'gj', 'yj', 'jyb', 'r', 'e', 'ns', 'bp', 'pf', 'ds', 'nfr', ';t', 'jn', 'crfpfnm', "'njn",
        'rjnjhsq', 'vjxm', 'xtkjdtr', 'j', 'jlby', 'tot', ',s', 'nfrjq', 'njkmrj', 'ct,z', 'cdjt', 'rfrjq', 'rjulf', 'e;t', 'lkz', 'djn', 'rnj', 'lf', 'ujdjhbnm', 'ujl', 'pyfnm', 'vjq', 'lj', 'bkb', 'tckb', 'dhtvz', 'herf', 'ytn',
        'cfvsq', 'yb', 'cnfnm', ',jkmijq', 'lf;t', 'lheujq', 'yfi', 'cdjq', 'ye', 'gjl', 'ult', 'ltkj', 'tcnm', 'cfv', 'hfp', 'xnj,s', 'ldf', 'nfv', 'xtv', 'ukfp', ';bpym', 'gthdsq', 'ltym', 'nenf', 'ybxnj', 'gjnjv', 'jxtym',
        '[jntnm', 'kb', 'ghb', 'ujkjdf', 'yflj', ',tp', 'dbltnm', 'blnb', 'ntgthm', 'nj;t', 'cnjznm', 'lheu', 'ljv', 'ctqxfc', 'vj;yj', 'gjckt', 'ckjdj', 'pltcm', 'levfnm', 'vtcnj', 'cghjcbnm', 'xthtp', 'kbwj', 'njulf', 'dtlm',
        '[jhjibq', 'rf;lsq', 'yjdsq', ';bnm', 'ljk;ys', 'cvjnhtnm', 'gjxtve', 'gjnjve', 'cnjhjyf', 'ghjcnj', 'yjuf', 'cbltnm', 'gjyznm', 'bvtnm', 'rjytxysq', 'ltkfnm', 'dlheu', 'yfl', 'dpznm', 'ybrnj', 'cltkfnm', 'ldthm', 'gthtl',
        'ye;ysq', 'gjybvfnm', 'rfpfnmcz', 'hf,jnf', 'nhb', 'dfi', 'e;', 'ptvkz', 'rjytw', 'ytcrjkmrj', 'xfc', 'ujkjc', 'ujhjl', 'gjcktlybq', 'gjrf', '[jhjij', 'ghbdtn', 'pljhjdj', 'pljhjdf', 'ntcn', 'yjdjq', 'jr', 'tuj', 'rjt',
        'kb,j', 'xnjkb', 'ndj.', 'ndjz', 'nen', 'zcyj', 'gjyznyj', 'x`', 'xt');
    /* Characters removed before matching */
    $delChar = array('!' => '', '&' => '', '?' => '', '/' => '');
    /* Exceptions restored after conversion (the /me command) */
    $expectWord = array('.ьу' => '/me');
    /* Character map: English layout => Russian layout */
    $arrReplace = array('q'=>'й', 'w'=>'ц', 'e'=>'у', 'r'=>'к', 't'=>'е', 'y'=>'н', 'u'=>'г', 'i'=>'ш', 'o'=>'щ', 'p'=>'з', '['=>'х', ']'=>'ъ',
        'a'=>'ф', 's'=>'ы', 'd'=>'в', 'f'=>'а', 'g'=>'п', 'h'=>'р', 'j'=>'о', 'k'=>'л', 'l'=>'д', ';'=>'ж', "'"=>'э',
        'z'=>'я', 'x'=>'ч', 'c'=>'с', 'v'=>'м', 'b'=>'и', 'n'=>'т', 'm'=>'ь', ','=>'б', '.'=>'ю', '/'=>'.', '`'=>'ё',
        'Q'=>'Й', 'W'=>'Ц', 'E'=>'У', 'R'=>'К', 'T'=>'Е', 'Y'=>'Н', 'U'=>'Г', 'I'=>'Ш', 'O'=>'Щ', 'P'=>'З', '{'=>'Х', '}'=>'Ъ',
        'A'=>'Ф', 'S'=>'Ы', 'D'=>'В', 'F'=>'А', 'G'=>'П', 'H'=>'Р', 'J'=>'О', 'K'=>'Л', 'L'=>'Д', ':'=>'Ж', '"'=>'Э', '|'=>'/',
        'Z'=>'Я', 'X'=>'Ч', 'C'=>'С', 'V'=>'М', 'B'=>'И', 'N'=>'Т', 'M'=>'Ь', '<'=>'Б', '>'=>'Ю', '?'=>',', '~'=>'Ё',
        '@'=>'"', '#'=>'№', '$'=>';', '^'=>':', '&'=>'?');
    /* Reverse map (Russian layout => English layout), built from $arrReplace */
    $arrReplace2 = array_flip($arrReplace);
    /* Drop punctuation keys that would otherwise override entries of the forward map */
    unset($arrReplace2['.']);
    unset($arrReplace2[',']);
    unset($arrReplace2[';']);
    unset($arrReplace2[':']);
    unset($arrReplace2['"']);
    unset($arrReplace2['?']);
    unset($arrReplace2['/']);
    /* Combine both directions into one map */
    $arrReplace = array_merge($arrReplace, $arrReplace2);
    /* Lower-case the string, strip unnecessary characters, split into words */
    $string2 = strtr(trim(strtolower($string)), $delChar);
    $arrString = explode(" ", $string2);
    /* Count the words found in the list; in_array() also catches the word stored at index 0 */
    foreach ($arrString as $val) {
        if (in_array($val, $errorWords, true)) {
            $countError++;
        }
    }
    return ($countError >= $countErrorWords) ? strtr(strtr($string, $arrReplace), $expectWord) : $string;
}
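The least obvious part of the function is the two-way table built with `array_flip`, `unset`, and `array_merge`. Here is a minimal sketch with a shortened map; the variable names `$forward`, `$reverse`, and `$map` and the handful of entries are chosen only for illustration.

```php
<?php
// Shortened forward map (English key => Russian character); the real table covers the whole keyboard.
$forward = ['g' => 'п', 'h' => 'р', 'b' => 'и', 'd' => 'в', 't' => 'е', 'n' => 'т', ',' => 'б', '?' => ','];

// Reverse map: Russian character => English key.
$reverse = array_flip($forward);

// ',' is now a key in both maps (',' => 'б' forward and ',' => '?' reversed).
// Remove it from the reverse map so that array_merge() keeps the forward entry.
unset($reverse[',']);

$map = array_merge($forward, $reverse);

echo strtr('ghbdtn', $map), "\n"; // привет
echo strtr('привет', $map), "\n"; // ghbdtn
```

With the array form of `strtr`, already replaced text is not scanned again, so the two directions do not undo each other within a single call.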


Conclusion

Yes, the method is not entirely accurate; it is better described as a "crutch", but it is very simple. For now it suits my needs.
In the future I would like to rewrite the wrong-layout detection so that it relies on word endings and the like.
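One possible reading of the "endings" idea, purely as an assumption about how it might look: count the words that end in a common Russian ending as it appears when typed in the English layout. The helper name `countEndingMatches` and the four endings are hypothetical.

```php
<?php
// Hypothetical helper: counts words ending in a Russian ending typed in the English layout.
function countEndingMatches(string $message, array $endings): int
{
    $words = preg_split('/\s+/', strtolower(trim($message)), -1, PREG_SPLIT_NO_EMPTY);
    $matches = 0;
    foreach ($words as $word) {
        foreach ($endings as $ending) {
            // The word must be longer than the ending and finish with it.
            if (strlen($word) > strlen($ending) && substr($word, -strlen($ending)) === $ending) {
                $matches++;
                break; // count each word once
            }
        }
    }
    return $matches;
}

// Illustrative endings: "-ться", "-ный", "-ого", "-ать" as typed in the English layout.
$endings = ['nmcz', 'ysq', 'juj', 'fnm'];

$matches = countEndingMatches('rfpfnmcz crfpfnm', $endings);
```

Such letter combinations are rare at the end of English words, which is why a short list of endings could stand in for a long word list.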

If you would like to see this approach in action, you can visit the chat.

I will be glad if this helps someone.

UPD: Sergey Koryakin (sergekoriakin@gmail.com) sent me a couple of links to his JavaScript port of this function: one, two

Source: https://habr.com/ru/post/140351/

