
Application of fuzzy search algorithms in PHP

Inspired by posts about fuzzy search and phonetic algorithms, I wanted to try implementing something similar to Google's “Perhaps you meant: ...” in PHP.

To correct typos in words, you will need:
Levenshtein distance (or Damerau-Levenshtein distance, the difference will be insignificant) - levenshtein()
Metaphone - metaphone()
Oliver's algorithm - similar_text()
A database of Russian words (including case forms, tenses, etc.).
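
To get a feel for the three built-in functions, they can be tried on a few English strings. This is only a small sketch; the sample words are arbitrary and have nothing to do with the dictionary used later.

```php
<?php
// Edit distance: the number of single-character insertions, deletions
// and substitutions needed to turn one string into the other.
$distance = levenshtein('kitten', 'sitting');

// Phonetic key: words that sound alike tend to share the same key.
$key = metaphone('Thompson');

// Number of matching characters found by similar_text();
// the optional third argument receives the similarity in percent.
$common = similar_text('World', 'Word', $percent);

echo $distance, "\n"; // 3
echo $key, "\n";      // TMSN
echo $common, "\n";   // 4
```

A smaller levenshtein() value and a larger similar_text() value both mean "closer", which is why the later steps minimize the first and maximize the second.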

A function for transliterating words:

    function translitIt($str)
    {
        $tr = array(
            "А" => "A", "Б" => "B", "В" => "V", "Г" => "G", "Д" => "D", "Е" => "E", "Ж" => "J", "З" => "Z",
            "И" => "I", "Й" => "Y", "К" => "K", "Л" => "L", "М" => "M", "Н" => "N", "О" => "O", "П" => "P",
            "Р" => "R", "С" => "S", "Т" => "T", "У" => "U", "Ф" => "F", "Х" => "H", "Ц" => "TS", "Ч" => "CH",
            "Ш" => "SH", "Щ" => "SCH", "Ъ" => "", "Ы" => "YI", "Ь" => "", "Э" => "E", "Ю" => "YU", "Я" => "YA",
            "а" => "a", "б" => "b", "в" => "v", "г" => "g", "д" => "d", "е" => "e", "ж" => "j", "з" => "z",
            "и" => "i", "й" => "y", "к" => "k", "л" => "l", "м" => "m", "н" => "n", "о" => "o", "п" => "p",
            "р" => "r", "с" => "s", "т" => "t", "у" => "u", "ф" => "f", "х" => "h", "ц" => "ts", "ч" => "ch",
            "ш" => "sh", "щ" => "sch", "ъ" => "y", "ы" => "yi", "ь" => "'", "э" => "e", "ю" => "yu", "я" => "ya"
        );
        // strtr() replaces every key found in the string with its value
        return strtr($str, $tr);
    }
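
Transliteration matters because levenshtein() and similar_text() compare strings byte by byte, and metaphone() is designed for English pronunciation. The sketch below illustrates the byte issue. It assumes the source file is saved as UTF-8 and uses a hypothetical, shortened map that covers only the letters of the sample word.

```php
<?php
// Hypothetical, shortened map: only the letters needed for this demo.
$map = array('п' => 'p', 'р' => 'r', 'и' => 'i', 'в' => 'v', 'е' => 'e', 'т' => 't');

$correct = 'привет';
$typo    = 'привт'; // one letter is missing

// Byte-wise comparison of UTF-8 Cyrillic: the missing letter is two bytes long.
$rawDistance = levenshtein($correct, $typo);

// After transliteration every letter of this word is a single byte.
$latinDistance = levenshtein(strtr($correct, $map), strtr($typo, $map));

echo strtr($correct, $map), "\n"; // privet
echo $rawDistance, "\n";          // 2
echo $latinDistance, "\n";        // 1
```

With the Latin form, one missing letter costs exactly one edit, so the "less than half the length" thresholds used below behave as expected.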

So, first we fetch the entire dictionary from the database and write it into an array of pairs, where the key is the Russian word and the value is its transliteration.

    $word_translit = array();
    $query = "SELECT ru_words FROM word_list";
    if ($stmt = $this->conn->prepare($query))
    {
        $stmt->execute();
        $stmt->bind_result($ru_word);
        while ($stmt->fetch())
        {
            // key: the Russian word, value: its transliteration
            $word_translit[$ru_word] = $this->translitIt($ru_word);
        }
    }


Next, we check whether the entered word is in the dictionary; if it is not, we transliterate it:

    if (isset($word_translit[$myWord]))
    {
        // the word is in the dictionary, so it is spelled correctly
        $correct[] = $myWord;
    }
    else
    {
        $myWord = $this->translitIt($myWord);
    }


After that, we run a loop that selects from the array those words for which the Levenshtein distance between the metaphones is less than half the length of the entered word's metaphone (roughly speaking, up to half of the consonants may be misspelled). Among the selected candidates we check the distance again, this time on the whole word rather than its metaphone, and store the words that pass in an array:

    foreach ($word_translit as $n => $k)
    {
        if (levenshtein(metaphone($myWord), metaphone($k)) < mb_strlen(metaphone($myWord)) / 2)
        {
            if (levenshtein($myWord, $k) < mb_strlen($myWord) / 2)
            {
                $possibleWord[$n] = $k;
            }
        }
    }
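
The two-stage filter above can be tried in isolation. The following is a minimal sketch, not the code from the downloadable class: the function name filterCandidates and the three-word dictionary are made up for illustration, and strlen() is used instead of mb_strlen() on the assumption that the transliterated strings are plain ASCII.

```php
<?php
/**
 * Keep the dictionary entries whose metaphone key and full spelling are
 * both closer to $word than half of the respective length.
 * $dictionary maps an original word to its Latin transliteration.
 */
function filterCandidates($word, array $dictionary)
{
    $wordKey = metaphone($word);
    $candidates = array();
    foreach ($dictionary as $original => $translit) {
        if (levenshtein($wordKey, metaphone($translit)) < strlen($wordKey) / 2
            && levenshtein($word, $translit) < strlen($word) / 2) {
            $candidates[$original] = $translit;
        }
    }
    return $candidates;
}

// Hypothetical dictionary, already transliterated.
$dictionary = array('привет' => 'privet', 'пример' => 'primer', 'корова' => 'korova');

// 'privt' is the transliterated typo for 'привет'.
$found = filterCandidates('privt', $dictionary);
print_r($found);
```

Checking the short metaphone keys first means the comparison on the full word only runs for words that already sound similar.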


Now we define the variables: the Levenshtein distances start at a deliberately large number, and the “similarity” values start at a deliberately small one.

    $similarity = 0;
    $meta_similarity = 0;
    $min_levenshtein = 1000;
    $meta_min_levenshtein = 1000;


These are the starting values for finding the maximum “similarity” between our word and the words in the array, as well as the minimum Levenshtein distance. First we find the minimum Levenshtein distance:

    foreach ($possibleWord as $n)
    {
        $min_levenshtein = min($min_levenshtein, levenshtein($n, $myWord));
    }


Similarly, we look for the maximum “similarity” among the words whose Levenshtein distance is minimal:

    foreach ($possibleWord as $n)
    {
        // $n is the candidate being examined in this iteration
        if (levenshtein($n, $myWord) == $min_levenshtein)
        {
            $similarity = max($similarity, similar_text($n, $myWord));
        }
    }


Now we run a loop that selects all the words that have both the smallest Levenshtein distance and the highest “similarity”:

    foreach ($possibleWord as $n => $k)
    {
        if (levenshtein($k, $myWord) <= $min_levenshtein)
        {
            if (similar_text($k, $myWord) >= $similarity)
            {
                $result[$n] = $k;
            }
        }
    }
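
The three loops (minimum distance, maximum similarity, final selection) can be folded into one helper. This is a sketch under my own assumptions: selectBest is a hypothetical name, the candidate list is invented, and the optional $transform argument is my addition so that the same helper can work on transformed strings.

```php
<?php
/**
 * From $candidates (original => transliteration) keep the entries that have
 * the smallest Levenshtein distance to $word and, among those, the largest
 * similar_text() score. $transform, if given, is applied to both strings first.
 */
function selectBest($word, array $candidates, $transform = null)
{
    $f = $transform ?: function ($s) { return $s; };
    $target = $f($word);

    // Step 1: the minimum distance over all candidates.
    $minDistance = PHP_INT_MAX;
    foreach ($candidates as $translit) {
        $minDistance = min($minDistance, levenshtein($f($translit), $target));
    }

    // Step 2: the maximum similarity among the candidates at that distance.
    $maxSimilarity = 0;
    foreach ($candidates as $translit) {
        if (levenshtein($f($translit), $target) == $minDistance) {
            $maxSimilarity = max($maxSimilarity, similar_text($f($translit), $target));
        }
    }

    // Step 3: keep the candidates that reach both values.
    $best = array();
    foreach ($candidates as $original => $translit) {
        if (levenshtein($f($translit), $target) == $minDistance
            && similar_text($f($translit), $target) >= $maxSimilarity) {
            $best[$original] = $translit;
        }
    }
    return $best;
}

// Hypothetical candidates for the transliterated typo 'privt'.
$candidates = array('привет' => 'privet', 'привод' => 'privod', 'пример' => 'primer');
$best = selectBest('privt', $candidates);
print_r($best);
```

Here only 'privet' is at distance 1, so it is the single survivor. Passing 'metaphone' as the third argument would run the same selection on phonetic keys instead of whole words.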


After that, we determine the minimum Levenshtein distance and the maximum “similarity” between the metaphone of our word and the metaphones of the words in the array:

    foreach ($result as $n)
    {
        $meta_min_levenshtein = min($meta_min_levenshtein, levenshtein(metaphone($n), metaphone($myWord)));
    }
    foreach ($result as $n)
    {
        // compare the metaphones here, not the whole words
        if (levenshtein(metaphone($n), metaphone($myWord)) == $meta_min_levenshtein)
        {
            $meta_similarity = max($meta_similarity, similar_text(metaphone($n), metaphone($myWord)));
        }
    }


This gives the final array, which ideally contains a single word:

    foreach ($result as $n => $k)
    {
        if (levenshtein(metaphone($k), metaphone($myWord)) <= $meta_min_levenshtein)
        {
            if (similar_text(metaphone($k), metaphone($myWord)) >= $meta_similarity)
            {
                $meta_result[$n] = $k;
            }
        }
    }


Finally, we return the correct word, which is stored as the key:

    return key($meta_result);


Plus:

The accuracy of word detection is quite high, even though I used a 100,000-word dictionary that contains only base forms and includes too many rarely used words (more precisely, words I had never heard of before). That certainly spoils the result.


The problem of ties, where several words have the same Levenshtein distance and the same “similarity” both for the word itself and for its metaphone, can most likely be solved only by adding word-frequency data.
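
A frequency-based tie-break could look roughly like this. It is a sketch only: pickMostFrequent is a hypothetical helper, and the usage counts are invented for illustration, since the dictionary I used has no frequency data.

```php
<?php
/**
 * Break a tie between equally good suggestions by picking the word
 * with the highest usage count. Words missing from $frequency count as 0.
 */
function pickMostFrequent(array $tied, array $frequency)
{
    $bestWord = null;
    $bestCount = -1;
    foreach (array_keys($tied) as $word) {
        $count = isset($frequency[$word]) ? $frequency[$word] : 0;
        if ($count > $bestCount) {
            $bestWord = $word;
            $bestCount = $count;
        }
    }
    return $bestWord;
}

// Two candidates that could plausibly tie for the typo 'привт'.
$tied = array('приват' => 'privat', 'привет' => 'privet');

// Invented counts, for illustration only.
$frequency = array('привет' => 120, 'приват' => 3);

echo pickMostFrequent($tied, $frequency), "\n"; // привет
```

The function would be applied to the final array only when it contains more than one word.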

Minus:

Low speed.

It was tested on: C2D E6550 (2.33 GHz), 4 GB RAM (DDR2-800).
I think this can be partially solved by fetching from the database only those words whose length differs from that of the entered word by 1-2 characters.
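
The length restriction might be sketched as follows. The names here are assumptions: filterByLength is a hypothetical in-memory helper, and the SQL line assumes MySQL (CHAR_LENGTH) together with the same table and column as in the query above. The mbstring extension is needed for mb_strlen().

```php
<?php
/**
 * Keep only the words whose length differs from the length of $word
 * by at most $delta characters.
 */
function filterByLength($word, array $words, $delta = 2)
{
    $length = mb_strlen($word, 'UTF-8');
    return array_values(array_filter($words, function ($candidate) use ($length, $delta) {
        return abs(mb_strlen($candidate, 'UTF-8') - $length) <= $delta;
    }));
}

// The same restriction pushed into the query (untested sketch);
// the two placeholders would be bound to length - 2 and length + 2.
$sql = "SELECT ru_words FROM word_list WHERE CHAR_LENGTH(ru_words) BETWEEN ? AND ?";

$words = array('привет', 'кот', 'электричество', 'пример');
$short = filterByLength('привт', $words);
print_r($short);
```

Doing the filtering in the query rather than in PHP would also avoid transliterating words that can never pass the distance checks.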

I will be glad to hear from the community about more rational options for using phonetic algorithms, or ideas for improving this method.

References:

Here you can download all the code in one class.
And here is the database of Russian words that I used for the tests.
Thanks to the user Karroplan for the excellent database, which contains 4,588,867 words and word forms.

Source: https://habr.com/ru/post/115394/

