Some time ago we wrote on our blog about the ABBYY department at the Faculty of Innovations and High Technologies at MIPT. The department is certainly one of our main points of contact with the next generation of IT specialists, but it is not the only one. Today we want to talk about another student project, codenamed ABBYY Labs, which also launched first at Phystech (MIPT).

The idea behind the student laboratories is very simple: we assemble a team of students who solve a problem under the guidance of our specialists. At MIPT, this happens as part of the annual "Innovation Workshop" course. The topic our students are working on has come up again and again in the comments on our posts about new versions of FineReader. It is a sore subject for every student, so it is no surprise that the project became so popular: out of a wide range of proposals from different companies, 20% of the students chose this one. So, our team is developing a module for recognizing printed formulas!
Our laboratory has 9 students from different faculties, and everything is run the "grown-up" way. The project is split into two subprojects: detecting regions that look like formulas, and recognizing those formulas directly with export to TeX. Each subproject has an analyst and developers: three developers work on detection and four on recognition, one of whom is the lead developer. The project manager is a postgraduate student from our department; he not only runs the process but also helps the students understand what team work on complex technology projects involves. An HR specialist assists him with organizational matters. There is no separate tester role: the developers will handle testing themselves and write tests for their own classes. In addition, the product will be tested against a set of reference images with known recognition results. So far this set has been compiled only for the detection task, but eventually recognition will be tested the same way.
The technical side is just as serious. Although the product will rely on several ready-made libraries for handling images in various formats, recognizing text, and binarizing images, the students will still need to:
- create a system of features for generating hypotheses about where a formula appears in the image, as well as a way to combine and filter these hypotheses;
- develop a conceptual apparatus for testing formulas (a kind of "semantic dictionary");
- define features and build reference templates for characters that the SDK in use does not support (after all, formulas are not just Greek and Latin letters);
- devise an algorithm that assembles a formula from the recognized characters;
- implement export to TeX.
It is too early to talk about any results from the young development team; they have only just started going through a "live" software development cycle. We wish them success in passing through every stage, from analyzing the task to "delivering" the finished result, while staying within their plans and deadlines. We hope their experience will inspire the other ABBYY Labs teams that will appear at universities across the country in the future.
Dmitry Gritsan
with support from the HR service and the mobile department