Author: Zeesha Currimbhoy, Senior Data Products Engineer at Evernote
After joining Evernote, my first project was designing and implementing automatic search hints in the search box of Evernote for Mac.
Here is what it looks like now:

Most of us have, at some point, been unable to formulate a query precisely, left alone with a search field and a blinking cursor. To solve this problem, we added dynamic search hints that appear as you type and are generated from the user's own notes.
In this article I want to touch on some details of how search hints are implemented in Evernote.
How do Evernote search hints differ from others?
For many of us, search hints have become an integral part of everyday life. It is hard to remember what Google search was like before them. Although they became widespread only recently, we already instinctively expect a drop-down list of suitable query options from any search field we encounter.
Although they look similar, hints in different services work differently. Google generates options from its continuously growing base of searches. The user is therefore offered options that other people have often searched for, which may be entirely unfamiliar to that user. This model works well for web search, which has to find a wide variety of information from around the world.
In Evernote search, we are dealing with your own information. The search results show data that you added yourself, and that is what we use to generate hints. We work with an isolated set of notes and cannot supplement it with data from other users. Thus, while many other services talk about the Big Data problem, Evernote faces millions and millions of “small data” problems. Each data set, like each user, is unique.
Another difference is the type of content we use to generate hints. Many services that support search hints offer discrete query options from a finite (albeit large) set: previously entered search queries, people, companies, and so on. In Evernote, hints can be extracted from any content, structured or unstructured, that is present in any of your notes, in any language. The system determines how relevant a hint is for the user by analyzing the content of the notes themselves.
Choice of implementation platform
Evernote is a cross-platform service, and we try to make it equally comfortable to use on every device our users love. So at the very beginning we faced a difficult decision about where to implement hints: build a server-side system that could potentially serve all clients at once, or start with a native implementation in a single client. We chose the second option and started with Mac for several reasons.
- We wanted search tips, like other search functionality, to be available offline.
- We wanted to take advantage of the platform. Mac OS offers an impressive collection of linguistic APIs that we could use for this task.
- We wanted to ensure good performance and usability, even if it meant taking on more work ourselves.
Details
The three main components underlying this functionality are the index of terms, hint creation, and hint extraction.
Index of terms
Search hints are stored in an inverted index, separate from the main search index. We considered using the search index itself, but found that, although it could in theory serve as a basis, it was too narrowly tuned to searching for individual keywords and would have produced very poor-quality hints.
So in our index, we linked each term with a list of notes that contain it.
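As a rough illustration of such a structure, here is a minimal Python sketch of an inverted index that maps each term to the set of notes containing it. The class name `HintIndex`, its methods, and the reverse map used for updates are assumptions made for this example, not Evernote's actual code, which is a native Mac implementation.

```python
from collections import defaultdict


class HintIndex:
    """Toy inverted index: each term maps to the IDs of the notes containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {note_id, ...}
        self.terms_by_note = {}           # note_id -> {term, ...}, used for updates

    def remove_note(self, note_id):
        """Forget a note and drop any terms that no longer occur anywhere."""
        for term in self.terms_by_note.pop(note_id, set()):
            self.postings[term].discard(note_id)
            if not self.postings[term]:
                del self.postings[term]

    def put_note(self, note_id, terms):
        """Index a new note, or replace the entries of an edited one."""
        self.remove_note(note_id)
        self.terms_by_note[note_id] = set(terms)
        for term in self.terms_by_note[note_id]:
            self.postings[term].add(note_id)

    def notes_for(self, term):
        """Return a copy of the set of note IDs that contain the term."""
        return set(self.postings.get(term, ()))


index = HintIndex()
index.put_note(1, ["ice skating", "winter"])
index.put_note(2, ["winter"])
print(index.notes_for("winter"))
```

Keeping a reverse map from notes to terms is one simple way to make edits and deletions cheap. A production index would also need to be persisted to disk, which this sketch leaves out.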
Creating hints
The search hint mechanism starts working even before the user begins typing in the search field. Each time a note is created, changed, or deleted, we generate a list of potential hints from the note's title, tags, and content, which are then ordered and placed in the inverted index.
It works as follows.
We start by finding the individual words in the text of the note. The words pass through a series of filters that normalize the text (for example, converting it to lowercase and stripping accents from characters) and remove stop words (words that are too common to be useful). The filtered words can then be grouped into phrases that may be relevant to the particular text. Finally, the words and phrases that pass the filters are serialized as index entries.
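The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list, the two-word phrase limit, and the rule that phrases never span a stop word are choices made for the example, and the function names `normalize` and `candidate_hints` are hypothetical. Word splitting here relies on a simple regular expression, so it ignores sentence boundaries and only suits languages that separate words with spaces.

```python
import re
import unicodedata

# Illustrative stop-word list; a real one would be larger and language-specific.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "the", "to"}


def normalize(text):
    """Lowercase the text and strip accents (for example, 'Café' becomes 'cafe')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def candidate_hints(text, max_phrase_len=2):
    """Return the normalized words and short phrases found in text, without duplicates."""
    # Split into runs of consecutive non-stop words so phrases never span a stop word.
    runs, current = [], []
    for word in re.findall(r"\w+", normalize(text)):
        if word in STOP_WORDS:
            if current:
                runs.append(current)
            current = []
        else:
            current.append(word)
    if current:
        runs.append(current)

    hints = []
    for run in runs:
        for n in range(1, max_phrase_len + 1):
            for i in range(len(run) - n + 1):
                hint = " ".join(run[i:i + n])
                if hint not in hints:
                    hints.append(hint)
    return hints


print(candidate_hints("Ice skating at the Café"))
# prints ['ice', 'skating', 'ice skating', 'cafe']
```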
This step also has to account for the peculiarities of some languages. For example, Chinese and Japanese do not use spaces to separate words, so more complex algorithms are needed to find word boundaries. The problem becomes even more interesting (and more difficult) when you consider that a note may contain text in several languages at once.
And of course, the whole process must run in the background, using system resources that are idle at the moment, without getting in the user's way.
Extracting hints
We are ready to search, so what happens now? When a user starts typing in the search field, the hint extraction system first determines the set of notes that match the context of the query, that is, the notes that satisfy the combination of notebooks and tags the user is searching in. Then the portion of the query typed so far is looked up in the hint index to obtain a set of possible completions of the phrase. Finally, we filter out all completions that do not occur in the context-relevant notes.
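A minimal Python sketch of these three steps might look like the following. The data shapes are assumptions for the example: `index` is a plain dictionary from hint to the set of note IDs containing it, `notes` maps a note ID to its notebook and tags, and completions are matched by simple prefix. The function name `complete` is hypothetical. The linear scan over the index is for clarity only; an implementation that has to respond while the user types would more likely use a sorted structure or a trie.

```python
def complete(prefix, index, notes, notebook=None, tags=()):
    """Return hints that complete `prefix` and occur in notes matching the context."""
    # 1. Notes that satisfy the notebook and tag context of the search.
    in_context = {
        note_id
        for note_id, meta in notes.items()
        if (notebook is None or meta["notebook"] == notebook)
        and set(tags) <= meta["tags"]
    }
    # 2. Index entries that could complete the text typed so far.
    prefix = prefix.lower()
    candidates = (hint for hint in index if hint.startswith(prefix))
    # 3. Keep only completions that occur in at least one in-context note.
    return sorted(hint for hint in candidates if index[hint] & in_context)


notes = {
    1: {"notebook": "Sports", "tags": {"winter"}},
    2: {"notebook": "Recipes", "tags": set()},
}
index = {
    "ice skating": {1},
    "ice cream": {2},
    "iceland": {1, 2},
    "winter": {1},
}
print(complete("ice", index, notes, notebook="Sports"))
# prints ['ice skating', 'iceland']
```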
Then we score each hint and rank it by relevance using a custom formula based on TF-IDF. The top-ranked hints go on to a final stage, where very similar hints (for example, ice skate and ice skating) are merged.
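The exact formula is not given here, so the Python sketch below uses the textbook TF-IDF definition (raw term frequency multiplied by the logarithm of the inverse document frequency) purely as a stand-in. Merging near-duplicates by string similarity with `difflib.SequenceMatcher` is likewise an assumption chosen for illustration, as are the names `tf_idf` and `rank_hints` and the default threshold.

```python
import math
from difflib import SequenceMatcher


def tf_idf(term_count, notes_with_term, total_notes):
    """Textbook TF-IDF: raw term frequency times log inverse document frequency."""
    return term_count * math.log(total_notes / notes_with_term)


def rank_hints(stats, total_notes, limit=5, similarity=0.75):
    """Rank hints by score, keep the top `limit`, then merge near-duplicates.

    `stats` maps each hint to (occurrences, number of notes containing it).
    """
    scored = sorted(
        stats,
        key=lambda hint: tf_idf(*stats[hint], total_notes),
        reverse=True,
    )
    merged = []
    for hint in scored[:limit]:
        # Drop a hint that is nearly identical to a better-ranked one.
        if any(
            SequenceMatcher(None, hint, kept).ratio() >= similarity
            for kept in merged
        ):
            continue
        merged.append(hint)
    return merged


stats = {
    "ice skating": (6, 2),
    "ice skate": (3, 2),
    "winter": (10, 10),
    "hockey": (4, 1),
}
print(rank_hints(stats, total_notes=10))
# prints ['ice skating', 'hockey', 'winter']
```

In this example "winter" appears in every note, so its inverse document frequency is zero and it ranks last, while "ice skate" is folded into the better-ranked "ice skating".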
In many ways this component is the most complex, but it also has to be the fastest, because it must return a result to the user in less than a second. We therefore paid special attention to the performance of this part of the hint system.
What's next
If you use Evernote for Mac and have not tried this feature yet, I would really like you to try it and share your impressions.
In the future, we will add this functionality to some other Evernote clients.