Several recent posts discussed a promising and fairly simple way to assess the quality of product search using intents. We are glad to present an open, automated tool for this kind of testing: Intent-based Search Quality. The idea boils down to using previously prepared "focused" queries, whose value lies in being straightforward and unambiguous to interpret.

If you are interested in the approach, be sure to read the post “I intend to buy” or the easiest way to assess the quality of grocery search. The move from standard A/B testing to intent-based testing may not seem entirely obvious.
The essence of the approach
“Focused” queries provide a very simple and reasonably reliable way to validate search results. As a first approximation, the check can be reduced to verifying that every keyword from the query appears in the name of each returned product.
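The first-approximation check described above can be sketched in a few lines. This is only an illustration, not the tool's actual implementation: the function names `tokenize` and `is_relevant` are hypothetical, and the sketch assumes plain lowercase token matching with no stemming or synonym handling.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_relevant(query, product_name):
    """Return True if every query keyword occurs as a token in the product name.

    Assumption: exact token matching only, so "ax" does not match "axe".
    """
    name_tokens = set(tokenize(product_name))
    return all(token in name_tokens for token in tokenize(query))
```

With this sketch, a product named "Fiskars Steel Chopping Ax 28 inch" passes for the query "fiskars steel chopping ax", while "Fiskars Machete 18 inch" does not. Both product names here are invented for illustration.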
Naturally, the position of a product in the results also matters. In a mobile application, for example, the first result and the few that follow it are the most critical. For regular (desktop) search, it all comes down to having relevant results on the first page.
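One hedged way to take position into account is to look only at the first k results, with a small k for mobile and a page-sized k for desktop. The sketch below assumes the keyword check from the previous section; the cut-offs `MOBILE_K` and `DESKTOP_K`, the function names, and the sample product names are all assumptions made for illustration, not values taken from the tool or from real Walmart results.

```python
import re

# Hypothetical cut-offs: a handful of results on mobile, one page on desktop.
MOBILE_K = 3
DESKTOP_K = 20


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def is_relevant(query, product_name):
    """Return True if every query keyword occurs as a token in the product name."""
    name_tokens = set(tokenize(product_name))
    return all(token in name_tokens for token in tokenize(query))


def precision_at_k(query, product_names, k):
    """Share of relevant products among the first k results (0.0 if there are none)."""
    top = product_names[:k]
    if not top:
        return 0.0
    return sum(is_relevant(query, name) for name in top) / len(top)


# Invented result list, in the order a search engine might return it.
results = [
    "Fiskars Steel Chopping Ax",
    "Fiskars Machete",
    "Fiskars Steel Chopping Ax with Sheath",
    "Kitchen Cleaver",
]

mobile_score = precision_at_k("fiskars steel chopping ax", results, MOBILE_K)
desktop_score = precision_at_k("fiskars steel chopping ax", results, DESKTOP_K)
```

Here the same result list gets a different score depending on how many positions are inspected, which is the point of treating mobile and desktop separately.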
How good is a Walmart search?
“I intend to buy” is a really powerful approach because it does not require manual validation of search results. Let's find out how good Walmart's search is (we will continue to use the query set for the Fiskars brand). So, "fiskars steel chopping ax":

And let's automate all this
At first glance the result seems good, but on closer inspection you can easily spot completely irrelevant items. Running the necessary checks without human intervention is easy enough; look at the report:

The result is negative because there are too many truly irrelevant products in the final set (we do not need a cleaver, or a machete either). Naturally, the evaluation function is easy to replace with a less “aggressive” one. In future versions we hope to implement more flexible scoring of products and to expand the basic functionality.
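To illustrate what swapping the evaluation function could look like, here is a minimal sketch. The post does not describe the tool's actual scoring, so everything below is an assumption: `strict_verdict` stands in for an "aggressive" rule that fails a query on any irrelevant product, and `lenient_verdict` with its `min_share` parameter is a hypothetical softer alternative.

```python
def strict_verdict(flags):
    """Pass only if there is at least one result and every result is relevant."""
    return bool(flags) and all(flags)


def lenient_verdict(flags, min_share=0.5):
    """Pass if at least min_share of the results are relevant."""
    return bool(flags) and sum(flags) / len(flags) >= min_share


def evaluate(flags, verdict=strict_verdict):
    """Apply a pluggable verdict function to per-product relevance flags."""
    return "PASS" if verdict(flags) else "FAIL"


# Invented relevance flags for one query: three relevant products, one not.
flags = [True, True, False, True]

strict_result = evaluate(flags)
lenient_result = evaluate(flags, verdict=lenient_verdict)
```

Passing the verdict as a parameter keeps the relevance check and the pass/fail policy separate, so a stricter or softer policy can be tried without touching the rest of the pipeline.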
Verdict
In the current version, Walmart did not impress; its rating for the Fiskars queries is not very high:

Do not think that everything is very bad. In many cases, including the most trivial ones, the results are correct, but to improve the search further, all potentially problematic spots need to be highlighted. With the proposed tool you can always:
- switch to a different scoring function
- remove objectionable queries
- fix the search :)
We leave it to you to judge the complexity of the queries:

Links
- Intent-based Search Quality
- Search Quality API
- “I intend to buy” or the easiest way to assess the quality of grocery search
- Petrol bikes or weird product search (e-commerce)