Many sites that sell things give users additional information and the ability to compare similar products.
This might be a comparison of monitor specifications in an online store, or a list of similar properties in a given city and district.
In any case, when someone decides to sell or buy something, one question always comes up: at what price? Our assumption is that sites able to give sellers and buyers a competent hint about the price of goods may attract additional interest from users.
This question (what is the price of the goods?) can be answered more or less definitely if the goods offered for sale are new: in that case you can analyze the prices of the same goods in stores, car dealerships and on websites. Pricing used (second-hand) goods is somewhat harder. The difficulty is that a product acquires unique characteristics as it is used. One person drives a car every day for many years, but carefully; another invests in tuning; a third puts decorative plasterwork on the ceiling of his apartment; a fourth likes to drop the monitor on the floor now and then... And at some point all of them want to sell their car, apartment, monitor and so on. A second car exactly like it may not exist anywhere in the world. There were many identical cars when they left the assembly line, but after being driven by different people in different conditions the cars became different. To varying degrees this applies not only to cars but to any other product. The average price is therefore a kind of reference point from which the seller and the buyer can start.
Below I give an example of how we are trying to solve this problem, determining the approximate price of goods, on the car sales website am.ua.
Initially, we limited the number of parameters used to analyze average prices. We took as a basis the make, model, gearbox type and year of manufacture of the car. This restriction on the parameters is caused by the limited input data: about 50,000 active ads on the site.
Average prices are not calculated for all ads (for some make/model combinations there is not enough data for statistical analysis). We gave the chart itself some interactivity: points on the chart are links to ads, or to pages with ads for a given year of manufacture.
To study users' reactions, we added a "leave a wish" button. Overall, the feedback was positive. However, there were sometimes comments like: the average price is like the average temperature in a hospital. This remark is not without sense, and neither is the request to take into account the specific trim level / modification, engine size and other parameters. On the other hand, if you increase the number of parameters in the analysis, the number of ads that get a chart drops sharply, again because of the same limited data set. Here we have to balance users' wish to see a more accurate average price against their wish to see that figure next to every car. We settled on showing a fairly rough average price, which serves as a starting point for bargaining between buyer and seller (my car is more expensive because it additionally has this and that).
For completeness, I will briefly describe the algorithm for calculating average prices:
- active ads are selected for a given make / model / gearbox and grouped by year (cars that have not cleared customs and cars that have been in an accident are discarded)
- the minimum and maximum years that have at least 10 ads each are taken
- for each year in the resulting range, the arithmetic average price is calculated (the maximum and minimum values are discarded)
- the average price for a particular year is calculated using the formula middlePrice = (y0 * ((k - x) / k) * $zexp(b * x)) + (yn * (x / k) * $zexp(b * (k - x))) where:
$zexp(v) - e raised to the power v
y0 - the arithmetic average price in the first year of the range
yn - the arithmetic average price in the last year of the range
k - the length of the range in years (last year minus first year)
x - the difference between the particular year and the first year of the range
b - a fitted coefficient in the interval [-0.1, 0]
- the value of b is chosen that minimizes the average percentage deviation of the computed prices from the arithmetic average prices
- if that minimum average deviation does not exceed 10%, the data for the chart are saved
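The steps above can be sketched in Python roughly as follows. This is only an illustration under several assumptions, not the code running on the site: the function names, the input shape (a dict mapping year to a list of already filtered prices), the grid search over b with 100 steps, and the choice to include every year inside the range that has at least 3 prices are all hypothetical, since the description does not specify them.

```python
import math
from statistics import mean


def trimmed_mean(prices):
    """Arithmetic mean after discarding one minimum and one maximum value."""
    ordered = sorted(prices)
    return mean(ordered[1:-1])


def middle_price(y0, yn, k, x, b):
    """Blend of the two endpoint averages, following the formula in the list."""
    return (y0 * ((k - x) / k) * math.exp(b * x)
            + yn * (x / k) * math.exp(b * (k - x)))


def fit_curve(prices_by_year, min_ads=10, max_deviation=0.10, steps=100):
    """Return (b, {year: price}), or None if no acceptable curve is found."""
    min_ads = max(min_ads, 3)  # a trimmed mean needs at least 3 prices
    years = sorted(y for y, p in prices_by_year.items() if len(p) >= min_ads)
    if len(years) < 2:
        return None  # not enough data to define a range
    first, last = years[0], years[-1]
    k = last - first

    # Assumption: every year inside the range with >= 3 prices is compared.
    averages = {y: trimmed_mean(p) for y, p in prices_by_year.items()
                if first <= y <= last and len(p) >= 3}
    y0, yn = averages[first], averages[last]

    best = None
    for i in range(steps + 1):
        b = -0.1 * i / steps  # assumed grid over [-0.1, 0]
        curve = {y: middle_price(y0, yn, k, y - first, b) for y in averages}
        deviation = mean(abs(curve[y] - averages[y]) / averages[y]
                         for y in averages)
        if best is None or deviation < best[0]:
            best = (deviation, b, curve)

    deviation, b, curve = best
    return (b, curve) if deviation <= max_deviation else None
```

Calling `fit_curve` with the prices for one make / model / gearbox returns the chosen b and the smoothed price per year, or `None` when the best average deviation is above 10%, which corresponds to not showing a chart for that group.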
The exponential function was deliberately tied to the endpoints of the [y0, yn] range, since it tends to grow rapidly even for small b.
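As a quick check of this design, taking the weights in the formula to be (k - x) / k and x / k and writing $zexp as the ordinary exponential, substituting the two endpoint years gives:

```latex
\begin{aligned}
x = 0:&\quad \text{middlePrice} = y_0 \cdot \frac{k}{k} \cdot e^{0} + y_n \cdot 0 \cdot e^{bk} = y_0,\\
x = k:&\quad \text{middlePrice} = y_0 \cdot 0 \cdot e^{bk} + y_n \cdot \frac{k}{k} \cdot e^{0} = y_n.
\end{aligned}
```

So the curve passes exactly through both endpoint averages, and b only shapes the years in between. For b in [-0.1, 0] each exponential factor is at most 1, so the interior values can only be pulled below the straight line between y0 and yn.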
The implementation of price analytics described here is a first trial version and makes no claim to perfection or completeness. If you have faced similar tasks and can share your experience, I will be very grateful. Tips for improving the math will also be helpful. I am particularly interested in how you balance the quality and quantity of the analytical information you provide in systems where accuracy is not critical (there is no absolutely accurate average price for a product).