
How a programmer picked a new car

In previous articles (I, II, III), I described in detail the development of a service for finding favorably priced used cars in the Russian Federation.

Having driven various used cars for a long time, I started thinking about buying a new one and decided to study the question in detail. Large cities have a huge number of official dealers, at least for popular brands. Dealers differ in the cars they have in stock and in the discounts they offer on various models. While searching for the cars I was interested in, I did not want to phone and visit every dealer in turn. In my opinion, it made sense to use the dealers' advertised information to shortlist in advance only those dealers offering the lowest prices for the models and trim levels I was interested in. The fact that the discount can grow significantly in person, if you know how to bargain, does not contradict the goal of visiting first the dealers that offer the most favorable prices on the market.

I collected data on new cars, analyzed it and turned it into a service, and at the end of the year, when dealer discounts are at their peak, I decided to share it with you.

Competitor Overview


RuNet already has a service for choosing and buying new cars, autospot.ru, but it has the following significant drawbacks:

  1. The site does not show the contact details of the dealers that have the cars you are interested in; you can only leave your phone number for a callback. An autospot.ru manager calls you within half an hour, clarifies the make, model and trim level, and then passes your phone number to the dealers who have matching cars. The first call from a dealer takes at least an hour. A particularly annoying aspect of this process is its intrusiveness: after the conversation, the autospot.ru manager and various dealers call quite often to remind you of themselves, even if you ask them not to.
  2. The site lists ~30,000 new cars from all over Russia, which is less than 30% of the average number of new cars sold per month in 2018.
  3. The site does not assess car prices relative to the market with respect to the trim level and optional extras, so it is impossible to tell which offers are the most favorable.

The classified-ad sites auto.ru, drom.ru and avito.ru suffer from drawback #2 to varying degrees, and drawback #3 applies to all of them in full.

This outlined the main problems I had to solve while developing the service.

Data collection


Data on new cars from official dealers is collected from various sources, then processed, systematized and unified. The data is updated and new sources are added on a regular basis. The dataset covers ~75,000 new cars from more than 650 dealers in 70+ Russian cities.

The process of collecting and processing the data is beyond the scope of this article and may be covered in a future one.

Data Transformation and Model Building


To find favorably priced cars, a separate predictive regression model was built for each car model. The target variable is the car's price, and the predictors are derived from the car's basic specifications, its trim level and its installed optional extras. Parameters with multiple non-numeric values are represented in the model as n-1 dummy variables.
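A minimal base-R sketch of n-1 dummy coding on a toy data frame; the `color` column and the prices are hypothetical, not the article's data. With R's default treatment contrasts, `model.matrix()` returns an intercept plus one 0/1 column for every level except the first, which serves as the baseline:

```r
# Toy data: a hypothetical categorical column with 3 levels
cars <- data.frame(
  price = c(2100000, 2250000, 2180000, 2300000),
  color = factor(c("white", "black", "blue", "black"))
)

# Default treatment contrasts: intercept + (n - 1) dummy columns;
# the first level ("black", alphabetically) becomes the baseline
X <- model.matrix(price ~ color, data = cars)
print(X)

# Drop the intercept column to keep only the dummy variables
dummies <- X[, -1, drop = FALSE]
print(colnames(dummies))
```

Dropping one level avoids perfect collinearity between the dummies and the intercept (the so-called dummy-variable trap).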

For illustration, here is the set of variables for the Volkswagen Tiguan (the first one, price, is the target; the rest are regressors):

[1] "price"
[2] "availability"
[3] "year"
[4] "volume"
[5] "power"
[6] "front_drive"
[7] "rear_drive"
[8] "mkpp"
[9] "benzin"
[10] "dizel"
[11] "body_tsvet_kuzova_chernyy_metallik"
[12] "interior_tsvet_peredney_paneli_temnyy"
[13] "interior_tsvet_obivki_sideniy_temnyy"
[14] "interior_tsvet_potolka_temnyy"
[15] "interior_tsvet_kovrovogo_pokrytiya_temnyy"
[16] "equip_dnevnoy_svet"
[17] "equip_avtokorrektor_far_s_dinamicheskim_povorotnym_svetom"
[18] "equip_paket_innovation"
[19] "equip_fary_svetodiodnye"
[20] "equip_omyvatel_far"
[21] "equip_bortovoy_kompyuter"
[22] "safety_avtomaticheskaya_regulirovka_dalnosti_sveta"
[23] "body_tsvet_kuzova_siniy_metallik"
[24] "body_tsvet_kuzova_serebristyy_metallik"
[25] "equip_sistema_avtomaticheskoy_parkovki"
[26] "equip_tekhnicheskiy_kod"
[27] "equip_paket_media"
[28] "equip_usb_interfeysvklyuchaya_auxin"
[29] "equip_interfeys_appconnect"
[30] "equip_paket_zimnie_tekhnologii"
[31] "equip_multimediynaya_sistema_audio"
[32] "equip_parktronik"
[33] "safety_videokamera"
[34] "main_komplektatsiya_city_20_tdi_150hp_7dsg_4motion"
[35] "body_shiny_21565_r17_99_v"
[36] "body_razmer_diskov_r17"
[37] "body_diski_legkosplavnye"
[38] "interior_tip_sideniy_sportivnye"
[39] "interior_obivka_sideniy_kozha"
[40] "equip_distantsionnoe_otkryvanie_bagazhnika"
[41] "equip_zapusk_bez_povorota_klyucha"
[42] "equip_dostup_bez_klyucha"
[43] "equip_interfeys_dlya_smartfonov_appconnect"
[44] "equip_kruizkontrol"
[45] "equip_pamyat_nastroek"
[46] "equip_dopolnitelnyy_otopitel"
[47] "safety_sistema_kontrolya_mertvykh_zon"
[48] "equip_sidenya_ergoactive_dlya_voditelya_s_14pozitsionnoy_regulirovkoy"
[49] "equip_elektroprivod_zerkal"
[50] "equip_paket_tekhnika"
[51] "safety_datchiki_davleniya_v_shinakh"
[52] "safety_okhrannaya_signalizatsiya"
[53] "body_tsvet_kuzova_belyy"
[54] "equip_spetsialnaya_seriya_city"
[55] "main_komplektatsiya_city_14_tsi_150hp_6dsg_4motion"
[56] "equip_panoramnaya_krysha"
[57] "body_bamper_s_uvelichennym_uglom_vezda_24_gradusa"
[58] "body_paket_offroad"
[59] "body_nakladki_na_dvernye_porogi"
[60] "interior_nakladki_na_dvernye_porogi"
[61] "main_komplektatsiya_city_20_tsi_180hp_7dsg_4motion"
[62] "body_shiny_23550_r19_99v"
[63] "body_razmer_diskov_r19"
[64] "interior_dvernye_paneli_skozhanoy_otdelkoy"
[65] "equip_elektroprivod_sideniy"
[66] "main_komplektatsiya_city_14_tsi_150hp_6dsg"
[67] "equip_vybor_rezhimov_vozhdeniya"
[68] "interior_yashchik_dlya_khraneniya_pod_perednim_passazhirskim_kreslom"
[69] "interior_ergonomichnye_perednie_sidenya"
[70] "equip_voditelskoe_sidene_s_regulirovkoy_po_vysote_dline_uglu_naklona_spinki"
[71] "equip_massazhnye_sideniya"
[72] "body_tsvet_kuzova_bezhevyy_metallik"
[73] "body_tsvet_kuzova_krasnyy_metallik"
[74] "body_shiny_23555_r18_100v"
[75] "body_razmer_diskov_r18"
[76] "interior_nakladki_na_porogi_s_podsvetkoy"
[77] "interior_dekorativnye_vstavki_dark_grid"
[78] "interior_dve_lampy_dlya_chteniya_speredi"
[79] "equip_paket_osveshchenie"
[80] "equip_fonovaya_podsvetka_interera"
[81] "equip_svetodiodnye_zadnie_fonari_3d"
[82] "main_komplektatsiya_offroad_20_tsi_180hp_7dsg_4motion"
[83] "body_korpusa_naruzhnykh_zerkal_okrashennye_v_chernyy_tsvet"
[84] "body_polnorazmernoe_stalnoe_zapasnoe_koleso_65x17"
[85] "body_peredniy_bamper_s_uvelichennym_uglom_vezda_26_gradusov_zadniy_bamper_s_dekorativnymi_vstavkami_dekorativnye_nakladki_na_dveri"
[86] "body_spoyler_na_zadney_dveri"
[87] "interior_dekorativnye_vstavki_dlya_spetsialnoy_versii"
[88] "interior_peredniy_podlokotnik_s_dvumya_podstakannikami_i_shtorkoy"
[89] "interior_ploskiy_pol_bagazhnogo_otdeleniy"
[90] "interior_yashchiki_dlya_khraneniya_pod_perednimi_kreslami"
[91] "interior_skladnye_stoliki_v_spinkakh_perednikh_kresel"
[92] "interior_alyuminievye_nakladki_na_pedali"
[93] "interior_rezinovye_salonnye_kovriki_speredi_i_szadi_s_logotipom_offroad"
[94] "interior_nakladki_na_dvernye_pori_offroad"
[95] "interior_stekla_atermalnye_tonirovannye"
[96] "equip_datchik_sveta"
[97] "equip_spetsialnaya_versiya_offroad"
[98] "equip_klavishi_mekhanicheskoy_razblokirovki_spinok_zadnikh_sideniy_v_bagazhnom_otseke"
[99] "equip_vnutrennee_zerkalo_zadnego_vida_s_avtozatemneniem"
[100] "equip_poyasnichnyy_podpor_dlya_perednikh_sideniy"
[101] "equip_polnostyu_skladnaya_spinka_perednego_passazhirskogo_kresla"
[102] "equip_avtokorrektor_far"
[103] "equip_funktsiya_coming_homeleaving_home"
[104] "equip_2_usb_razema_v_peredney_konsoli_1_usb_razem_v_tsentralnoy_konsoli_dlya_zaryadki"
[105] "equip_datchik_dozhdya"
[106] "equip_obogrev_lobovogo_stekla"
[107] "safety_hdc_sistema_pomoshchi_pri_spuske_so_sklona"
[108] "body_tsvet_kuzova_korichnevyy_metallik"
[109] "equip_navigatsionnaya_sistema"
[110] "equip_paket_navigatsiya"
[111] "equip_golosovoe_upravlenie"
[112] "equip_usb_interfeys_ipodiphonevklyuchaya_auxin"
[113] "equip_multimediynaya_sistema_audiovideo"
[114] "main_komplektatsiya_sportline_20_tsi_220hp_7dsg_4motion"
[115] "body_paket_vneshnikh_elementov_sportline"
[116] "body_shiny_airstop_25545_r_19"
[117] "body_bampery_v_sportivnom_stile_i_nakladki_na_porogi_v_tsvet_kuzova_rasshiriteli_kolesnykh_arok"
[118] "equip_individualnaya_sborka"
[119] "equip_rulevoe_upravlenie_s_peremennym_peredatochnym_otnosheniem"
[120] "equip_multifunktsionalnyy_rul"
[121] "interior_dvernye_paneli_s_kozhanoy_otdelkoy"
[122] "main_komplektatsiya_offroad_14_tsi_150hp_6dsg_4motion"
[123] "equip_zerkalo_zadnego_vida_s_avtozatemneniem"
[124] "equip_tsentralnyy_zamok"
[125] "main_komplektatsiya_sportline_20_tsi_180hp_7dsg_4motion"
[126] "safety_podushki_bezopasnosti_sht_11"
[127] "safety_paket_bezopasnost"
[128] "safety_proaktivnaya_sistema_zashchity_passazhirov_presafe"
[129] "interior_otdelka_dverey"
[130] "main_komplektatsiya_sportline_20_tdi_150hp_7dsg_4motion"
[131] "main_komplektatsiya_offroad_14_tsi_150hp_6mt_4motion"
[132] "main_komplektatsiya_offroad_20_tdi_150hp_7dsg_4motion"
[133] "interior_salonnye_kovriki_speredi_i_szadi"
[134] "interior_tsvet_obivki_sideniy_kombinirovannyy"
[135] "equip_paket_discover_pro"
[136] "interior_paket_khranenie"
[137] "interior_makiyazhnye_zerkala_s_podsvetkoy_v_solntsezashchitnykh_kozyrkakh"
[138] "interior_bagazhnaya_setka"
[139] "interior_potolochnaya_konsol_s_otsekami_dlya_khraneniya"
[140] "body_tsvet_kuzova_belyy_metallik"
[141] "body_pritsepnoe_ustroystvo"
[142] "equip_obogrev_zerkal"
[143] "interior_obivka_sideniy_velyur"
[144] "body_tsvet_kuzova_seryy"
[145] "body_standartnyy_bamper_s_khromirovannoy_otdelkoy"
[146] "interior_khromirovannaya_otdelka_elementov_interera"
[147] "equip_paket_style"
[148] "equip_paket_premium"
[149] "equip_generator_180a"

To build the regression models in the first release, I used the Random Forest algorithm, which had already proven itself on used cars (II).

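```r
# Load the required packages
library(reshape2)
library(caret)
library(randomForest)

# Read the Volkswagen Tiguan dataset (all columns must be numeric for cor())
new_cars_data <- read.csv('new_cars_data_tiguan.txt')

# Correlation matrix of all variables
new_cars_data_cor <- as.matrix(cor(new_cars_data))
# Mask the diagonal and the lower triangle so that each pair appears once
new_cars_data_cor[lower.tri(new_cars_data_cor, diag = TRUE)] <- NA
# Keep the pairs of perfectly correlated variables
# (a small tolerance guards against floating-point round-off)
high_cor_vars <- subset(melt(new_cars_data_cor, na.rm = TRUE), value > 1 - 1e-9)

# Drop the second variable of every perfectly correlated pair
if (nrow(high_cor_vars) > 0) {
  drop_vars <- unique(as.character(high_cor_vars$Var2))
  dataset <- new_cars_data[, setdiff(names(new_cars_data), drop_vars)]
} else {
  dataset <- new_cars_data
}

set.seed(1)  # make the random split reproducible
# Randomly assign ~80% of rows to training and ~20% to the hold-out set
split <- runif(nrow(dataset)) > 0.2
train <- dataset[split, ]   # training set (used for cross-validation)
test  <- dataset[!split, ]  # hold-out set
```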
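For comparison, the same perfect-correlation filter can be sketched in base R without reshape2. The data frame below is made up, and `rear_drive_copy` is a hypothetical column that duplicates `rear_drive`, standing in for redundant dummies in the real data:

```r
# Toy data: 'rear_drive_copy' is an exact duplicate of 'rear_drive'
toy <- data.frame(
  price           = c(2.1, 2.3, 2.2, 2.6, 2.4),
  power           = c(150, 180, 150, 220, 180),
  rear_drive      = c(0, 1, 0, 1, 1),
  rear_drive_copy = c(0, 1, 0, 1, 1)
)

cm <- cor(toy)
# Mask the diagonal and the lower triangle so each pair is checked once
cm[lower.tri(cm, diag = TRUE)] <- NA
# Matrix positions of (near-)perfectly correlated pairs; the tolerance
# guards against correlations computed as slightly less than 1
pairs <- which(!is.na(cm) & cm > 1 - 1e-9, arr.ind = TRUE)
# Drop the second (column) member of each pair
drop_vars <- unique(colnames(cm)[pairs[, "col"]])
filtered <- toy[, setdiff(names(toy), drop_vars), drop = FALSE]
print(drop_vars)
```

Selecting columns by name rather than by position keeps the filter correct even when one variable is perfectly correlated with several others.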

For cross-validation, I used the caret package, which offers many tools for assessing model quality.

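```r
# 10-fold cross-validation repeated 10 times
fit.control <- trainControl(method = "repeatedcv", number = 10, repeats = 10)
# Tune the random forest (method = "rf") on the training set, choosing mtry by RMSE
train.rf.model <- train(price ~ ., data = train, method = "rf",
                        trControl = fit.control, metric = "RMSE")
# Print the cross-validation results
train.rf.model
```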

```
Random Forest

1858 samples
 111 predictor

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 1673, 1672, 1672, 1672, 1671, 1673, ...
Resampling results across tuning parameters:

  mtry  RMSE       Rsquared
    2   132963.50  0.7264413
   56    79757.67  0.8626671
  111    80401.10  0.8605166

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 56.
```
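The "Summary of sample sizes" line follows from how repeated k-fold cross-validation partitions the data. The base-R sketch below illustrates the idea with `number = 10, repeats = 10` on 1858 rows; `make_repeated_folds()` is a hypothetical helper, not caret's implementation (caret builds its folds with its own functions, such as `createMultiFolds()`, which also balance the outcome across folds, so its fold sizes can differ slightly):

```r
# Hypothetical helper: for each of r repetitions, shuffle balanced fold labels 1..k
make_repeated_folds <- function(n, k = 10, r = 10, seed = 1) {
  set.seed(seed)
  lapply(seq_len(r), function(i) sample(rep_len(seq_len(k), n)))
}

n <- 1858  # number of training rows reported by caret
folds <- make_repeated_folds(n)

# Within a repetition, each fold is held out once while the model trains on the rest
fold_sizes <- table(folds[[1]])
train_sizes <- n - as.vector(fold_sizes)
print(fold_sizes)
print(train_sizes)
```

With 1858 rows, each held-out fold has 185 or 186 rows, so each model is trained on 1672 or 1673 rows, close to the 1671 to 1673 sizes caret reports.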

The resulting coefficient of determination (Rsquared ≈ 0.86 for mtry = 56) means that the model explains most of the variation in the dependent variable (price).
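As a reminder of what the Rsquared column measures, here is a small sketch with made-up prices and predictions. caret's resampling summary reports R² as the squared correlation between observed and predicted values, whereas the classical definition is 1 - SS_res / SS_tot. The two coincide for an ordinary least-squares fit with an intercept evaluated on its own training data, but can differ on held-out predictions:

```r
obs  <- c(2.10, 2.35, 2.20, 2.60, 2.45)  # hypothetical prices, mln RUB
pred <- c(2.15, 2.30, 2.25, 2.50, 2.40)  # hypothetical model predictions

# Squared Pearson correlation (the form caret's resampling summary reports)
r2_cor <- cor(obs, pred)^2

# Classical coefficient of determination
ss_res <- sum((obs - pred)^2)
ss_tot <- sum((obs - mean(obs))^2)
r2_classic <- 1 - ss_res / ss_tot

print(c(r2_cor = r2_cor, r2_classic = r2_classic))
```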

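```r
# Refit the random forest on the whole training set with the selected mtry
train.rf.model <- randomForest(price ~ ., train, mtry = 56)
# Plot variable importance (by default, the 30 most important variables)
varImpPlot(train.rf.model)
```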



```r
# Predict prices for the hold-out set
rf.model.predictions <- predict(train.rf.model, test)
# RMSE on the hold-out set
print(sqrt(sum((as.vector(rf.model.predictions - test$price))^2) / length(rf.model.predictions)))
## [1] 82512.59
```
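The hold-out RMSE expression can be wrapped in a small helper; `rmse()` is a hypothetical name, and the toy vectors below are made up for illustration:

```r
# Root mean squared error: the square root of the mean squared residual
rmse <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))
  sqrt(mean((pred - obs)^2))
}

pred <- c(2050000, 2310000, 2195000)
obs  <- c(2100000, 2250000, 2200000)
print(rmse(pred, obs))
```

Because `mean()` divides the sum by the length, this matches `sqrt(sum(x^2) / length(x))` in the one-liner; caret also provides an `RMSE(pred, obs)` function for the same purpose.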

Validating the algorithm


Let's check what real benefit the algorithm can help find.

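```r
# Fit the model on the full dataset and predict the price of every car
rf.model <- randomForest(price ~ ., dataset, mtry = 56)
# Note: passing the training data as newdata gives in-sample predictions;
# predict(rf.model) without newdata would return out-of-bag estimates instead
predicted.price <- predict(rf.model, dataset)
real.price <- dataset$price
# Positive values: the listed price is below the model's estimate
profit <- predicted.price - real.price
```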

Let's plot the benefit against the listed price.

```r
# Benefit versus listed price, with a horizontal reference line at zero
plot(real.price, profit)
abline(h = 0)
```



Now let's express the benefit in relative terms, as the ratio of the predicted price to the listed price.

```r
# Ratio of predicted to listed price, best offers first
sorted <- sort(predicted.price / real.price, decreasing = TRUE)
sorted[1:8]
##      195      193        6      207      202      203      906      206
## 1.184079 1.176262 1.132920 1.126626 1.123967 1.123967 1.116736 1.116344
```
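To turn these ratios into percentages, subtract 1 and multiply by 100. A sketch using the first three values printed above (the names are the row indices from that output):

```r
# Predicted-to-listed price ratios for the top three offers in the output above
ratios <- setNames(c(1.184079, 1.176262, 1.132920), c("195", "193", "6"))

# Benefit in percent of the listed price: (predicted - listed) / listed * 100
benefit_pct <- round((ratios - 1) * 100, 1)
print(benefit_pct)
```

Since the ratio is predicted/listed, the percentage is relative to the listed price; the top value of about 18.4% corresponds to roughly 15.5% of the predicted price ((1.184079 - 1) / 1.184079).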

Given that the benefit is calculated from the prices dealers advertise, and that you can still bargain in person, a maximum benefit of about 18% is a very good result.

Web service implementation


With the technical part done, it's time to search for the car I want.

For example, let's look at the Volkswagen Tiguan in the City 2.0 TSI 180hp 7DSG 4Motion configuration.









The service shows which dealers are worth calling first to confirm the car's availability, price and purchase conditions, and then visiting for an inspection and a potential purchase.
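The article does not describe how the service orders dealers. Purely as a hypothetical sketch, assuming a table of offers with made-up columns `dealer`, `price` and `predicted`, one could rank dealers by the best predicted-to-listed price ratio among their cars:

```r
# Hypothetical offers for one trim level; all names and numbers are made up
offers <- data.frame(
  dealer    = c("Dealer A", "Dealer A", "Dealer B", "Dealer C", "Dealer C"),
  price     = c(2350000, 2400000, 2290000, 2380000, 2310000),
  predicted = c(2500000, 2420000, 2450000, 2390000, 2330000)
)

# Relative benefit of each car: predicted price over listed price
offers$ratio <- offers$predicted / offers$price

# Best (maximum) ratio per dealer, then dealers sorted from best to worst
best <- aggregate(ratio ~ dealer, data = offers, FUN = max)
best <- best[order(best$ratio, decreasing = TRUE), ]
print(best)
```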

Conclusion


Thus, I built an assistant for choosing a new car from an authorized dealer at the best market price, taking all installed optional extras into account.
Note that a dealer may offer a low price only under certain conditions (a car loan, KASKO comprehensive insurance, a trade-in, etc.), which generally introduces an error into the benefit estimate. In my opinion, for the sake of a good discount it is rational to accept some of the services the dealer pushes, whether it is KASKO from a partner insurer or, for example, a loan that you repay as soon as possible. This is a purely individual decision, but in any case, to avoid misunderstandings, it is advisable to clarify the conditions by phone before visiting the showroom.

The release date was not chosen by chance: the end of the year is the best time to buy a new car, since dealers are ready to offer their largest discounts.
The service is currently in beta testing and is completely free.

As statistics accumulate, new analytics and infographics will appear in the service that should interest both end users and authorized dealers; I will cover them in the next article.

Links


Sample data for the Volkswagen Tiguan

Source: https://habr.com/ru/post/430336/

