install.packages("Name_Of_R_Package")
dplyr, ggplot2, reshape2. Of course, this is not a complete list. In this article we will focus more on the packages used in machine learning.

# two random numeric columns with some values replaced by NA
dataset <- data.frame(var1 = rnorm(20, 0, 1), var2 = rnorm(20, 5, 1))
dataset[c(2, 5, 7, 10), 1] <- NA
dataset[c(4, 8, 19), 2] <- NA
summary(dataset)
install.packages("mice")
library(mice)
dataset2 <- mice(dataset)        # fit the imputation models
dataset2 <- complete(dataset2)   # extract a completed data set
summary(dataset2)
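A minimal sketch of the same imputation step made repeatable, assuming the mice package is installed. The seeds, the number of imputations m = 5 and the "pmm" (predictive mean matching) method are illustrative choices, not settings taken from the article; the names na_counts, imp and imputed are hypothetical.

```r
# Toy data with missing values, shaped like the example above
set.seed(1)  # arbitrary seed so the random draws are repeatable
dataset <- data.frame(var1 = rnorm(20, 0, 1), var2 = rnorm(20, 5, 1))
dataset[c(2, 5, 7, 10), 1] <- NA
dataset[c(4, 8, 19), 2] <- NA

# Count missing values per column before imputing (4 in var1, 3 in var2)
na_counts <- colSums(is.na(dataset))
print(na_counts)

# The guard skips the imputation step when mice is not installed
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)
  # m, method and seed are assumptions made for this sketch
  imp <- mice(dataset, m = 5, method = "pmm", seed = 123, printFlag = FALSE)
  imputed <- complete(imp, 1)       # take the first of the m completed data sets
  print(colSums(is.na(imputed)))    # check how many NA values remain
}
```

Fixing the seed matters here because mice draws imputed values at random, so two unseeded runs will generally give different completed data sets.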
rpart: let's divide the data

The rpart package in the R language is used to construct classification and regression models using a two-step procedure, and the result is represented in the form of binary trees. The easiest way to plot a regression or classification tree built with rpart is to call the function plot(). On its own, plot() may not give a particularly attractive result, so there is an alternative: prp(), a powerful and flexible function. prp() in the rpart.plot package is often called the real Swiss army knife for plotting regression trees.

The rpart() function allows you to establish a relationship between the dependent and independent variables and to show how the variance of the dependent variable is explained by the independent ones. For example, if an online training company wants to know how sales (the dependent variable) are affected by promotion in social networks, newspapers, referral links, word of mouth, etc., there are several functions in rpart that can help with the analysis of this phenomenon.

rpart(formula, data=, method=, control=)
Here, formula contains the combination of dependent and independent variables; data is the name of the data set; method depends on the goal, i.e. for a classification tree it will be "class"; control depends on your requirements, for example, the minimum number of observations a node must contain before it can be split.

Let's take the iris dataset:

library(rpart)
rpart_tree <- rpart(formula = Species ~ ., data = iris, method = "class")
summary(rpart_tree)
plot(rpart_tree)
text(rpart_tree)   # plot() draws only the branches; text() adds the split labels
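As a rough illustration of the control argument and of prediction on new rows, here is a minimal sketch. The 100/50 split, the seed and the minsplit and maxdepth values are assumptions chosen only for the example, and the names train_set, test_set, ctrl, tree and pred are hypothetical.

```r
library(rpart)

set.seed(42)                              # arbitrary seed, only for a repeatable split
train_idx <- sample(nrow(iris), 100)      # 100 rows for training (assumed split)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# control: try a split only in nodes with at least 10 observations
# and limit the depth of the tree (illustrative values)
ctrl <- rpart.control(minsplit = 10, maxdepth = 3)

tree <- rpart(Species ~ ., data = train_set, method = "class", control = ctrl)

# plot() draws the branches, text() adds split labels and node counts
plot(tree, margin = 0.1)
text(tree, use.n = TRUE, cex = 0.8)

# type = "class" returns class labels; the default for a classification
# tree is a matrix of class probabilities
pred <- predict(tree, newdata = test_set, type = "class")
print(table(predicted = pred, actual = test_set$Species))
print(mean(pred == test_set$Species))     # share of correctly classified rows
```

If rpart.plot is installed, prp(tree) can replace the plot() and text() pair.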
Here is what the built tree looks like: [figure: classification tree for the iris data]

You can also call predict(tree_name, new_data), which will return the predictions as a result. For an rpart classification tree, pass type = "class" to get class labels rather than class probabilities.

PARTY: let's divide the data again

The PARTY package in R (installed and loaded under the lowercase name party) is used for recursive partitioning and reflects the continuous improvement of ensemble methods. PARTY is another package for building decision trees, based on the conditional inference algorithm. ctree() is the main function of the PARTY package; it is widely used and reduces training time and possible bias.

PARTY has a syntax similar to other predictive analytics functions in R, namely:

ctree(formula, data)
The function will build a decision tree using the default values for its numerous arguments; you can change them if necessary.

library(party)
party_tree <- ctree(formula = Species ~ ., data = iris)
plot(party_tree)
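A minimal sketch of changing those defaults, assuming the party package is installed. The mincriterion and minsplit values are illustrative, and the names ctrl, strict_tree and pred are hypothetical. Note that the newer partykit package also provides a ctree(), but there the argument is named control rather than controls.

```r
# The guard skips the example when party is not installed
if (requireNamespace("party", quietly = TRUE)) {
  library(party)

  # mincriterion is the value of 1 - p-value that a test must exceed
  # before a split is made; 0.99 is stricter than the default 0.95
  ctrl <- ctree_control(mincriterion = 0.99, minsplit = 20)

  strict_tree <- ctree(Species ~ ., data = iris, controls = ctrl)

  # Predicted classes for the rows the tree was trained on
  pred <- predict(strict_tree, newdata = iris)
  print(table(predicted = pred, actual = iris$Species))
}
```

Raising mincriterion generally gives a smaller tree, because fewer candidate splits pass the significance test.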
CARET: Classification And REgression Training

The CARET package (Classification And REgression Training; installed and loaded under the lowercase name caret) is designed to combine model training and prediction. There are several algorithms in the package that are suitable for different tasks. A data analyst cannot always say exactly which algorithm is best for solving a particular task. The CARET package allows you to choose the optimal parameters for an algorithm using controlled experiments. The grid search method implemented in this package searches for parameters by combining various methods for evaluating model performance. After going through all the possible combinations, grid search finds the combination that gives the best results.

The CARET package is one of the best in R. The developers of this package understood how difficult it is to choose the most suitable algorithm for each task. There are cases when a particular model is used and there are doubts about the quality of the data, but still the problem most often turns out to be in the chosen algorithm.

After installing the CARET package, you can execute names(getModelInfo()) and see a list of the available methods (217 at the time of writing).

To build predictive models, CARET uses the train() function. Its syntax is:

train(formula, data, method)
Here, method is the prediction model you are trying to build. Let's use the iris data set and a linear regression model to predict Sepal.Length.

library(caret)
lm_model <- train(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris, method = "lm")
summary(lm_model)
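The grid search described above can be sketched roughly as follows, assuming the caret and rpart packages are installed. The 80/20 split, the 5-fold cross-validation and the candidate cp values are illustrative assumptions, and the names in_train, training, testing, ctrl, grid, fit and pred are hypothetical.

```r
# The guard skips the example when caret is not installed
if (requireNamespace("caret", quietly = TRUE)) {
  library(caret)
  set.seed(123)   # arbitrary seed for a repeatable split and resampling

  # Hold out 20% of the rows for a final check
  in_train <- createDataPartition(iris$Sepal.Length, p = 0.8, list = FALSE)
  training <- iris[in_train, ]
  testing  <- iris[-in_train, ]

  # Controlled experiment: 5-fold cross-validation for every candidate value
  ctrl <- trainControl(method = "cv", number = 5)

  # Grid of candidate values for cp, the parameter caret tunes for method = "rpart"
  grid <- expand.grid(cp = c(0.001, 0.01, 0.05, 0.1))

  fit <- train(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
               data = training, method = "rpart",
               trControl = ctrl, tuneGrid = grid)

  print(fit$bestTune)    # the cp value with the best cross-validated RMSE

  # Evaluate the selected model on the held-out rows
  pred <- predict(fit, newdata = testing)
  print(postResample(pred, testing$Sepal.Length))
}
```

A regression tree is used here instead of "lm" only because plain linear regression has no parameter worth searching over; any method listed by names(getModelInfo()) can be tuned the same way with its own grid.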
The CARET package not only builds models, but also splits the data into test and training sets, performs the necessary transformations, etc.

Source: https://habr.com/ru/post/305692/