
OOP in R (part 1): S3 classes

R is an object-oriented language. Everything in it is an object, from functions to tables.



Each object in R belongs to a class. The world around us works in much the same way: we are surrounded by objects, and each of them can be assigned to a class. The class determines the set of properties an object has and the actions that can be performed on it.






For example, every kitchen has a table and a stove, and both can be called kitchen equipment. The properties of a table are usually limited to its size, color and the material it is made of. A stove has a wider set of properties: at the very least its power, the number of burners and its type (electric or gas).



Actions that can be performed on objects are called their methods. The table and the stove also have different sets of methods. You can dine at the table and prepare food on it, but you cannot heat food on it; that is what the stove is for.









Class properties



In the R language, each object also belongs to a class, and the class gives it a certain set of properties and methods. In object-oriented programming (OOP) terms, the ability to group objects with similar properties and methods into classes is called encapsulation.



The vector is the simplest class of objects in R, and one of its properties is length. Take the built-in vector letters as an example.



length(letters) 


 [1] 26 


Using the length function, we get the length of the vector letters. Now let's apply the same function to the built-in iris data frame.



 length(iris) 


 [1] 5 


When applied to a table, the length function returns the number of columns.



Tables have another property, dimension.



 dim(iris) 


 [1] 150 5 


The dim function in the example above displays information that the iris table has 150 rows and 5 columns.



In turn, the vector has no dimension.



 dim(letters) 


 NULL 


Thus, we have seen that objects of different classes have different sets of properties.
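
The class of any object can be checked directly with class(). A short sketch, using only the built-in letters vector and iris data frame from the examples above:

```r
# Ask R which class each built-in object belongs to
class(letters)   # a character vector
class(iris)      # a data frame

# Properties differ by class: a data frame has dimensions, a plain vector does not
is.null(dim(letters))
dim(iris)

# attributes() lists the properties stored on an object
names(attributes(iris))
```

The class attribute listed by attributes() is exactly what the rest of the article manipulates when creating custom classes.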



Generic functions



R has many generic functions: print, plot, summary, etc. These functions work differently with objects of different classes.



Take the plot function, for example. Let's call it with the iris table as its main argument.



plot(iris)



Result:



The result of the function plot



Now let's pass the plot function a vector of 100 random numbers drawn from a normal distribution.



plot(rnorm(100, 50, 30))



Result:



The result of the function plot



We obtained different graphs: in the first case a scatterplot matrix, in the second a scatterplot with the observation index along the x axis and the value along the y axis.



Thus, the plot function adapts to different classes. In OOP terminology, the ability to determine the class of the incoming object and perform different actions for objects of different classes is called polymorphism. This works because the function is only a wrapper around a set of methods written for different classes. You can verify this with the following command:



 body(plot) 


 UseMethod("plot") 


The body command prints the body of a function in the R console. As you can see, the body of the plot function consists of a single UseMethod("plot") call.



That is, the plot function simply runs one of the many methods written for it, depending on the class of the object passed to it. You can list all of its methods as follows.



 methods(plot) 


  [1] plot.acf*           plot.data.frame*    plot.decomposed.ts*
  [4] plot.default        plot.dendrogram*    plot.density*
  [7] plot.ecdf           plot.factor*        plot.formula*
 [10] plot.function       plot.hclust*        plot.histogram*
 [13] plot.HoltWinters*   plot.isoreg*        plot.lm*
 [16] plot.medpolish*     plot.mlm*           plot.ppr*
 [19] plot.prcomp*        plot.princomp*      plot.profile.nls*
 [22] plot.raster*        plot.spec*          plot.stepfun
 [25] plot.stl*           plot.table*         plot.ts
 [28] plot.tskernel*      plot.TukeyHSD*


The output shows that the plot function has 29 methods, including plot.default, which is used when the function receives an object of a class that has no method of its own.
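
To see that a generic function really does just hand the work to a class-specific method, you can compare a call to the generic with a direct call to the method. The sketch below uses summary and a small factor rather than plot, simply because the result is easy to compare; the variable names are illustrative.

```r
f <- factor(c("a", "b", "a"))

# Calling the generic dispatches on the class of f ...
via_generic <- summary(f)

# ... which gives the same result as calling the factor method directly
via_method <- summary.factor(f)

identical(via_generic, via_method)

# getS3method() looks a method up by generic name and class
is.function(getS3method("summary", "factor"))
```

In everyday code you would call the generic and let R choose the method; calling a method directly is mostly useful for exploring how dispatch works.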



Using the methods function, you can also get the list of all generic functions that have a method written for a given class.



 methods(, "data.frame") 


  [1] $<-            [              [[             [[<-
  [5] [<-            aggregate      anyDuplicated  as.data.frame
  [9] as.list        as.matrix      by             cbind
 [13] coerce         dim            dimnames       dimnames<-
 [17] droplevels     duplicated     edit           format
 [21] formula        head           initialize     is.na
 [25] Math           merge          na.exclude     na.omit
 [29] Ops            plot           print          prompt
 [33] rbind          row.names      row.names<-    rowsum
 [37] show           slotsFromS3    split          split<-
 [41] stack          str            subset         summary
 [45] Summary        t              tail           transform
 [49] type.convert   unique         unstack        within


What is S3 class and how to create your own class



R has several class systems that let you create your own classes. One of the most popular is S3.



An S3 object is usually a list that stores the properties of the class you created. To create your own class, just create a list and assign it a class name.



The book "The Art of R Programming" uses an employee class, which stores information about an employee, as its example. For this article I also decided to use an object that stores information about employees, but made it more complex and functional.



 # create the employee structure
 employee1 <- list(name = "Oleg",
                   surname = "Petrov",
                   salary = 1500,
                   salary_datetime = Sys.Date(),
                   previous_sallary = NULL,
                   update = Sys.time())

 # assign the class
 class(employee1) <- "emp"


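Thus, we have created an object that stores the following data in its structure: the employee's first name (name), surname (surname), salary (salary), the date the salary was set (salary_datetime), the previous salary (previous_sallary), and the date and time of the last update (update).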





Then, with the class(employee1) <- "emp" command, we assign the class to the object.
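
The same two steps can be written as a single call to structure(), and inherits() can confirm the result. This is only a minimal sketch with a shortened field list; the variable name employee is illustrative.

```r
# structure() builds the list and sets its class attribute in one call
employee <- structure(
  list(name = "Oleg", surname = "Petrov", salary = 1500),
  class = "emp"
)

# The object is still an ordinary list underneath
inherits(employee, "emp")
is.list(employee)

# unclass() removes the class attribute and returns the bare list
class(unclass(employee))
```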



For the convenience of creating objects of class emp, you can write a function.



Function code for creating emp objects
 # function for creating emp objects
 create_employee <- function(name,
                             surname,
                             salary,
                             salary_datetime = Sys.Date(),
                             update = Sys.time()) {
   out <- list(name = name,
               surname = surname,
               salary = salary,
               salary_datetime = salary_datetime,
               previous_sallary = NULL,
               update = update)
   class(out) <- "emp"
   return(out)
 }

 # create an emp object with create_employee
 employee1 <- create_employee("Oleg", "Petrov", 1500)

 # check the class of the new object
 class(employee1)


 [1] "emp" 
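
A constructor is also a convenient place to check the input before the class is assigned. The following is a hedged sketch, not part of the original code: the name new_emp and the specific checks are assumptions chosen for illustration.

```r
# Hypothetical validating constructor; new_emp is not part of the article's code
new_emp <- function(name, surname, salary) {
  # Reject anything that is not a single string / a single non-negative number
  stopifnot(
    is.character(name), length(name) == 1,
    is.character(surname), length(surname) == 1,
    is.numeric(salary), length(salary) == 1, salary >= 0
  )
  structure(
    list(name = name,
         surname = surname,
         salary = salary,
         salary_datetime = Sys.Date(),
         previous_sallary = NULL,
         update = Sys.time()),
    class = "emp"
  )
}

ok  <- new_emp("Oleg", "Petrov", 1500)

# A non-numeric salary is rejected before any object is created
bad <- tryCatch(new_emp("Oleg", "Petrov", "a lot"),
                error = function(e) "rejected")
```

Because S3 does not enforce any structure on its own, checks like these are the only thing that stops a malformed emp object from reaching the methods.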


Value assignment functions for custom S3 classes



So, we have created our own emp class, but so far this has not given us anything. Let's see why we created our class and what can be done with it.



First of all, you can write assignment functions for the new class.



Assignment function for [
 "[<-.emp" <- function(x, i, value) {
   if (i == "salary" || i == 3) {
     cat(x$name, x$surname, "has changed salary from", x$salary, "to", value)
     x$previous_sallary <- x$salary
     x$salary <- value
     x$salary_datetime <- Sys.Date()
     x$update <- Sys.time()
   } else {
     cat("You can't change anything except salary")
   }
   return(x)
 }


Assignment function for [[
 "[[<-.emp" <- function(x, i, value) {
   if (i == "salary" || i == 3) {
     cat(x$name, x$surname, "has changed salary from", x$salary, "to", value)
     x$previous_sallary <- x$salary
     x$salary <- value
     x$salary_datetime <- Sys.Date()
     x$update <- Sys.time()
   } else {
     cat("You can't change anything except salary")
   }
   return(x)
 }


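Assignment functions are always defined with their names in quotes, in the form "[<-.class_name" / "[[<-.class_name", and have three required arguments: x, the object; i, the name or index of the element; and value, the value being assigned.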





In the body of the function you then describe how the elements of your class should change. In my case, I want the user to be able to change only the salary (the salary element, whose index is 3). So inside the function I add the check if ( i == "salary" || i == 3 ). If the user tries to edit any other property, they get the message "You can't change anything except salary".



When the salary element changes, a message is displayed with the employee's name and surname and the current and new salary. The current salary is moved to the previous_sallary property, and salary is assigned the new value. The salary_datetime and update properties are updated as well.



Now you can try to change the salary.



 employee1["salary"] <- 1750 


 Oleg Petrov has changed salary from 1500 to 1750 
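
Note that the two functions above only cover assignment through [ and [[. Assignment through $ is a separate generic, so the same rule would need its own method. The sketch below is an assumption about how that could look, not code from the original article; it uses a shortened object and message() instead of cat().

```r
# Sketch of a `$<-` method for the emp class (an assumption, not the article's code)
"$<-.emp" <- function(x, name, value) {
  if (name != "salary") {
    message("You can't change anything except salary")
    return(x)
  }
  cls <- oldClass(x)
  x <- unclass(x)          # work on the bare list so this method is not called again
  x$previous_sallary <- x$salary
  x$salary <- value
  class(x) <- cls          # restore the class before returning
  x
}

emp <- structure(list(name = "Oleg", salary = 1500, previous_sallary = NULL),
                 class = "emp")

emp$salary <- 1750   # allowed: the old value moves to previous_sallary
emp$name <- "Ivan"   # refused with a message; the name stays "Oleg"
```

The unclass() step matters: assigning with $ inside a `$<-` method on an object that still carries the emp class would call the method recursively.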


Developing custom methods for generic functions



You have already seen that R has generic functions that change their behavior depending on the class of the object they receive.



You can add your own methods to existing generic functions and even create your own generic functions.



One of the most commonly used generic functions is print. It is called every time you type an object's name in the console. At the moment, printing our emp object looks like this:



 $name
 [1] "Oleg"

 $surname
 [1] "Petrov"

 $salary
 [1] 1750

 $salary_datetime
 [1] "2019-05-29"

 $previous_sallary
 [1] 1500

 $update
 [1] "2019-05-29 11:13:25 EEST"


Let's write our method for the print function.



 print.emp <- function(x, ...) {
   cat("Name:", x$name, x$surname, "\n",
       "Current salary:", x$salary, "\n",
       "Days from last udpate:", Sys.Date() - x$salary_datetime, "\n",
       "Previous salary:", x$previous_sallary)
   # print methods conventionally return their argument invisibly
   invisible(x)
 }


Now the print function can print objects of our custom emp class. Simply enter the object name in the console to get the following output.



 employee1 


 Name: Oleg Petrov 
  Current salary: 1750 
  Days from last udpate: 0 
  Previous salary: 1500
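
By convention, a print method accepts ... to match the signature of the generic and returns its argument invisibly, so the object can still be used after printing. A minimal sketch with a shortened emp object (the fields and wording here are illustrative):

```r
emp <- structure(list(name = "Oleg", surname = "Petrov", salary = 1750),
                 class = "emp")

print.emp <- function(x, ...) {
  cat("Name:", x$name, x$surname, "\n")
  cat("Current salary:", x$salary, "\n")
  invisible(x)   # return the object without printing it a second time
}

# capture.output() collects what the method writes to the console
out <- capture.output(returned <- print(emp))
```

Here out holds the two printed lines, and returned is the same emp object that was passed in.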


Creating a generic function and methods



Inside, most generic functions look the same and simply call the UseMethod function.



 # generic function
 get_salary <- function(x, ...) {
   UseMethod("get_salary")
 }


Now let's write two methods for it: one for objects of the emp class, and a default method that runs for objects of all other classes for which our generic function has no dedicated method.



 # method for objects of class emp
 get_salary.emp <- function(x, ...) x$salary

 # default method for all other classes
 get_salary.default <- function(x, ...) cat("Work only with emp class objects")


The name of a method consists of the name of the generic function and the class of objects the method handles, separated by a dot. The default method runs whenever you pass the function an object of a class for which no method has been written.



 get_salary(employee1) 


 [1] 1750 


 get_salary(iris) 


 Work only with emp class objects 
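
The default method above only prints a note and lets the calling code carry on. If passing the wrong class should count as an error, the default method can call stop() instead. The sketch below shows that variant on a separate, hypothetical generic named full_name, which is not part of the original code.

```r
# Hypothetical generic used only for illustration
full_name <- function(x, ...) UseMethod("full_name")

# Method for emp objects
full_name.emp <- function(x, ...) paste(x$name, x$surname)

# Default method: fail loudly and report the class that was received
full_name.default <- function(x, ...) {
  stop("full_name() works only with emp objects, got class: ",
       paste(class(x), collapse = "/"), call. = FALSE)
}

emp <- structure(list(name = "Oleg", surname = "Petrov"), class = "emp")
res <- full_name(emp)

# The error can be handled by the caller like any other R error
err <- tryCatch(full_name(iris), error = function(e) conditionMessage(e))
```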


Inheritance



Inheritance is another term that you will definitely encounter when learning about object-oriented programming.



image



Everything shown in the picture can be classified as transport. Indeed, all of these objects share a common method, movement, and common properties such as speed. Nevertheless, all 6 objects can be divided into three subclasses: ground, water and air. A subclass inherits the properties of the parent class but also has additional properties and methods of its own. In object-oriented programming this is called inheritance.



In our example, we can put remote employees into a separate subclass, remote_emp. Such employees will have an additional property: city of residence.



 # create the employee structure
 employee2 <- list(name = "Ivan",
                   surname = "Ivanov",
                   salary = 500,
                   salary_datetime = Sys.Date(),
                   previous_sallary = NULL,
                   update = Sys.time(),
                   city = "Moscow")

 # assign the subclass remote_emp
 class(employee2) <- c("remote_emp", "emp")

 # check the class of the object
 class(employee2)


 [1] "remote_emp" "emp" 


When creating a subclass, we assign a vector as the class: its first element is the name of the subclass, followed by the name of the parent class.



With inheritance, all generic functions and methods written for the parent class also work correctly with its subclasses.



 # print an object of the remote_emp subclass
 employee2


 Name: Ivan Ivanov 
  Current salary: 500 
  Days from last udpate: 0 
  Previous salary:


 # get the salary of a remote_emp object
 get_salary(employee2)


 [1] 500 


But you can develop methods separately for each subclass.



 # salary method for the remote_emp subclass
 get_salary.remote_emp <- function(x, ...) {
   cat(x$surname, "remote from", x$city, "\n")
   return(x$salary)
 }


 # get the salary of a remote_emp object
 get_salary(employee2)


 Ivanov remote from Moscow 
 [1] 500


It works as follows. The generic function first looks for a method written for the remote_emp subclass; if it does not find one, it moves on and looks for a method written for the parent class emp.
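
A subclass method does not have to repeat the parent's logic. Base R provides NextMethod(), which passes the call on to the next class in the class vector. The article's code does not use it, so treat the following as an illustrative variant of the get_salary methods on a shortened object.

```r
get_salary <- function(x, ...) UseMethod("get_salary")

# Parent method: knows how to extract the salary
get_salary.emp <- function(x, ...) x$salary

# Subclass method: adds its own output, then delegates to the parent method
get_salary.remote_emp <- function(x, ...) {
  cat(x$surname, "remote from", x$city, "\n")
  NextMethod()
}

employee2 <- structure(
  list(surname = "Ivanov", salary = 500, city = "Moscow"),
  class = c("remote_emp", "emp")
)

s <- get_salary(employee2)
```

With this layout, a change to how the salary is stored only has to be made in get_salary.emp.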



When you can use your own classes



Creating your own S3 classes is unlikely to be useful to those who are just beginning to learn R.



I personally found them useful while developing the rfacebookstat package. The Facebook API has an action_breakdowns parameter for loading events and reactions to ad posts.



When you use such breakdowns, the response comes back as a JSON structure of the following format:



 {
   "action_name": "like",
   "action_type": "post_reaction",
   "value": 6
 }
 {
   "action_type": "comment",
   "value": 4
 }


The number and names of the elements differ between action_breakdowns values, so each one needs its own parser. To solve this problem, I used custom S3 classes and a generic function with a set of methods.



When statistics on events with breakdowns were requested, a class was chosen based on the argument values and assigned to the response received from the API. The response was then passed to the generic function, and the class assigned earlier determined which method parsed the result. Anyone interested in the implementation details can find the code that creates the generic function and its methods, and the code that uses them, in the source code of the rfacebookstat package.
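
The general idea can be sketched as follows. This is not the rfacebookstat implementation: the generic parse_actions, the helper tag_response and the class names by_reaction and by_type are all hypothetical, and the responses are written as plain R lists so that no JSON package is needed.

```r
# Hypothetical generic: one parser per response layout
parse_actions <- function(x, ...) UseMethod("parse_actions")

# Layout with three fields
parse_actions.by_reaction <- function(x, ...) {
  data.frame(action_name = x$action_name,
             action_type = x$action_type,
             value       = x$value,
             stringsAsFactors = FALSE)
}

# Layout with two fields
parse_actions.by_type <- function(x, ...) {
  data.frame(action_type = x$action_type,
             value       = x$value,
             stringsAsFactors = FALSE)
}

# The caller knows which breakdown was requested and tags the response with it
tag_response <- function(response, breakdown) {
  class(response) <- c(paste0("by_", breakdown), class(response))
  response
}

resp1 <- tag_response(list(action_name = "like",
                           action_type = "post_reaction",
                           value = 6), "reaction")
resp2 <- tag_response(list(action_type = "comment", value = 4), "type")

parsed1 <- parse_actions(resp1)
parsed2 <- parse_actions(resp2)
```

The code that requests the data never needs an if/else chain over breakdown types; adding a new layout means adding one more method.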



In my case, I used the classes and their methods only inside the package. If you need to give the users of your package an interface for working with the classes you created, register every method with an S3method directive in the NAMESPACE file, as follows.



 S3method(generic_name, class_name)
 S3method("[<-", emp)
 S3method("[[<-", emp)
 S3method("print", emp)


Conclusion



As the title suggests, this is only the first part: besides S3, R has other class systems, namely S4, R5 (RC) and R6. In the future I will try to write about each of them. Meanwhile, if your English is good enough to read books freely, Hadley Wickham covers this topic concisely and with examples in his book "Advanced R".



If I have missed any important information about S3 classes in this article, I would be grateful if you pointed it out in the comments.




Source: https://habr.com/ru/post/453964/


